Science.gov

Sample records for search query terms

  1. Categorical and Specificity Differences between User-Supplied Tags and Search Query Terms for Images. An Analysis of "Flickr" Tags and Web Image Search Queries

    ERIC Educational Resources Information Center

    Chung, EunKyung; Yoon, JungWon

    2009-01-01

    Introduction: The purpose of this study is to compare characteristics and features of user-supplied tags and search query terms for images on the "Flickr" Website in terms of categories of pictorial meanings and level of term specificity. Method: This study focuses on comparisons between tags and search queries using Shatford's categorization…

  2. Exploration of Web Users' Search Interests through Automatic Subject Categorization of Query Terms.

    ERIC Educational Resources Information Center

    Pu, Hsiao-tieh; Yang, Chyan; Chuang, Shui-Lung

    2001-01-01

    Proposes a mechanism that carefully integrates human and machine efforts to explore Web users' search interests. The approach consists of a four-step process: extraction of core terms; construction of subject taxonomy; automatic subject categorization of query terms; and observation of users' search interests. Research findings are proved valuable…

  3. SPARK: Adapting Keyword Query to Semantic Search

    NASA Astrophysics Data System (ADS)

    Zhou, Qi; Wang, Chong; Xiong, Miao; Wang, Haofen; Yu, Yong

    Semantic search promises to provide more accurate results than present-day keyword search. However, progress with semantic search has been delayed due to the complexity of its query languages. In this paper, we explore a novel approach of adapting keywords to querying the semantic web: the approach automatically translates keyword queries into formal logic queries so that end users can use familiar keywords to perform semantic search. A prototype system named 'SPARK' has been implemented in light of this approach. Given a keyword query, SPARK outputs a ranked list of SPARQL queries as the translation result. The translation in SPARK consists of three major steps: term mapping, query graph construction and query ranking. Specifically, a probabilistic query ranking model is proposed to select the most likely SPARQL query. In the experiment, SPARK achieved an encouraging translation result.

  4. Searching the Web: The Public and Their Queries.

    ERIC Educational Resources Information Center

    Spink, Amanda; Wolfram, Dietmar; Jansen, Major B. J.; Saracevic, Tefko

    2001-01-01

    Reports findings from a study of searching behavior by over 200,000 users of the Excite search engine. Analysis of over one million queries revealed most people use few search terms, few modified queries, view few Web pages, and rarely use advanced search features. Concludes that Web searching by the public differs significantly from searching of…

  5. EquiX-A Search and Query Language for XML.

    ERIC Educational Resources Information Center

    Cohen, Sara; Kanza, Yaron; Kogan, Yakov; Sagiv, Yehoshua; Nutt, Werner; Serebrenik, Alexander

    2002-01-01

    Describes EquiX, a search language for XML that combines querying with searching to query the data and the meta-data content of Web pages. Topics include search engines; a data model for XML documents; search query syntax; search query semantics; an algorithm for evaluating a query on a document; and indexing EquiX queries. (LRW)

  6. Improving Web Search for Difficult Queries

    ERIC Educational Resources Information Center

    Wang, Xuanhui

    2009-01-01

    Search engines have now become essential tools in all aspects of our life. Although a variety of information needs can be served very successfully, there are still a lot of queries that search engines can not answer very effectively and these queries always make users feel frustrated. Since it is quite often that users encounter such "difficult…

  7. How Do Children Reformulate Their Search Queries?

    ERIC Educational Resources Information Center

    Rutter, Sophie; Ford, Nigel; Clough, Paul

    2015-01-01

    Introduction: This paper investigates techniques used by children in year 4 (age eight to nine) of a UK primary school to reformulate their queries, and how they use information retrieval systems to support query reformulation. Method: An in-depth study analysing the interactions of twelve children carrying out search tasks in a primary school…

  8. A novel adaptive Cuckoo search for optimal query plan generation.

    PubMed

    Gomathi, Ramalingam; Sharmila, Dhandapani

    2014-01-01

    The emergence of multiple web pages day by day has led to the development of semantic web technology. A World Wide Web Consortium (W3C) standard for storing semantic web data is the resource description framework (RDF). To reduce execution time when querying large RDF graphs, evolving metaheuristic algorithms have become an alternative to traditional query optimization methods. This paper focuses on the problem of query optimization of semantic web data. An efficient algorithm called adaptive Cuckoo search (ACS) for querying and generating an optimal query plan for large RDF graphs is designed in this research. Experiments were conducted on different datasets with varying numbers of predicates. The experimental results show that the proposed approach provides significant results in terms of query execution time. The extent to which the algorithm is efficient is tested and the results are documented.

  9. Searching for Images: The Analysis of Users' Queries for Image Retrieval in American History.

    ERIC Educational Resources Information Center

    Choi, Youngok; Rasmussen, Edie M.

    2003-01-01

    Studied users' queries for visual information in American history to identify the image attributes important for retrieval and the characteristics of users' queries for digital images, based on queries from 38 faculty and graduate students. Results of pre- and post-test questionnaires and interviews suggest principal categories of search terms.…

  10. Locality in Search Engine Queries and Its Implications for Caching

    DTIC Science & Technology

    2001-05-01

    in the question of whether caching might be effective for search engines as well. They study two real search engine traces by examining query...locality and its implications for caching. The two search engines studied are Vivisimo and Excite. Their trace analysis results show that queries have

  11. Cumulative query method for influenza surveillance using search engine data.

    PubMed

    Seo, Dong-Woo; Jo, Min-Woo; Sohn, Chang Hwan; Shin, Soo-Yong; Lee, JaeHo; Yu, Maengsoo; Kim, Won Young; Lim, Kyoung Soo; Lee, Sang-Il

    2014-12-16

    Internet search queries have become an important data source in syndromic surveillance system. However, there is currently no syndromic surveillance system using Internet search query data in South Korea. The objective of this study was to examine correlations between our cumulative query method and national influenza surveillance data. Our study was based on the local search engine, Daum (approximately 25% market share), and influenza-like illness (ILI) data from the Korea Centers for Disease Control and Prevention. A quota sampling survey was conducted with 200 participants to obtain popular queries. We divided the study period into two sets: Set 1 (the 2009/10 epidemiological year for development set 1 and 2010/11 for validation set 1) and Set 2 (2010/11 for development Set 2 and 2011/12 for validation Set 2). Pearson's correlation coefficients were calculated between the Daum data and the ILI data for the development set. We selected the combined queries for which the correlation coefficients were .7 or higher and listed them in descending order. Then, we created a cumulative query method n representing the number of cumulative combined queries in descending order of the correlation coefficient. In validation set 1, 13 cumulative query methods were applied, and 8 had higher correlation coefficients (min=.916, max=.943) than that of the highest single combined query. Further, 11 of 13 cumulative query methods had an r value of ≥.7, but 4 of 13 combined queries had an r value of ≥.7. In validation set 2, 8 of 15 cumulative query methods showed higher correlation coefficients (min=.975, max=.987) than that of the highest single combined query. All 15 cumulative query methods had an r value of ≥.7, but 6 of 15 combined queries had an r value of ≥.7. Cumulative query method showed relatively higher correlation with national influenza surveillance data than combined queries in the development and validation set.
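    As a rough illustration of the procedure described in this abstract, the Python sketch below ranks candidate queries by their Pearson correlation with ILI counts, keeps those at or above .7, and evaluates each cumulative query method n. It assumes that "cumulative" means summing the weekly volumes of the top n queries, which the abstract does not state explicitly; the function names and data layout are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def cumulative_query_methods(query_series, ili, threshold=0.7):
    """Return [(n, r)] where r is the correlation of the summed top-n queries with ILI.

    query_series maps a query string to its weekly search volumes.
    Summing the volumes is an assumption made for this sketch.
    """
    scored = [(pearson(series, ili), name) for name, series in query_series.items()]
    # Keep queries at or above the threshold, highest correlation first.
    selected = sorted((item for item in scored if item[0] >= threshold), reverse=True)

    methods = []
    running = [0.0] * len(ili)
    for n, (_, name) in enumerate(selected, start=1):
        running = [a + b for a, b in zip(running, query_series[name])]
        methods.append((n, pearson(running, ili)))
    return methods
```

    Each returned pair can then be compared against the best single-query correlation, which is the comparison the abstract reports for its validation sets.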

  12. Monitoring Influenza Epidemics in China with Search Query from Baidu

    PubMed Central

    Lv, Benfu; Peng, Geng; Chunara, Rumi; Brownstein, John S.

    2013-01-01

    Several approaches have been proposed for near real-time detection and prediction of the spread of influenza. These include search query data for influenza-related terms, which has been explored as a tool for augmenting traditional surveillance methods. In this paper, we present a method that uses Internet search query data from Baidu to model and monitor influenza activity in China. The objectives of the study are to present a comprehensive technique for: (i) keyword selection, (ii) keyword filtering, (iii) index composition and (iv) modeling and detection of influenza activity in China. Sequential time-series for the selected composite keyword index is significantly correlated with Chinese influenza case data. In addition, one-month ahead prediction of influenza cases for the first eight months of 2012 has a mean absolute percent error less than 11%. To our knowledge, this is the first study on the use of search query data from Baidu in conjunction with this approach for estimation of influenza activity in China. PMID:23750192

  13. Searching for Cancer Information on the Internet: Analyzing Natural Language Search Queries

    PubMed Central

    Theofanos, Mary Frances

    2003-01-01

    Background Searching for health information is one of the most-common tasks performed by Internet users. Many users begin searching on popular search engines rather than on prominent health information sites. We know that many visitors to our (National Cancer Institute) Web site, cancer.gov, arrive via links in search engine results. Objective To learn more about the specific needs of our general-public users, we wanted to understand what lay users really wanted to know about cancer, how they phrased their questions, and how much detail they used. Methods The National Cancer Institute partnered with AskJeeves, Inc to develop a methodology to capture, sample, and analyze 3 months of cancer-related queries on the Ask.com Web site, a prominent United States consumer search engine, which receives over 35 million queries per week. Using a benchmark set of 500 terms and word roots supplied by the National Cancer Institute, AskJeeves identified a test sample of cancer queries for 1 week in August 2001. From these 500 terms only 37 appeared ≥ 5 times/day over the trial test week in 17208 queries. Using these 37 terms, 204165 instances of cancer queries were found in the Ask.com query logs for the actual test period of June-August 2001. Of these, 7500 individual user questions were randomly selected for detailed analysis and assigned to appropriate categories. The exact language of sample queries is presented. Results Considering multiples of the same questions, the sample of 7500 individual user queries represented 76077 queries (37% of the total 3-month pool). Overall 78.37% of sampled Cancer queries asked about 14 specific cancer types. Within each cancer type, queries were sorted into appropriate subcategories including at least the following: General Information, Symptoms, Diagnosis and Testing, Treatment, Statistics, Definition, and Cause/Risk/Link. The most-common specific cancer types mentioned in queries were Digestive/Gastrointestinal/Bowel (15.0%), Breast (11

  14. Searching for cancer information on the internet: analyzing natural language search queries.

    PubMed

    Bader, Judith L; Theofanos, Mary Frances

    2003-12-11

    Searching for health information is one of the most-common tasks performed by Internet users. Many users begin searching on popular search engines rather than on prominent health information sites. We know that many visitors to our (National Cancer Institute) Web site, cancer.gov, arrive via links in search engine results. To learn more about the specific needs of our general-public users, we wanted to understand what lay users really wanted to know about cancer, how they phrased their questions, and how much detail they used. The National Cancer Institute partnered with AskJeeves, Inc to develop a methodology to capture, sample, and analyze 3 months of cancer-related queries on the Ask.com Web site, a prominent United States consumer search engine, which receives over 35 million queries per week. Using a benchmark set of 500 terms and word roots supplied by the National Cancer Institute, AskJeeves identified a test sample of cancer queries for 1 week in August 2001. From these 500 terms only 37 appeared ≥ 5 times/day over the trial test week in 17208 queries. Using these 37 terms, 204165 instances of cancer queries were found in the Ask.com query logs for the actual test period of June-August 2001. Of these, 7500 individual user questions were randomly selected for detailed analysis and assigned to appropriate categories. The exact language of sample queries is presented. Considering multiples of the same questions, the sample of 7500 individual user queries represented 76077 queries (37% of the total 3-month pool). Overall 78.37% of sampled Cancer queries asked about 14 specific cancer types. Within each cancer type, queries were sorted into appropriate subcategories including at least the following: General Information, Symptoms, Diagnosis and Testing, Treatment, Statistics, Definition, and Cause/Risk/Link. The most-common specific cancer types mentioned in queries were Digestive/Gastrointestinal/Bowel (15.0%), Breast (11.7%), Skin (11.3%), and Genitourinary

  15. Matching health information seekers' queries to medical terms

    PubMed Central

    2012-01-01

    Background The Internet is a major source of health information but most seekers are not familiar with medical vocabularies. Hence, their searches fail due to bad query formulation. Several methods have been proposed to improve information retrieval: query expansion, syntactic and semantic techniques or knowledge-based methods. However, it would be useful to clean those queries which are misspelled. In this paper, we propose a simple yet efficient method in order to correct misspellings of queries submitted by health information seekers to a medical online search tool. Methods In addition to query normalizations and exact phonetic term matching, we tested two approximate string comparators: the similarity score function of Stoilos and the normalized Levenshtein edit distance. We propose here to combine them to increase the number of matched medical terms in French. We first took a sample of query logs to determine the thresholds and processing times. In the second run, at a greater scale we tested different combinations of query normalizations before or after misspelling correction with the retained thresholds in the first run. Results According to the total number of suggestions (around 163, the number of the first sample of queries), at a threshold comparator score of 0.3, the normalized Levenshtein edit distance gave the highest F-Measure (88.15%) and at a threshold comparator score of 0.7, the Stoilos function gave the highest F-Measure (84.31%). By combining Levenshtein and Stoilos, the highest F-Measure (80.28%) is obtained with 0.2 and 0.7 thresholds respectively. However, queries are composed by several terms that may be combination of medical terms. The process of query normalization and segmentation is thus required. The highest F-Measure (64.18%) is obtained when this process is realized before spelling-correction. Conclusions Despite the widely known high performance of the normalized edit distance of Levenshtein, we show in this paper that its
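    To make the normalized Levenshtein comparator mentioned in this abstract concrete, here is a minimal Python sketch. It assumes the normalization divides the edit distance by the length of the longer string and that a candidate term is accepted when that normalized distance is at or below the threshold; the abstract does not spell out either choice, and the function names and the 0.3 default are only illustrative. The Stoilos similarity function is not shown.

```python
def levenshtein(a, b):
    """Classic edit distance (insertions, deletions, substitutions), row by row."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_levenshtein(a, b):
    """Edit distance scaled to [0, 1] by the longer string (an assumed normalization)."""
    if not a and not b:
        return 0.0
    return levenshtein(a, b) / max(len(a), len(b))


def suggest(query, vocabulary, max_distance=0.3):
    """Return vocabulary terms within max_distance of the query, closest first."""
    scored = [(normalized_levenshtein(query.lower(), term.lower()), term)
              for term in vocabulary]
    return [term for distance, term in sorted(scored) if distance <= max_distance]
```

    For a misspelled query such as "diabetis", calling suggest with a small list of medical terms would keep "diabetes" (one substitution over eight characters) and reject terms that differ in several positions.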

  16. Using search engine query data to track pharmaceutical utilization: a study of statins.

    PubMed

    Schuster, Nathaniel M; Rogers, Mary A M; McMahon, Laurence F

    2010-08-01

    To examine temporal and geographic associations between Google queries for health information and healthcare utilization benchmarks. Retrospective longitudinal study. Using Google Trends and Google Insights for Search data, the search terms Lipitor (atorvastatin calcium; Pfizer, Ann Arbor, MI) and simvastatin were evaluated for change over time and for association with Lipitor revenues. The relationship between query data and community-based resource use per Medicare beneficiary was assessed for 35 US metropolitan areas. Google queries for Lipitor significantly decreased from January 2004 through June 2009 and queries for simvastatin significantly increased (P <.001 for both), particularly after Lipitor came off patent (P <.001 for change in slope). The mean number of Google queries for Lipitor correlated (r = 0.98) with the percentage change in Lipitor global revenues from 2004 to 2008 (P <.001). Query preference for Lipitor over simvastatin was positively associated (r = 0.40) with a community's use of Medicare services. For every 1% increase in utilization of Medicare services in a community, there was a 0.2-unit increase in the ratio of Lipitor queries to simvastatin queries in that community (P = .02). Specific search engine queries for medical information correlate with pharmaceutical revenue and with overall healthcare utilization in a community. This suggests that search query data can track community-wide characteristics in healthcare utilization and have the potential for informing payers and policy makers regarding trends in utilization.

  17. Query Log Analysis of an Electronic Health Record Search Engine

    PubMed Central

    Yang, Lei; Mei, Qiaozhu; Zheng, Kai; Hanauer, David A.

    2011-01-01

    We analyzed a longitudinal collection of query logs of a full-text search engine designed to facilitate information retrieval in electronic health records (EHR). The collection, 202,905 queries and 35,928 user sessions recorded over a course of 4 years, represents the information-seeking behavior of 533 medical professionals, including frontline practitioners, coding personnel, patient safety officers, and biomedical researchers for patient data stored in EHR systems. In this paper, we present descriptive statistics of the queries, a categorization of information needs manifested through the queries, as well as temporal patterns of the users’ information-seeking behavior. The results suggest that information needs in medical domain are substantially more sophisticated than those that general-purpose web search engines need to accommodate. Therefore, we envision there exists a significant challenge, along with significant opportunities, to provide intelligent query recommendations to facilitate information retrieval in EHR. PMID:22195150

  18. Project Lefty: More Bang for the Search Query

    ERIC Educational Resources Information Center

    Varnum, Ken

    2010-01-01

    This article describes Project Lefty, a search system that, at a minimum, adds a layer on top of traditional federated search tools that will make the wait for results more worthwhile for researchers. At best, Project Lefty improves search queries and relevance rankings for web-scale discovery tools to make the results themselves more relevant…

  19. An Analysis of Web Image Queries for Search.

    ERIC Educational Resources Information Center

    Pu, Hsiao-Tieh

    2003-01-01

    Examines the differences between Web image and textual queries, and attempts to develop an analytic model to investigate their implications for Web image retrieval systems. Provides results that give insight into Web image searching behavior and suggests implications for improvement of current Web image search engines. (AEF)

  20. Analyzing Medical Image Search Behavior: Semantics and Prediction of Query Results.

    PubMed

    De-Arteaga, Maria; Eggel, Ivan; Kahn, Charles E; Müller, Henning

    2015-10-01

    Log files of information retrieval systems that record user behavior have been used to improve the outcomes of retrieval systems, understand user behavior, and predict events. In this article, a log file of the ARRS GoldMiner search engine containing 222,005 consecutive queries is analyzed. Time stamps are available for each query, as well as masked IP addresses, which makes it possible to identify queries from the same person. This article describes the ways in which physicians (or Internet searchers interested in medical images) search and proposes potential improvements by suggesting query modifications. For example, many queries contain only a few terms and therefore are not specific; others contain spelling mistakes or non-medical terms that likely lead to poor or empty results. One of the goals of this report is to predict the number of results a query will have since such a model allows search engines to automatically propose query modifications in order to avoid result lists that are empty or too large. This prediction is made based on characteristics of the query terms themselves. Prediction of empty results has an accuracy above 88%, and thus can be used to automatically modify the query to avoid empty result sets for a user. The semantic analysis and data of reformulations done by users in the past can aid the development of better search systems, particularly to improve results for novice users. Therefore, this paper gives important ideas to better understand how people search and how to use this knowledge to improve the performance of specialized medical search engines.

  1. Predicting Drug Recalls From Internet Search Engine Queries.

    PubMed

    Yom-Tov, Elad

    2017-01-01

    Batches of pharmaceuticals are sometimes recalled from the market when a safety issue or a defect is detected in specific production runs of a drug. Such problems are usually detected when patients or healthcare providers report abnormalities to medical authorities. Here, we test the hypothesis that defective production lots can be detected earlier by monitoring queries to Internet search engines. We extracted queries from the USA to the Bing search engine, which mentioned one of the 5195 pharmaceutical drugs during 2015 and all recall notifications issued by the Food and Drug Administration (FDA) during that year. By using attributes that quantify the change in query volume at the state level, we attempted to predict if a recall of a specific drug will be ordered by the FDA in a time horizon ranging from 1 to 40 days in the future. Our results show that future drug recalls can indeed be identified with an AUC of 0.791 and a lift at 5% of approximately 6 when predicting a recall occurring one day ahead. This performance degrades as prediction is made for longer periods ahead. The most indicative attributes for prediction are sudden spikes in query volume about a specific medicine in each state. Recalls of prescription drugs and those estimated to be of medium-risk are more likely to be identified using search query data. These findings suggest that aggregated Internet search engine data can be used to facilitate early warning of faulty batches of medicines.

  2. Correlation between National Influenza Surveillance Data and Search Queries from Mobile Devices and Desktops in South Korea.

    PubMed

    Shin, Soo-Yong; Kim, Taerim; Seo, Dong-Woo; Sohn, Chang Hwan; Kim, Sung-Hoon; Ryoo, Seung Mok; Lee, Yoon-Seon; Lee, Jae Ho; Kim, Won Young; Lim, Kyoung Soo

    2016-01-01

    Digital surveillance using internet search queries can improve both the sensitivity and timeliness of the detection of a health event, such as an influenza outbreak. While it has recently been estimated that the mobile search volume surpasses the desktop search volume and mobile search patterns differ from desktop search patterns, the previous digital surveillance systems did not distinguish mobile and desktop search queries. The purpose of this study was to compare the performance of mobile and desktop search queries in terms of digital influenza surveillance. The study period was from September 6, 2010 through August 30, 2014, which consisted of four epidemiological years. Influenza-like illness (ILI) and virologic surveillance data from the Korea Centers for Disease Control and Prevention were used. A total of 210 combined queries from our previous survey work were used for this study. Mobile and desktop weekly search data were extracted from Naver, which is the largest search engine in Korea. Spearman's correlation analysis was used to examine the correlation of the mobile and desktop data with ILI and virologic data in Korea. We also performed lag correlation analysis. We observed that the influenza surveillance performance of mobile search queries matched or exceeded that of desktop search queries over time. The mean correlation coefficients of mobile search queries and the number of queries with an r-value of ≥ 0.7 equaled or became greater than those of desktop searches over the four epidemiological years. A lag correlation analysis of up to two weeks showed similar trends. Our study shows that mobile search queries for influenza surveillance have equaled or even become greater than desktop search queries over time. In the future development of influenza surveillance using search queries, the recognition of changing trend of mobile search data could be necessary.
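    The Spearman and lag correlation analysis named in this abstract can be sketched in Python as follows. This is only an illustration under assumptions: Spearman's coefficient is computed as the Pearson correlation of average ranks, and a lag of k weeks pairs search volume in week t with surveillance data in week t + k; the abstract does not give the direction of the lag, and all names here are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def lag_correlations(search, surveillance, max_lag=2):
    """Map each lag (in weeks) to the correlation of search[t] with surveillance[t + lag]."""
    results = {}
    for lag in range(max_lag + 1):
        shifted_search = search[:len(search) - lag] if lag else search
        results[lag] = spearman(shifted_search, surveillance[lag:])
    return results
```

    Running lag_correlations separately on mobile and desktop series would give the kind of side-by-side comparison the study describes.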

  3. Correlation between National Influenza Surveillance Data and Search Queries from Mobile Devices and Desktops in South Korea

    PubMed Central

    Seo, Dong-Woo; Sohn, Chang Hwan; Kim, Sung-Hoon; Ryoo, Seung Mok; Lee, Yoon-Seon; Lee, Jae Ho; Kim, Won Young; Lim, Kyoung Soo

    2016-01-01

    Background Digital surveillance using internet search queries can improve both the sensitivity and timeliness of the detection of a health event, such as an influenza outbreak. While it has recently been estimated that the mobile search volume surpasses the desktop search volume and mobile search patterns differ from desktop search patterns, the previous digital surveillance systems did not distinguish mobile and desktop search queries. The purpose of this study was to compare the performance of mobile and desktop search queries in terms of digital influenza surveillance. Methods and Results The study period was from September 6, 2010 through August 30, 2014, which consisted of four epidemiological years. Influenza-like illness (ILI) and virologic surveillance data from the Korea Centers for Disease Control and Prevention were used. A total of 210 combined queries from our previous survey work were used for this study. Mobile and desktop weekly search data were extracted from Naver, which is the largest search engine in Korea. Spearman’s correlation analysis was used to examine the correlation of the mobile and desktop data with ILI and virologic data in Korea. We also performed lag correlation analysis. We observed that the influenza surveillance performance of mobile search queries matched or exceeded that of desktop search queries over time. The mean correlation coefficients of mobile search queries and the number of queries with an r-value of ≥ 0.7 equaled or became greater than those of desktop searches over the four epidemiological years. A lag correlation analysis of up to two weeks showed similar trends. Conclusion Our study shows that mobile search queries for influenza surveillance have equaled or even become greater than desktop search queries over time. In the future development of influenza surveillance using search queries, the recognition of changing trend of mobile search data could be necessary. PMID:27391028

  4. Cognitive search model and a new query paradigm

    NASA Astrophysics Data System (ADS)

    Xu, Zhonghui

    2001-06-01

    This paper proposes a cognitive model in which people begin to search pictures by using semantic content and find the right picture by judging whether its visual content is a proper visualization of the semantics desired. It is essential that human search is not just a process of matching computation on visual features but rather a process of visualization of the semantic content known. For people to search electronic images in the same way as they do manually in the model, we suggest that querying be a semantic-driven process like design. A query-by-design paradigm is proposed in the sense that what you design is what you find. Unlike query-by-example, query-by-design allows users to specify the semantic content through an iterative and incremental interaction process so that a retrieval can start with association and identification of the given semantic content and get refined while further visual cues are available. An experimental image retrieval system, Kuafu, has been under development using the query-by-design paradigm and an iconic language is adopted.

  5. Index Compression and Efficient Query Processing in Large Web Search Engines

    ERIC Educational Resources Information Center

    Ding, Shuai

    2013-01-01

    The inverted index is the main data structure used by all the major search engines. Search engines build an inverted index on their collection to speed up query processing. As the size of the web grows, the length of the inverted list structures, which can easily grow to hundreds of MBs or even GBs for common terms (roughly linear in the size of…
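    For readers unfamiliar with the data structure this abstract refers to, the Python sketch below builds a small inverted index and answers a conjunctive (AND) query by intersecting sorted posting lists. It is a toy illustration with hypothetical function names and whitespace tokenization, not the indexing or compression scheme studied in the dissertation.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to a sorted list of the document IDs that contain it."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        # Whitespace tokenization and lowercasing are simplifying assumptions.
        for term in set(docs[doc_id].lower().split()):
            index[term].append(doc_id)
    return index


def intersect(postings_a, postings_b):
    """Merge-style intersection of two sorted posting lists."""
    i = j = 0
    common = []
    while i < len(postings_a) and j < len(postings_b):
        if postings_a[i] == postings_b[j]:
            common.append(postings_a[i])
            i += 1
            j += 1
        elif postings_a[i] < postings_b[j]:
            i += 1
        else:
            j += 1
    return common


def and_query(index, terms):
    """Return IDs of documents containing every term, shortest list first."""
    postings = sorted((index.get(term.lower(), []) for term in terms), key=len)
    if not postings:
        return []
    result = postings[0]
    for other in postings[1:]:
        result = intersect(result, other)
    return result
```

    Because each posting list grows with the number of documents containing the term, lists for common terms dominate both storage and query time, which is the motivation the abstract gives for compression and efficient processing.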

  6. Searching for rare diseases in PubMed: a blind comparison of Orphanet expert query and query based on terminological knowledge.

    PubMed

    Griffon, N; Schuers, M; Dhombres, F; Merabti, T; Kerdelhué, G; Rollin, L; Darmoni, S J

    2016-08-02

    Despite international initiatives like Orphanet, it remains difficult to find up-to-date information about rare diseases. The aim of this study is to propose an exhaustive set of queries for PubMed based on terminological knowledge and to evaluate it versus the queries based on expertise provided by the most frequently used resource in Europe: Orphanet. Four rare disease terminologies (MeSH, OMIM, HPO and HRDO) were manually mapped to each other permitting the automatic creation of expanded terminological queries for rare diseases. For 30 rare diseases, 30 citations retrieved by Orphanet expert query and/or query based on terminological knowledge were assessed for relevance by two independent reviewers unaware of the query's origin. An adjudication procedure was used to resolve any discrepancy. Precision, relative recall and F-measure were all computed. For each Orphanet rare disease (n = 8982), there was a corresponding terminological query, in contrast with only 2284 queries provided by Orphanet. Only 553 citations were evaluated due to queries with 0 or only a few hits. There were no significant differences between the Orpha query and terminological query in terms of precision, respectively 0.61 vs 0.52 (p = 0.13). Nevertheless, terminological queries retrieved more citations more often than Orpha queries (0.57 vs. 0.33; p = 0.01). Interestingly, Orpha queries seemed to retrieve older citations than terminological queries (p < 0.0001). The terminological queries proposed in this study are now available for all rare diseases. They may be a useful tool for both precision- and recall-oriented literature searches.
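    The three evaluation measures named in this abstract can be sketched in Python as below. The sketch assumes the usual pooled definition of relative recall (relevant citations retrieved by one query divided by the relevant citations retrieved by all compared queries together) and the balanced F-measure; the abstract does not state which variants were used, and the function names are hypothetical.

```python
def precision(retrieved, relevant):
    """Share of retrieved citations that were judged relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0


def relative_recall(retrieved, relevant, pooled):
    """Share of the pooled relevant citations that this query retrieved.

    pooled is the union of citations retrieved by all queries being compared.
    """
    pooled_relevant = set(relevant) & set(pooled)
    if not pooled_relevant:
        return 0.0
    return len(set(retrieved) & pooled_relevant) / len(pooled_relevant)


def f_measure(p, r):
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0
```

    Relative recall is used in settings like this one because the full set of relevant citations in PubMed is unknown, so recall can only be measured against what the compared queries found between them.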

  7. BEAUTY-X: enhanced BLAST searches for DNA queries.

    PubMed

    Worley, K C; Culpepper, P; Wiese, B A; Smith, R F

    1998-01-01

    BEAUTY (BLAST Enhanced Alignment Utility) is an enhanced version of the BLAST database search tool that facilitates identification of the functions of matched sequences. Three recent improvements to the BEAUTY program described here make the enhanced output (1) available for DNA queries, (2) available for searches of any protein database, and (3) more up-to-date, with periodic updates of the domain information. BEAUTY searches of the NCBI and EMBL non-redundant protein sequence databases are available from the BCM Search Launcher Web pages (http://gc.bcm.tmc.edu:8088/search-launcher/launcher.html). BEAUTY Post-Processing of submitted search results is available using the BCM Search Launcher Batch Client (version 2.6) (ftp://gc.bcm.tmc.edu/pub/software/search-launcher/). Example figures are available at http://dot.bcm.tmc.edu:9331/papers/beautypp.html (kworley,culpep)@bcm.tmc.edu

  8. Query-Adaptive Reciprocal Hash Tables for Nearest Neighbor Search.

    PubMed

    Liu, Xianglong; Deng, Cheng; Lang, Bo; Tao, Dacheng; Li, Xuelong

    2016-02-01

    Recent years have witnessed the success of binary hashing techniques in approximate nearest neighbor search. In practice, multiple hash tables are usually built using hashing to cover more desired results in the hit buckets of each table. However, few works study a unified approach to constructing multiple informative hash tables using any type of hashing algorithm. Meanwhile, multiple table search also lacks a generic query-adaptive and fine-grained ranking scheme that can alleviate the binary quantization loss suffered in the standard hashing techniques. To solve the above problems, in this paper, we first regard the table construction as a selection problem over a set of candidate hash functions. With the graph representation of the function set, we propose an efficient solution that sequentially applies normalized dominant set to finding the most informative and independent hash functions for each table. To further reduce the redundancy between tables, we explore the reciprocal hash tables in a boosting manner, where the hash function graph is updated with high weights emphasized on the misclassified neighbor pairs of previous hash tables. To refine the ranking of the retrieved buckets within a certain Hamming radius from the query, we propose a query-adaptive bitwise weighting scheme to enable fine-grained bucket ranking in each hash table, exploiting the discriminative power of its hash functions and their complement for nearest neighbor search. Moreover, we integrate this scheme into the multiple table search using a fast, yet reciprocal table lookup algorithm within the adaptive weighted Hamming radius. In this paper, both the construction method and the query-adaptive search method are general and compatible with different types of hashing algorithms using different feature spaces and/or parameter settings. Our extensive experiments on several large-scale benchmarks demonstrate that the proposed techniques can significantly outperform both

  9. RadSearch: a RIS/PACS integrated query tool

    NASA Astrophysics Data System (ADS)

    Tsao, Sinchai; Documet, Jorge; Moin, Paymann; Wang, Kevin; Liu, Brent J.

    2008-03-01

    Radiology Information Systems (RIS) contain a wealth of information that can be used for research, education, and practice management. However, the sheer amount of information available makes querying specific data difficult and time consuming. Previous work has shown that a clinical RIS database and its RIS text reports can be extracted, duplicated and indexed for searches while complying with HIPAA and IRB requirements. This project's intent is to provide a software tool, the RadSearch Toolkit, to allow intelligent indexing and parsing of RIS reports for easy yet powerful searches. In addition, the project aims to seamlessly query and retrieve associated images from the Picture Archiving and Communication System (PACS) in situations where an integrated RIS/PACS is in place - even subselecting individual series, such as in an MRI study. RadSearch's application of simple text parsing techniques to index text-based radiology reports will allow the search engine to quickly return relevant results. This powerful combination will be useful in both private practice and academic settings; administrators can easily obtain complex practice management information such as referral patterns; researchers can conduct retrospective studies with specific, multiple criteria; teaching institutions can quickly and effectively create thorough teaching files.

  10. Web Search Queries Can Predict Stock Market Volumes

    PubMed Central

    Bordino, Ilaria; Battiston, Stefano; Caldarelli, Guido; Cristelli, Matthieu; Ukkonen, Antti; Weber, Ingmar

    2012-01-01

    We live in a computerized and networked society where many of our actions leave a digital trace and affect other people’s actions. This has led to the emergence of a new data-driven research field: mathematical methods of computer science, statistical physics and sociometry provide insights on a wide range of disciplines ranging from social science to human mobility. A recent important discovery is that search engine traffic (i.e., the number of requests submitted by users to search engines on the www) can be used to track and, in some cases, to anticipate the dynamics of social phenomena. Successful examples include unemployment levels, car and home sales, and epidemics spreading. Few recent works applied this approach to stock prices and market sentiment. However, it remains unclear if trends in financial markets can be anticipated by the collective wisdom of on-line users on the web. Here we show that daily trading volumes of stocks traded in NASDAQ-100 are correlated with daily volumes of queries related to the same stocks. In particular, query volumes anticipate in many cases peaks of trading by one day or more. Our analysis is carried out on a unique dataset of queries, submitted to an important web search engine, which enables us to also investigate user behavior. We show that the query volume dynamics emerges from the collective but seemingly uncoordinated activity of many users. These findings contribute to the debate on the identification of early warnings of financial systemic risk, based on the activity of users of the www. PMID:22829871
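
    The claim that query volumes anticipate trading peaks by a day or more amounts to a correlation at a positive time lag. The sketch below is one simple way to check such a lead, using Pearson correlation between the query series and a shifted trading series. It is an illustration only, not the authors' procedure, and the function names are invented.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences.

    Assumes both sequences have non-zero variance.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def lagged_correlation(queries, trades, lag):
    """Correlate queries[t] with trades[t + lag].

    A positive lag asks whether query volume leads trading volume
    by that many days; a negative lag asks the reverse.
    """
    if lag > 0:
        return pearson(queries[:-lag], trades[lag:])
    if lag < 0:
        return pearson(queries[-lag:], trades[:lag])
    return pearson(queries, trades)
```

    Scanning a small range of lags (for example -3 to 3 days) and comparing the coefficients shows which series tends to move first.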

  11. Web search queries can predict stock market volumes.

    PubMed

    Bordino, Ilaria; Battiston, Stefano; Caldarelli, Guido; Cristelli, Matthieu; Ukkonen, Antti; Weber, Ingmar

    2012-01-01

    We live in a computerized and networked society where many of our actions leave a digital trace and affect other people's actions. This has led to the emergence of a new data-driven research field: mathematical methods of computer science, statistical physics and sociometry provide insights on a wide range of disciplines ranging from social science to human mobility. A recent important discovery is that search engine traffic (i.e., the number of requests submitted by users to search engines on the www) can be used to track and, in some cases, to anticipate the dynamics of social phenomena. Successful examples include unemployment levels, car and home sales, and epidemics spreading. Few recent works applied this approach to stock prices and market sentiment. However, it remains unclear if trends in financial markets can be anticipated by the collective wisdom of on-line users on the web. Here we show that daily trading volumes of stocks traded in NASDAQ-100 are correlated with daily volumes of queries related to the same stocks. In particular, query volumes anticipate in many cases peaks of trading by one day or more. Our analysis is carried out on a unique dataset of queries, submitted to an important web search engine, which enables us to also investigate user behavior. We show that the query volume dynamics emerges from the collective but seemingly uncoordinated activity of many users. These findings contribute to the debate on the identification of early warnings of financial systemic risk, based on the activity of users of the www.

  12. Development and empirical user-centered evaluation of semantically-based query recommendation for an electronic health record search engine.

    PubMed

    Hanauer, David A; Wu, Danny T Y; Yang, Lei; Mei, Qiaozhu; Murkowski-Steffy, Katherine B; Vydiswaran, V G Vinod; Zheng, Kai

    2017-03-01

    The utility of biomedical information retrieval environments can be severely limited when users lack expertise in constructing effective search queries. To address this issue, we developed a computer-based query recommendation algorithm that suggests semantically interchangeable terms based on an initial user-entered query. In this study, we assessed the value of this approach, which has broad applicability in biomedical information retrieval, by demonstrating its application as part of a search engine that facilitates retrieval of information from electronic health records (EHRs). The query recommendation algorithm utilizes MetaMap to identify medical concepts from search queries and indexed EHR documents. Synonym variants from UMLS are used to expand the concepts along with a synonym set curated from historical EHR search logs. The empirical study involved 33 clinicians and staff who evaluated the system through a set of simulated EHR search tasks. User acceptance was assessed using the widely used technology acceptance model. The search engine's performance was rated consistently higher with the query recommendation feature turned on vs. off. The relevance of computer-recommended search terms was also rated high, and in most cases the participants had not thought of these terms on their own. The questions on perceived usefulness and perceived ease of use received overwhelmingly positive responses. A vast majority of the participants wanted the query recommendation feature to be available to assist in their day-to-day EHR search tasks. Challenges persist for users to construct effective search queries when retrieving information from biomedical documents including those from EHRs. This study demonstrates that semantically-based query recommendation is a viable solution to addressing this challenge. Published by Elsevier Inc.

  13. Querying archetype-based EHRs by search ontology-based XPath engineering.

    PubMed

    Kropf, Stefan; Uciteli, Alexandr; Schierle, Katrin; Krücken, Peter; Denecke, Kerstin; Herre, Heinrich

    2018-05-11

    Legacy data and new structured data can be stored in a standardized format as XML-based EHRs on XML databases. Querying documents on these databases is crucial for answering research questions. Instead of using free text searches, which lead to false positive results, the precision can be increased by constraining the search to certain parts of documents. A search ontology-based specification of queries on XML documents defines search concepts and relates them to parts in the XML document structure. Such a query specification method is practically introduced and evaluated by applying concrete research questions formulated in natural language on a data collection for information retrieval purposes. The search is performed by search ontology-based XPath engineering that reuses ontologies and XML-related W3C standards. The key result is that the specification of research questions can be supported by the usage of search ontology-based XPath engineering. A deeper recognition of entities and a semantic understanding of the content is necessary for a further improvement of precision and recall. A key limitation is that the application of the introduced process requires skills in ontology and software development. In the future, the time-consuming ontology development could be overcome by implementing a new clinical role: the clinical ontologist. The introduced Search Ontology XML extension connects Search Terms to certain parts in XML documents and enables an ontology-based definition of queries. Search ontology-based XPath engineering can support research question answering by the specification of complex XPath expressions without deep syntax knowledge about XPaths.
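
    The idea of constraining a search to certain parts of a document can be sketched with Python's standard xml.etree.ElementTree module, which supports a limited XPath subset. The document layout (record, section, text), the type attribute, and the CONCEPT_TO_XPATH mapping are all hypothetical stand-ins for what a search ontology would provide. They are not taken from the paper or from any archetype standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from a search concept to the part of the
# document where that concept should be looked for.
CONCEPT_TO_XPATH = {
    "diagnosis": ".//section[@type='diagnosis']/text",
    "medication": ".//section[@type='medication']/text",
}


def search_concept(xml_text, concept, term):
    """Return text nodes under the concept's path that mention term."""
    root = ET.fromstring(xml_text)
    path = CONCEPT_TO_XPATH[concept]
    needle = term.lower()
    return [
        node.text
        for node in root.findall(path)
        if node.text and needle in node.text.lower()
    ]


def search_free_text(xml_text, term):
    """Baseline: match the term anywhere in the document."""
    root = ET.fromstring(xml_text)
    needle = term.lower()
    return [
        node.text
        for node in root.iter()
        if node.text and needle in node.text.lower()
    ]
```

    On a record whose medication section happens to mention a diagnosis term, the free-text baseline returns both sections while the concept-constrained search returns only the diagnosis section, which is the precision gain the abstract describes.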

  14. Seasonal trends in sleep-disordered breathing: evidence from Internet search engine query data.

    PubMed

    Ingram, David G; Matthews, Camilla K; Plante, David T

    2015-03-01

    The primary aim of the current study was to test the hypothesis that there is a seasonal component to snoring and obstructive sleep apnea (OSA) through the use of Google search engine query data. Internet search engine query data were retrieved from Google Trends from January 2006 to December 2012. Monthly normalized search volume was obtained over that 7-year period in the USA and Australia for the following search terms: "snoring" and "sleep apnea". Seasonal effects were investigated by fitting cosinor regression models. In addition, the search terms "snoring children" and "sleep apnea children" were evaluated to examine seasonal effects in pediatric populations. Statistically significant seasonal effects were found using cosinor analysis in both USA and Australia for "snoring" (p < 0.00001 for both countries). Similarly, seasonal patterns were observed for "sleep apnea" in the USA (p = 0.001); however, cosinor analysis was not significant for this search term in Australia (p = 0.13). Seasonal patterns for "snoring children" and "sleep apnea children" were observed in the USA (p = 0.002 and p < 0.00001, respectively), with insufficient search volume to examine these search terms in Australia. All searches peaked in the winter or early spring in both countries, with the magnitude of seasonal effect ranging from 5 to 50 %. Our findings indicate that there are significant seasonal trends for both snoring and sleep apnea internet search engine queries, with a peak in the winter and early spring. Further research is indicated to determine the mechanisms underlying these findings, whether they have clinical impact, and if they are associated with other comorbid medical conditions that have similar patterns of seasonal exacerbation.
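
    A cosinor model describes a series as a mean level plus a single cosine wave of known period. The sketch below fits y(t) = M + A cos(2πt/period - φ) by ordinary least squares, using the standard linearization M + β cos(ωt) + γ sin(ωt). It is a minimal illustration under the assumption of evenly spaced monthly data and does not reproduce the significance tests reported in the abstract. The function names are invented.

```python
import math


def solve_linear(a, b):
    """Solve a small dense system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting keeps the elimination numerically stable.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_cosinor(values, period=12):
    """Return (mesor, amplitude, peak_time) for an evenly sampled series.

    peak_time is in the same units as the sample index (months here),
    counted from the first observation.
    """
    omega = 2 * math.pi / period
    rows = [
        (1.0, math.cos(omega * t), math.sin(omega * t))
        for t in range(len(values))
    ]
    # Normal equations (X^T X) coef = X^T y for the three regressors.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(3)]
    mesor, beta, gamma = solve_linear(xtx, xty)
    amplitude = math.hypot(beta, gamma)
    peak_time = (math.atan2(gamma, beta) / omega) % period
    return mesor, amplitude, peak_time
```

    The ratio of amplitude to mesor gives one rough measure of how large the seasonal swing is relative to the average search volume.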

  15. Search query data to monitor interest in behavior change: application for public health.

    PubMed

    Carr, Lucas J; Dunsiger, Shira I

    2012-01-01

    There is a need for effective interventions and policies that target the leading preventable causes of death in the U.S. (e.g., smoking, overweight/obesity, physical inactivity). Such efforts could be aided by the use of publicly available, real-time search query data that illustrate times and locations of high and low public interest in behaviors related to preventable causes of death. This study explored patterns of search query activity for the terms 'weight', 'diet', 'fitness', and 'smoking' using Google Insights for Search. Search activity for 'weight', 'diet', 'fitness', and 'smoking' conducted within the United States via Google between January 4th, 2004 (first date data was available) and November 28th, 2011 (date of data download and analysis) were analyzed. Using a generalized linear model, we explored the effects of time (month) on mean relative search volume for all four terms. Models suggest a significant effect of month on mean search volume for all four terms. Search activity for all four terms was highest in January with observable declines throughout the remainder of the year. These findings demonstrate discernable temporal patterns of search activity for four areas of behavior change. These findings could be used to inform the timing, location and messaging of interventions, campaigns and policies targeting these behaviors.

  16. Search Query Data to Monitor Interest in Behavior Change: Application for Public Health

    PubMed Central

    Carr, Lucas J.; Dunsiger, Shira I.

    2012-01-01

    There is a need for effective interventions and policies that target the leading preventable causes of death in the U.S. (e.g., smoking, overweight/obesity, physical inactivity). Such efforts could be aided by the use of publicly available, real-time search query data that illustrate times and locations of high and low public interest in behaviors related to preventable causes of death. Objectives: This study explored patterns of search query activity for the terms ‘weight’, ‘diet’, ‘fitness’, and ‘smoking’ using Google Insights for Search. Methods: Search activity for ‘weight’, ‘diet’, ‘fitness’, and ‘smoking’ conducted within the United States via Google between January 4th, 2004 (first date data was available) and November 28th, 2011 (date of data download and analysis) were analyzed. Using a generalized linear model, we explored the effects of time (month) on mean relative search volume for all four terms. Results: Models suggest a significant effect of month on mean search volume for all four terms. Search activity for all four terms was highest in January with observable declines throughout the remainder of the year. Conclusions: These findings demonstrate discernable temporal patterns of search activity for four areas of behavior change. These findings could be used to inform the timing, location and messaging of interventions, campaigns and policies targeting these behaviors. PMID:23110198

  17. Can Google Trends search queries contribute to risk diversification?

    PubMed

    Kristoufek, Ladislav

    2013-01-01

    Portfolio diversification and active risk management are essential parts of financial analysis which became even more crucial (and questioned) during and after the years of the Global Financial Crisis. We propose a novel approach to portfolio diversification using the information of searched items on Google Trends. The diversification is based on an idea that popularity of a stock measured by search queries is correlated with the stock riskiness. We penalize the popular stocks by assigning them lower portfolio weights and we bring forward the less popular, or peripheral, stocks to decrease the total riskiness of the portfolio. Our results indicate that such strategy dominates both the benchmark index and the uniformly weighted portfolio both in-sample and out-of-sample.
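
    The abstract does not give the weighting rule, so the sketch below assumes one simple form of popularity penalty: each stock's weight is proportional to its search volume raised to a negative power alpha, then normalized to sum to one. The function name, the alpha parameter, and the tickers in the usage note are assumptions made for illustration.

```python
def popularity_penalized_weights(search_volume, alpha=1.0):
    """Map {ticker: search volume} to portfolio weights summing to 1.

    Assumed rule: weight is proportional to volume ** (-alpha).
    alpha = 0 gives the uniformly weighted portfolio; a larger alpha
    shifts more weight toward rarely searched stocks.
    All volumes must be strictly positive.
    """
    raw = {ticker: volume ** (-alpha) for ticker, volume in search_volume.items()}
    total = sum(raw.values())
    return {ticker: value / total for ticker, value in raw.items()}
```

    For hypothetical volumes of 100, 50 and 25 with alpha = 1, the weights come out as 1/7, 2/7 and 4/7, so the least searched stock carries the largest share.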

  18. Can Google Trends search queries contribute to risk diversification?

    PubMed Central

    Kristoufek, Ladislav

    2013-01-01

    Portfolio diversification and active risk management are essential parts of financial analysis which became even more crucial (and questioned) during and after the years of the Global Financial Crisis. We propose a novel approach to portfolio diversification using the information of searched items on Google Trends. The diversification is based on an idea that popularity of a stock measured by search queries is correlated with the stock riskiness. We penalize the popular stocks by assigning them lower portfolio weights and we bring forward the less popular, or peripheral, stocks to decrease the total riskiness of the portfolio. Our results indicate that such strategy dominates both the benchmark index and the uniformly weighted portfolio both in-sample and out-of-sample. PMID:24048448

  19. Search Term Reports

    EPA Pesticide Factsheets

    Learn what search terms brought users to choose your page in their search results, and what terms they entered in the EPA search box after visiting your page. Use this information to improve links and content on the page.

  20. A Study on Pubmed Search Tag Usage Pattern: Association Rule Mining of a Full-day Pubmed Query Log

    PubMed Central

    2013-01-01

    Background: The practice of evidence-based medicine requires efficient biomedical literature search such as PubMed/MEDLINE. Retrieval performance relies highly on the efficient use of search field tags. The purpose of this study was to analyze PubMed log data in order to understand the usage pattern of search tags by the end user in PubMed/MEDLINE search. Methods: A PubMed query log file was obtained from the National Library of Medicine containing anonymous user identification, timestamp, and query text. Inconsistent records were removed from the dataset and the search tags were extracted from the query texts. A total of 2,917,159 queries were selected for this study issued by a total of 613,061 users. The analysis of frequent co-occurrences and usage patterns of the search tags was conducted using an association mining algorithm. Results: The percentage of search tag usage was low (11.38% of the total queries) and only 2.95% of queries contained two or more tags. Three out of four users used no search tag and about two-thirds of them issued fewer than four queries. Among the queries containing at least one tagged search term, the average number of search tags was almost half of the number of total search terms. Navigational search tags are more frequently used than informational search tags. While no strong association was observed between informational and navigational tags, six (out of 19) informational tags and six (out of 29) navigational tags showed strong associations in PubMed searches. Conclusions: The low percentage of search tag usage implies that PubMed/MEDLINE users do not utilize the features of PubMed/MEDLINE widely or they are not aware of such features or solely depend on the high recall focused query translation by the PubMed’s Automatic Term Mapping. The users need further education and interactive search application for effective use of the search tags in order to fulfill their biomedical information needs from PubMed/MEDLINE. PMID:23302604
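
    Association mining over tag co-occurrences rests on two quantities: support (how often a set of tags appears together) and confidence (how often the right-hand tag appears given the left-hand tag). The sketch below computes both for tag pairs only. It is a simplified illustration, not the algorithm used in the study, and the thresholds and function name are invented.

```python
from collections import Counter
from itertools import combinations


def pair_rules(tagged_queries, min_support=0.2, min_confidence=0.6):
    """Find rules lhs -> rhs between pairs of tags.

    tagged_queries is a list of tag lists, one per query.
    Returns tuples (lhs, rhs, support, confidence).
    """
    n = len(tagged_queries)
    tag_counts = Counter()
    pair_counts = Counter()
    for tags in tagged_queries:
        unique = sorted(set(tags))
        tag_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair can yield a rule in either direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / tag_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules
```

    Given a handful of made-up queries in which the author tag and the publication-date tag often occur together, the function would report a rule between those two tags along with its support and confidence.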

  1. Seasonal trends in tinnitus symptomatology: evidence from Internet search engine query data.

    PubMed

    Plante, David T; Ingram, David G

    2015-10-01

    The primary aim of this study was to test the hypothesis that the symptom of tinnitus demonstrates a seasonal pattern with worsening in the winter relative to the summer using Internet search engine query data. Normalized search volume for the term 'tinnitus' from January 2004 through December 2013 was retrieved from Google Trends. Seasonal effects were evaluated using cosinor regression models. Primary countries of interest were the United States and Australia. Secondary exploratory analyses were also performed using data from Germany, the United Kingdom, Canada, Sweden, and Switzerland. Significant seasonal effects for 'tinnitus' search queries were found in the United States and Australia (p < 0.00001 for both countries), with peaks in the winter and troughs in the summer. Secondary analyses demonstrated similarly significant seasonal effects for Germany (p < 0.00001), Canada (p < 0.00001), and Sweden (p = 0.0008), again with increased search volume in the winter relative to the summer. Our findings indicate that there are significant seasonal trends for Internet search queries for tinnitus, with a zenith in winter months. Further research is indicated to determine the biological mechanisms underlying these findings, as they may provide insights into the pathophysiology of this common and debilitating medical symptom.

  2. Raising the IQ in full-text searching via intelligent querying

    SciTech Connect

    Kero, R.; Russell, L.; Swietlik, C.

    1994-11-01

    Current Information Retrieval (IR) technologies allow for efficient access to relevant information, provided that user selected query terms coincide with the specific linguistical choices made by the authors whose works constitute the text-base. Therefore, the challenge is to enhance the limited searching capability of state-of-the-practice IR. This can be done either with augmented clients that overcome current server searching deficiencies, or with added capabilities that can augment searching algorithms on the servers. The technology being investigated is that of deductive databases, with a set of new techniques called cooperative answering. This technology utilizes semantic networks to allow for navigation between possible query search term alternatives. The augmented search terms are passed to an IR engine and the results can be compared. The project utilizes the OSTI Environment, Safety and Health Thesaurus to populate the domain specific semantic network and the text base of ES&H related documents from the Facility Profile Information Management System as the domain specific search space.

  3. Complex dynamics of our economic life on different scales: insights from search engine query data.

    PubMed

    Preis, Tobias; Reith, Daniel; Stanley, H Eugene

    2010-12-28

    Search engine query data deliver insight into the behaviour of individuals who are the smallest possible scale of our economic life. Individuals are submitting several hundred million search engine queries around the world each day. We study weekly search volume data for various search terms from 2004 to 2010 that are offered by the search engine Google for scientific use, providing information about our economic life on an aggregated collective level. We ask the question whether there is a link between search volume data and financial market fluctuations on a weekly time scale. Both collective 'swarm intelligence' of Internet users and the group of financial market participants can be regarded as a complex system of many interacting subunits that react quickly to external changes. We find clear evidence that weekly transaction volumes of S&P 500 companies are correlated with weekly search volume of corresponding company names. Furthermore, we apply a recently introduced method for quantifying complex correlations in time series with which we find a clear tendency that search volume time series and transaction volume time series show recurring patterns.

  4. From health search to healthcare: explorations of intention and utilization via query logs and user surveys

    PubMed Central

    White, Ryen W; Horvitz, Eric

    2014-01-01

    Objective: To better understand the relationship between online health-seeking behaviors and in-world healthcare utilization (HU) by studies of online search and access activities before and after queries that pursue medical professionals and facilities. Materials and methods: We analyzed data collected from logs of online searches gathered from consenting users of a browser toolbar from Microsoft (N=9740). We employed a complementary survey (N=489) to seek a deeper understanding of information-gathering, reflection, and action on the pursuit of professional healthcare. Results: We provide insights about HU through the survey, breaking out its findings by different respondent marginalizations as appropriate. Observations made from search logs may be explained by trends observed in our survey responses, even though the user populations differ. Discussion: The results provide insights about how users decide if and when to utilize healthcare resources, and how online health information seeking transitions to in-world HU. The findings from both the survey and the logs reveal behavioral patterns and suggest a strong relationship between search behavior and HU. Although the diversity of our survey respondents is limited and we cannot be certain that users visited medical facilities, we demonstrate that it may be possible to infer HU from long-term search behavior by the apparent influence that health concerns and professional advice have on search activity. Conclusions: Our findings highlight different phases of online activities around queries pursuing professional healthcare facilities and services. We also show that it may be possible to infer HU from logs without tracking people's physical location, based on the effect of HU on pre- and post-HU search behavior. This allows search providers and others to develop more robust models of interests and preferences by modeling utilization rather than simply the intention to utilize that is expressed in search queries. PMID

  5. [On the seasonality of dermatoses: a retrospective analysis of search engine query data depending on the season].

    PubMed

    Köhler, M J; Springer, S; Kaatz, M

    2014-09-01

    The volume of search engine queries about disease-relevant items reflects public interest and correlates with disease prevalence as proven by the example of flu (influenza). Other influences include media attention or holidays. The present work investigates if the seasonality of prevalence or symptom severity of dermatoses correlates with search engine query data. The relative weekly volume of dermatologically relevant search terms was assessed by the online tool Google Trends for the years 2009-2013. For each item, the degree of seasonality was calculated via frequency analysis and a geometric approach. Many dermatoses show a marked seasonality, reflected by search engine query volumes. Unexpected seasonal variations of these queries suggest a previously unknown variability of the respective disease prevalence. Furthermore, using the example of allergic rhinitis, a close correlation of search engine query data with actual pollen count can be demonstrated. In many cases, search engine query data are appropriate to estimate seasonal variability in prevalence of common dermatoses. This finding may be useful for real-time analysis and formation of hypotheses concerning pathogenetic or symptom-aggravating mechanisms and may thus contribute to improvement of diagnostics and prevention of skin diseases.

  6. A study of medical and health queries to web search engines.

    PubMed

    Spink, Amanda; Yang, Yin; Jansen, Jim; Nykanen, Pirrko; Lorence, Daniel P; Ozmutlu, Seda; Ozmutlu, H Cenk

    2004-03-01

    This paper reports findings from an analysis of medical or health queries to different web search engines. We report results: (i) comparing samples of 10000 web queries taken randomly from 1.2 million query logs from the AlltheWeb.com and Excite.com commercial web search engines in 2001 for medical or health queries, (ii) comparing the 2001 findings from Excite and AlltheWeb.com users with results from a previous analysis of medical and health related queries from the Excite Web search engine for 1997 and 1999, and (iii) medical or health advice-seeking queries beginning with the word 'should'. Findings suggest: (i) a small percentage of web queries are medical or health related, (ii) the top five categories of medical or health queries were: general health, weight issues, reproductive health and puberty, pregnancy/obstetrics, and human relationships, and (iii) over time, the medical and health queries may have declined as a proportion of all web queries, as the use of specialized medical/health websites and e-commerce-related queries has increased. Findings provide insights into medical and health-related web querying and suggest some implications for the use of the general web search engines when seeking medical/health information.

  7. Estimating Influenza Outbreaks Using Both Search Engine Query Data and Social Media Data in South Korea.

    PubMed

    Woo, Hyekyung; Cho, Youngtae; Shim, Eunyoung; Lee, Jong-Koo; Lee, Chang-Gun; Kim, Seong Hwan

    2016-07-04

    As suggested as early as in 2006, logs of queries submitted to search engines seeking information could be a source for detection of emerging influenza epidemics if changes in the volume of search queries are monitored (infodemiology). However, selecting queries that are most likely to be associated with influenza epidemics is a particular challenge when it comes to generating better predictions. In this study, we describe a methodological extension for detecting influenza outbreaks using search query data; we provide a new approach for query selection through the exploration of contextual information gleaned from social media data. Additionally, we evaluate whether it is possible to use these queries for monitoring and predicting influenza epidemics in South Korea. Our study was based on freely available weekly influenza incidence data and query data originating from the search engine on the Korean website Daum between April 3, 2011 and April 5, 2014. To select queries related to influenza epidemics, several approaches were applied: (1) exploring influenza-related words in social media data, (2) identifying the chief concerns related to influenza, and (3) using Web query recommendations. Optimal feature selection by least absolute shrinkage and selection operator (Lasso) and support vector machine for regression (SVR) were used to construct a model predicting influenza epidemics. In total, 146 queries related to influenza were generated through our initial query selection approach. A considerable proportion of optimal features for final models were derived from queries with reference to the social media data. The SVR model performed well: the prediction values were highly correlated with the recent observed influenza-like illness (r=.956; P<.001) and virological incidence rate (r=.963; P<.001). These results demonstrate the feasibility of using search queries to enhance influenza surveillance in South Korea. In addition, an approach for query selection using

  8. Estimating Influenza Outbreaks Using Both Search Engine Query Data and Social Media Data in South Korea

    PubMed Central

    Woo, Hyekyung; Shim, Eunyoung; Lee, Jong-Koo; Lee, Chang-Gun; Kim, Seong Hwan

    2016-01-01

    Background: As suggested as early as 2006, logs of queries submitted to search engines seeking information could be a source for detection of emerging influenza epidemics if changes in the volume of search queries are monitored (infodemiology). However, selecting queries that are most likely to be associated with influenza epidemics is a particular challenge when it comes to generating better predictions. Objective: In this study, we describe a methodological extension for detecting influenza outbreaks using search query data; we provide a new approach for query selection through the exploration of contextual information gleaned from social media data. Additionally, we evaluate whether it is possible to use these queries for monitoring and predicting influenza epidemics in South Korea. Methods: Our study was based on freely available weekly influenza incidence data and query data originating from the search engine on the Korean website Daum between April 3, 2011 and April 5, 2014. To select queries related to influenza epidemics, several approaches were applied: (1) exploring influenza-related words in social media data, (2) identifying the chief concerns related to influenza, and (3) using Web query recommendations. Optimal feature selection by least absolute shrinkage and selection operator (Lasso) and support vector machine for regression (SVR) were used to construct a model predicting influenza epidemics. Results: In total, 146 queries related to influenza were generated through our initial query selection approach. A considerable proportion of optimal features for final models were derived from queries with reference to the social media data. The SVR model performed well: the prediction values were highly correlated with the recent observed influenza-like illness (r=.956; P<.001) and virological incidence rate (r=.963; P<.001). Conclusions: These results demonstrate the feasibility of using search queries to enhance influenza surveillance in South Korea. In
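
    The abstract above reports model quality as a correlation between predicted and observed incidence. The following is a minimal sketch of that evaluation step only, assuming the reported r is an ordinary Pearson coefficient; the function name and inputs are illustrative and not taken from the study, and the Lasso and SVR model fitting is not reproduced here.

```python
from math import sqrt


def pearson_r(predicted, observed):
    """Pearson correlation between two equally long numeric sequences."""
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    # Sums of cross-products and squared deviations from the means.
    s_po = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    s_pp = sum((p - mean_p) ** 2 for p in predicted)
    s_oo = sum((o - mean_o) ** 2 for o in observed)
    if s_pp == 0 or s_oo == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return s_po / sqrt(s_pp * s_oo)
```

    In such a setup, the weekly model predictions and the weekly surveillance counts would be passed as the two sequences.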

  9. Analysis of queries sent to PubMed at the point of care: Observation of search behaviour in a medical teaching hospital

    PubMed Central

    Hoogendam, Arjen; Stalenhoef, Anton FH; Robbé, Pieter F de Vries; Overbeke, A John PM

    2008-01-01

    Background: The use of PubMed to answer daily medical care questions is limited because it is challenging to retrieve a small set of relevant articles and time is restricted. Knowing what aspects of queries are likely to retrieve relevant articles can increase the effectiveness of PubMed searches. The objectives of our study were to identify queries that are likely to retrieve relevant articles by relating PubMed search techniques and tools to the number of articles retrieved and the selection of articles for further reading. Methods: This was a prospective observational study of queries regarding patient-related problems sent to PubMed by residents and internists in internal medicine working in an Academic Medical Centre. We analyzed queries, search results, query tools (MeSH, Limits, wildcards, operators), selection of abstract and full-text for further reading, using a portal that mimics PubMed. Results: PubMed was used to solve 1121 patient-related problems, resulting in 3205 distinct queries. Abstracts were viewed in 999 (31%) of these queries, and in 126 (39%) of 321 queries using query tools. The average term count per query was 2.5. Abstracts were selected in more than 40% of queries using four or five terms, increasing to 63% if the use of four or five terms yielded 2–161 articles. Conclusion: Queries sent to PubMed by physicians at our hospital during daily medical care contain fewer than three terms. Queries using four to five terms, retrieving fewer than 161 article titles, are most likely to result in abstract viewing. PubMed search tools are used infrequently by our population and are less effective than the use of four or five terms. Methods to facilitate the formulation of precise queries, using more relevant terms, should be the focus of education and research. PMID:18816391
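
    The term-count figures in the abstract above can be illustrated with a small sketch. It assumes, without confirmation from the study, that a term is a whitespace-separated token and that uppercase Boolean operators are not counted; the function names are hypothetical.

```python
BOOLEAN_OPERATORS = {"AND", "OR", "NOT"}  # assumption: operators are not terms


def term_count(query):
    """Number of whitespace-separated tokens that are not Boolean operators."""
    return sum(1 for token in query.split() if token not in BOOLEAN_OPERATORS)


def average_term_count(queries):
    """Mean term count over a list of query strings (0.0 for an empty list)."""
    if not queries:
        return 0.0
    return sum(term_count(q) for q in queries) / len(queries)


def in_reported_range(query, n_results):
    """True for the combination the abstract highlights:
    four or five terms and a result set of 2 to 161 articles."""
    return term_count(query) in (4, 5) and 2 <= n_results <= 161
```

    Applied to a query log, `average_term_count` would give a figure comparable to the 2.5 terms per query reported, subject to the tokenisation assumption above.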

  10. Query Classification and Study of University Students' Search Trends

    ERIC Educational Resources Information Center

    Maabreh, Majdi A.; Al-Kabi, Mohammed N.; Alsmadi, Izzat M.

    2012-01-01

    Purpose: This study is an attempt to develop an automatic identification method for Arabic web queries and divide them into several query types using data mining. In addition, it seeks to evaluate the impact of the academic environment on using the internet. Design/methodology/approach: The web log files were collected from one of the higher…

  11. Revisiting the Rise of Electronic Nicotine Delivery Systems Using Search Query Surveillance

    PubMed Central

    Ayers, John W.; Althouse, Benjamin M.; Allem, Jon-Patrick; Leas, Eric C.; Dredze, Mark; Williams, Rebecca

    2016-01-01

    Introduction: Public perceptions of electronic nicotine delivery systems (ENDS) remain poorly understood because surveys are too costly to regularly implement and when implemented there are large delays between data collection and dissemination. Search query surveillance has bridged some of these gaps. Herein, ENDS’ popularity in the U.S. is reassessed using Google searches. Methods: ENDS searches originating in the U.S. from January 2009 through January 2015 were disaggregated by terms focused on e-cigarette (e.g., e-cig) versus vaping (e.g., vapers), their geolocation (e.g., state), the aggregate tobacco control measures corresponding to their geolocation (e.g., clean indoor air laws), and by terms that indicated the searcher’s potential interest (e.g., buy e-cigs likely indicates shopping); all analyzed in 2015. Results: ENDS searches are increasing across the entire U.S., with 8,498,180 searches during 2014. At the same time, searches shifted from e-cigarette- to vaping-focused terms, especially in coastal states and states with more anti-smoking norms. For example, nationally, e-cigarette searches declined 9% (95% CI=1%, 16%) during 2014 compared with 2013, whereas vaping searches increased 136% (95% CI=97%, 186%), surpassing e-cigarette searches. More ENDS searches were related to shopping (e.g., vape shop) than health concerns (e.g., vaping risks) or cessation (e.g., quit smoking with e-cigs), with shopping searches nearly doubling during 2014. Conclusions: ENDS popularity is rapidly growing and evolving, and monitoring searches has provided these timely insights. These findings may inform survey questionnaire development for follow-up investigation and immediately guide policy debates about how the public perceives ENDS’ health risks or cessation benefits. PMID:26876772

  12. Cognitive issues in searching images with visual queries

    NASA Astrophysics Data System (ADS)

    Yu, ByungGu; Evens, Martha W.

    1999-01-01

    In this paper, we propose our image indexing technique and visual query processing technique. Our mental images are different from the actual retinal images and many things, such as personal interests, personal experiences, perceptual context, the characteristics of spatial objects, and so on, affect our spatial perception. These private differences are propagated into our mental images and so our visual queries become different from the real images that we want to find. This is a hard problem and few people have tried to work on it. In this paper, we survey the human mental imagery system, the human spatial perception, and discuss several kinds of visual queries. Also, we propose our own approach to visual query interpretation and processing.

  13. Revisiting the Rise of Electronic Nicotine Delivery Systems Using Search Query Surveillance.

    PubMed

    Ayers, John W; Althouse, Benjamin M; Allem, Jon-Patrick; Leas, Eric C; Dredze, Mark; Williams, Rebecca S

    2016-06-01

    Public perceptions of electronic nicotine delivery systems (ENDS) remain poorly understood because surveys are too costly to regularly implement and, when implemented, there are long delays between data collection and dissemination. Search query surveillance has bridged some of these gaps. Herein, ENDS' popularity in the U.S. is reassessed using Google searches. ENDS searches originating in the U.S. from January 2009 through January 2015 were disaggregated by terms focused on e-cigarette (e.g., e-cig) versus vaping (e.g., vapers); their geolocation (e.g., state); the aggregate tobacco control measures corresponding to their geolocation (e.g., clean indoor air laws); and by terms that indicated the searcher's potential interest (e.g., buy e-cigs likely indicates shopping), all analyzed in 2015. ENDS searches are rapidly increasing in the U.S., with 8,498,000 searches during 2014 alone. Increasingly, searches are shifting from e-cigarette- to vaping-focused terms, especially in coastal states and states where anti-smoking norms are stronger. For example, nationally, e-cigarette searches declined 9% (95% CI=1%, 16%) during 2014 compared with 2013, whereas vaping searches increased 136% (95% CI=97%, 186%), even surpassing e-cigarette searches. Additionally, the percentage of ENDS searches related to shopping (e.g., vape shop) nearly doubled in 2014, whereas searches related to health concerns (e.g., vaping risks) or cessation (e.g., quit smoking with e-cigs) were rare and declined in 2014. ENDS popularity is rapidly growing and evolving. These findings could inform survey questionnaire development for follow-up investigation and immediately guide policy debates about how the public perceives the health risks or cessation benefits of ENDS. Copyright © 2016 American Journal of Preventive Medicine. Published by Elsevier Inc. All rights reserved.

  14. Google Search Queries About Neurosurgical Topics: Are They a Suitable Guide for Neurosurgeons?

    PubMed

    Lawson McLean, Anna C; Lawson McLean, Aaron; Kalff, Rolf; Walter, Jan

    2016-06-01

    Google is the most popular search engine, with about 100 billion searches per month. Google Trends is an integrated tool that allows users to obtain Google's search popularity statistics from the last decade. Our aim was to evaluate whether Google Trends is a useful tool to assess the public's interest in specific neurosurgical topics. We evaluated Google Trends statistics for the neurosurgical search topic areas "hydrocephalus," "spinal stenosis," "concussion," "vestibular schwannoma," and "cerebral arteriovenous malformation." We compared these with bibliometric data from PubMed and epidemiologic data from the German Federal Monitoring Agency. In addition, we assessed Google users' search behavior for the search terms "glioblastoma" and "meningioma." Over the last 10 years, there has been an increasing interest in the topic "concussion" from Internet users in general and scientists. "Spinal stenosis," "concussion," and "vestibular schwannoma" are topics that are of special interest in high-income countries (eg, Germany), whereas "hydrocephalus" is a popular topic in low- and middle-income countries. The Google-defined top searches within these topic areas revealed more detail about people's interests (eg, "normal pressure hydrocephalus" or "football concussion" ranked among the most popular search queries within the corresponding topics). There was a similar volume of queries for "glioblastoma" and "meningioma." Google Trends is a useful source to elicit information about general trends in people's health interests and the role of different diseases across the world. The Internet presence of neurosurgical units and surgeons can be guided by online users' interests to achieve high-quality, professionally endorsed patient education. Copyright © 2016 Elsevier Inc. All rights reserved.

  15. System for Performing Single Query Searches of Heterogeneous and Dispersed Databases

    NASA Technical Reports Server (NTRS)

    Maluf, David A. (Inventor); Okimura, Takeshi (Inventor); Gurram, Mohana M. (Inventor); Tran, Vu Hoang (Inventor); Knight, Christopher D. (Inventor); Trinh, Anh Ngoc (Inventor)

    2017-01-01

    The present invention is a distributed computer system of heterogeneous databases joined in an information grid and configured with an Application Programming Interface hardware which includes a search engine component for performing user-structured queries on multiple heterogeneous databases in real time. This invention reduces overhead associated with the impedance mismatch that commonly occurs in heterogeneous database queries.

  16. Towards computational improvement of DNA database indexing and short DNA query searching.

    PubMed

    Stojanov, Done; Koceski, Sašo; Mileva, Aleksandra; Koceska, Nataša; Bande, Cveta Martinovska

    2014-09-03

    In order to facilitate and speed up the search of massive DNA databases, the database is indexed at the beginning, employing a mapping function. By searching through the indexed data structure, exact query hits can be identified. If the database is searched against an annotated DNA query, such as a known promoter consensus sequence, then the starting locations and the number of potential genes can be determined. This is particularly relevant if unannotated DNA sequences have to be functionally annotated. However, indexing a massive DNA database and searching an indexed data structure with millions of entries is a time-demanding process. In this paper, we propose a fast DNA database indexing and searching approach, identifying all query hits in the database, without having to examine all entries in the indexed data structure, limiting the maximum length of a query that can be searched against the database. By applying the proposed indexing equation, the whole human genome could be indexed in 10 hours on a personal computer, under the assumption that there is enough RAM to store the indexed data structure. Analysing the methodology proposed by Reneker, we observed that hits at starting positions [Formula: see text] are not reported, if the database is searched against a query shorter than [Formula: see text] nucleotides, such that [Formula: see text] is the length of the DNA database words being mapped and [Formula: see text] is the length of the query. A solution of this drawback is also presented.
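
    The general idea of indexing a DNA database by fixed-length words and then looking up exact query hits can be sketched as follows. This is a generic word index under stated assumptions, not the indexing equation proposed in the paper: the word length `w`, the dictionary-based index and the function names are all illustrative. Queries shorter than `w` are simply rejected here, which sidesteps rather than solves the short-query problem the abstract mentions.

```python
from collections import defaultdict


def build_index(database, w):
    """Map every length-w word in the database to its starting positions."""
    index = defaultdict(list)
    for start in range(len(database) - w + 1):
        index[database[start:start + w]].append(start)
    return index


def find_hits(database, index, w, query):
    """Return starting positions of exact occurrences of the query.

    Only positions listed under the query's first w letters are examined,
    so most of the database is never touched.
    """
    if len(query) < w:
        raise ValueError("query must be at least w nucleotides long")
    candidates = index.get(query[:w], [])
    return [pos for pos in candidates if database.startswith(query, pos)]
```

    A larger `w` gives shorter candidate lists per word at the cost of a bigger index, which is the usual trade-off in this kind of scheme.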

  17. Privacy-Aware Relevant Data Access with Semantically Enriched Search Queries for Untrusted Cloud Storage Services.

    PubMed

    Pervez, Zeeshan; Ahmad, Mahmood; Khattak, Asad Masood; Lee, Sungyoung; Chung, Tae Choong

    2016-01-01

    Privacy-aware search of outsourced data ensures relevant data access in the untrusted domain of a public cloud service provider. A subscriber of a public cloud storage service can determine the presence or absence of a particular keyword by submitting a search query in the form of a trapdoor. However, these trapdoor-based search queries are limited in functionality and cannot be used to identify secure outsourced data which contains semantically equivalent information. In addition, trapdoor-based methodologies are confined to pre-defined trapdoors and prevent subscribers from searching outsourced data with arbitrarily defined search criteria. To solve the problem of relevant data access, we have proposed an index-based privacy-aware search methodology that ensures semantic retrieval of data from an untrusted domain. This method ensures oblivious execution of a search query and leverages authorized subscribers to model conjunctive search queries without relying on predefined trapdoors. A security analysis of our proposed methodology shows that, in a conspired attack, unauthorized subscribers and untrusted cloud service providers cannot deduce any information that can lead to the potential loss of data privacy. A computational time analysis on commodity hardware demonstrates that our proposed methodology requires moderate computational resources to model a privacy-aware search query and for its oblivious evaluation on a cloud service provider.
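
    To make the limitation of trapdoor-based keyword search concrete, here is a minimal sketch of the baseline idea the abstract criticises, not of the authors' index-based methodology. It assumes a trapdoor is a keyed hash (HMAC-SHA256) of a keyword; the function names and the in-memory index are hypothetical.

```python
import hashlib
import hmac


def trapdoor(key, keyword):
    """Keyed hash of a keyword; only holders of the key can produce it."""
    return hmac.new(key, keyword.lower().encode("utf-8"), hashlib.sha256).hexdigest()


def build_index(key, documents):
    """Data-owner side: map each keyword trapdoor to the ids of documents containing it."""
    index = {}
    for doc_id, text in documents.items():
        for word in set(text.lower().split()):
            index.setdefault(trapdoor(key, word), set()).add(doc_id)
    return index


def search(index, query_trapdoor):
    """Provider side: exact lookup on the trapdoor, without seeing the keyword."""
    return index.get(query_trapdoor, set())
```

    With one document containing "car" and another containing "automobile", a trapdoor for "car" matches only the first: the lookup is exact, so semantically equivalent terms are missed, which is the gap the abstract describes.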

  18. Privacy-Aware Relevant Data Access with Semantically Enriched Search Queries for Untrusted Cloud Storage Services

    PubMed Central

    Pervez, Zeeshan; Ahmad, Mahmood; Khattak, Asad Masood; Lee, Sungyoung; Chung, Tae Choong

    2016-01-01

    Privacy-aware search of outsourced data ensures relevant data access in the untrusted domain of a public cloud service provider. A subscriber of a public cloud storage service can determine the presence or absence of a particular keyword by submitting a search query in the form of a trapdoor. However, these trapdoor-based search queries are limited in functionality and cannot be used to identify secure outsourced data which contains semantically equivalent information. In addition, trapdoor-based methodologies are confined to pre-defined trapdoors and prevent subscribers from searching outsourced data with arbitrarily defined search criteria. To solve the problem of relevant data access, we have proposed an index-based privacy-aware search methodology that ensures semantic retrieval of data from an untrusted domain. This method ensures oblivious execution of a search query and leverages authorized subscribers to model conjunctive search queries without relying on predefined trapdoors. A security analysis of our proposed methodology shows that, in a conspired attack, unauthorized subscribers and untrusted cloud service providers cannot deduce any information that can lead to the potential loss of data privacy. A computational time analysis on commodity hardware demonstrates that our proposed methodology requires moderate computational resources to model a privacy-aware search query and for its oblivious evaluation on a cloud service provider. PMID:27571421

  19. Searching Databases without Query-Building Aids: Implications for Dyslexic Users

    ERIC Educational Resources Information Center

    Berget, Gerd; Sandnes, Frode Eika

    2015-01-01

    Introduction: Few studies document the information searching behaviour of users with cognitive impairments. This paper therefore addresses the effect of dyslexia on information searching in a database with no tolerance for spelling errors and no query-building aids. The purpose was to identify effective search interface design guidelines that…

  20. Manually Classifying User Search Queries on an Academic Library Web Site

    ERIC Educational Resources Information Center

    Chapman, Suzanne; Desai, Shevon; Hagedorn, Kat; Varnum, Ken; Mishra, Sonali; Piacentine, Julie

    2013-01-01

    The University of Michigan Library wanted to learn more about the kinds of searches its users were conducting through the "one search" search box on the Library Web site. Library staff conducted two investigations. A preliminary investigation in 2011 involved the manual review of the 100 most frequently occurring queries conducted…

  1. Seasonal trends in hypertension in Poland: evidence from Google search engine query data.

    PubMed

    Płatek, Anna E; Sierdziński, Janusz; Krzowski, Bartosz; Szymański, Filip M

    2018-01-01

    Various conditions, including arterial hypertension, exhibit seasonal trends in their occurrence and magnitude. Those trends correspond to the interest reflected in the number of Internet searches for the specific conditions per month. The aim of the study was to show that seasonal trends in hypertension prevalence in Poland relate to data from the Google Trends tool. Internet search engine query data were retrieved from Google Trends from January 2008 to November 2017. Data were calculated as a monthly normalised search volume from the nine-year period. Data were presented for specific geographic regions, including Poland, the United States of America, Australia, and worldwide for the following search terms: "arterial hypertension (pol. nadciśnienie tętnicze)", "hypertension (pol. nadciśnienie)" and "hypertension medical condition". Seasonal effects were calculated using regression models and presented graphically. In Poland the search volume is the highest between November and May, while patients exhibit the least interest in arterial hypertension during summer holidays (p < 0.05). Seasonal variations are comparable in the United States of America, representing a Northern Hemisphere country, while in Australia (Southern Hemisphere) they exhibit the opposite trend. In conclusion, arterial hypertension is more likely to occur during winter months, which correlates with increased interest in the search phrase "hypertension" in Google.
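
    A monthly normalised search volume of the kind mentioned above could be computed along the following lines. This is a sketch under assumptions: the input format, the averaging by calendar month and the rescaling so that the peak month equals 100 are illustrative choices, and the regression models used in the study are not reproduced.

```python
from collections import defaultdict


def monthly_profile(series):
    """Average volumes by calendar month and rescale so the peak month is 100.

    `series` is an iterable of ((year, month), volume) pairs.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for (_year, month), volume in series:
        sums[month] += volume
        counts[month] += 1
    if not sums:
        return {}
    means = {month: sums[month] / counts[month] for month in sums}
    peak = max(means.values())
    if peak == 0:
        # No search interest at all: return an all-zero profile.
        return {month: 0.0 for month in sorted(means)}
    return {month: 100.0 * means[month] / peak for month in sorted(means)}
```

    A profile with high values from November to May and low values in the summer months would correspond to the pattern the abstract reports for Poland.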

  2. GO2PUB: Querying PubMed with semantic expansion of gene ontology terms

    PubMed Central

    2012-01-01

    Background: With the development of high throughput methods of gene analyses, there is a growing need for mining tools to retrieve relevant articles in PubMed. As PubMed grows, literature searches become more complex and time-consuming. Automated search tools with good precision and recall are necessary. We developed GO2PUB to automatically enrich PubMed queries with gene names, symbols and synonyms annotated by a GO term of interest or one of its descendants. Results: GO2PUB enriches PubMed queries based on selected GO terms and keywords. It processes the result and displays the PMID, title, authors, abstract and bibliographic references of the articles. Gene names, symbols and synonyms that have been generated as extra keywords from the GO terms are also highlighted. GO2PUB is based on a semantic expansion of PubMed queries using the semantic inheritance between terms through the GO graph. Two experts manually assessed the relevance of GO2PUB, GoPubMed and PubMed on three queries about lipid metabolism. Experts’ agreement was high (kappa = 0.88). GO2PUB returned 69% of the relevant articles, GoPubMed: 40% and PubMed: 29%. GO2PUB and GoPubMed have 17% of their results in common, corresponding to 24% of the total number of relevant results. 70% of the articles returned by more than one tool were relevant. 36% of the relevant articles were returned only by GO2PUB, 17% only by GoPubMed and 14% only by PubMed. For determining whether these results can be generalized, we generated twenty queries based on random GO terms with a granularity similar to those of the first three queries and compared the proportions of GO2PUB and GoPubMed results. These were respectively of 77% and 40% for the first queries, and of 70% and 38% for the random queries. The two experts also assessed the relevance of seven of the twenty queries (the three related to lipid metabolism and four related to other domains). Expert agreement was high (0.93 and 0.8). GO2PUB and GoPubMed performances
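
    The semantic expansion described above (a GO term, its descendants, and the gene names annotated to any of them) can be sketched with a toy graph. Everything in the sketch is hypothetical: the term identifiers, the gene names, the data structures and the exact shape of the resulting Boolean query are illustrative assumptions, not GO2PUB's implementation.

```python
def descendants(graph, term):
    """Return the term together with all terms reachable through child links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(graph.get(current, ()))
    return seen


def expand_query(graph, annotations, term, keyword):
    """Combine a keyword with the genes annotated to a term or its descendants."""
    genes = sorted(
        {gene for t in descendants(graph, term) for gene in annotations.get(t, ())}
    )
    if not genes:
        return keyword
    gene_clause = " OR ".join(f'"{gene}"' for gene in genes)
    return f"({keyword}) AND ({gene_clause})"
```

    Because genes annotated to descendant terms are included, a query on a broad term also picks up articles that mention only genes attached to its more specific children.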

  3. Design of an On-Line Query Language for Full Text Patent Search.

    ERIC Educational Resources Information Center

    Glantz, Richard S.

    The design of an English-like query language and an interactive computer environment for searching the full text of the U.S. patent collection are discussed. Special attention is paid to achieving a transparent user interface, to providing extremely broad search capabilities (including nested substitution classes, Kleene star events, and domain…

  4. A study on PubMed search tag usage pattern: association rule mining of a full-day PubMed query log.

    PubMed

    Mosa, Abu Saleh Mohammad; Yoo, Illhoi

    2013-01-09

    The practice of evidence-based medicine requires efficient biomedical literature search such as PubMed/MEDLINE. Retrieval performance relies highly on the efficient use of search field tags. The purpose of this study was to analyze PubMed log data in order to understand the usage pattern of search tags by the end user in PubMed/MEDLINE search. A PubMed query log file was obtained from the National Library of Medicine containing anonymous user identification, timestamp, and query text. Inconsistent records were removed from the dataset and the search tags were extracted from the query texts. A total of 2,917,159 queries, issued by a total of 613,061 users, were selected for this study. The analysis of frequent co-occurrences and usage patterns of the search tags was conducted using an association mining algorithm. The percentage of search tag usage was low (11.38% of the total queries) and only 2.95% of queries contained two or more tags. Three out of four users used no search tag and about two-thirds of them issued fewer than four queries. Among the queries containing at least one tagged search term, the average number of search tags was almost half of the number of total search terms. Navigational search tags are more frequently used than informational search tags. While no strong association was observed between informational and navigational tags, six (out of 19) informational tags and six (out of 29) navigational tags showed strong associations in PubMed searches. The low percentage of search tag usage implies that PubMed/MEDLINE users do not utilize the features of PubMed/MEDLINE widely, or are not aware of such features, or depend solely on the high-recall-focused query translation by PubMed's Automatic Term Mapping. The users need further education and interactive search applications for effective use of the search tags in order to fulfill their biomedical information needs from PubMed/MEDLINE.
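
    The tag extraction and co-occurrence counting described above can be sketched as follows. The sketch assumes that a search tag is any bracketed suffix such as `[au]` or `[ti]`, that tags are compared case-insensitively, and that each distinct tag counts once per query; the function names are hypothetical, and the pair support computed here is only the first ingredient of an association mining algorithm.

```python
import re
from collections import Counter
from itertools import combinations

TAG_RE = re.compile(r"\[([^\[\]]+)\]")


def extract_tags(query):
    """Set of distinct bracketed tags in a query, lowercased."""
    return {tag.lower() for tag in TAG_RE.findall(query)}


def tag_usage(queries):
    """Fractions of queries with at least one tag and with two or more tags."""
    if not queries:
        return 0.0, 0.0
    tag_sets = [extract_tags(q) for q in queries]
    with_tag = sum(1 for tags in tag_sets if tags)
    with_two = sum(1 for tags in tag_sets if len(tags) >= 2)
    return with_tag / len(queries), with_two / len(queries)


def pair_support(queries):
    """Support of each tag pair: share of all queries containing both tags."""
    counts = Counter()
    for query in queries:
        counts.update(combinations(sorted(extract_tags(query)), 2))
    return {pair: n / len(queries) for pair, n in counts.items()}
```

    On a full-day log, `tag_usage` would yield the kind of percentages quoted in the abstract, although the exact counting rules used in the study may differ from the assumptions made here.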

  5. Federated Space-Time Query for Earth Science Data Using OpenSearch Conventions

    NASA Technical Reports Server (NTRS)

    Lynnes, Chris; Beaumont, Bruce; Duerr, Ruth; Hua, Hook

    2009-01-01

    This slide presentation reviews a space-time query system that has been developed to help users find Earth science data that fulfills researchers' needs. It reviews the reasons why finding Earth science data can be so difficult, and explains the workings of the Space-Time Query with OpenSearch and how this system can assist researchers in finding the required data. It also reviews the developments with client-server systems.

  6. News trends and web search query of HIV/AIDS in Hong Kong

    PubMed Central

    Chiu, Alice P. Y.; Lin, Qianying

    2017-01-01

    Background: The HIV epidemic in Hong Kong has worsened in recent years, with major contributions from the high-risk subgroup of men who have sex with men (MSM). Internet use is prevalent among the majority of the local population, who seek health information online. This study examines the impacts of HIV/AIDS and MSM news coverage on web search query in Hong Kong. Methods: Relevant news coverage about HIV/AIDS and MSM from January 1st, 2004 to December 31st, 2014 was obtained from the WiseNews database. News trends were created by computing the number of relevant articles by type, topic, place of origin and sub-populations. We then obtained relevant search volumes from Google and analysed causality between news trends and Google Trends using Granger Causality test and orthogonal impulse function. Results: We found that editorial news has an impact on “HIV” Google searches, with the search term popularity peaking at an average of two weeks after the news is published. Similarly, editorial news has an impact on the frequency of “AIDS” searches two weeks after. MSM-related news trends have a more fluctuating impact on “MSM” Google searches, although the time lag varies anywhere from one week later to ten weeks later. Conclusions: This infodemiological study shows that there is a positive impact of news trends on the online search behavior of HIV/AIDS or MSM-related issues for up to ten weeks after. Health promotional professionals could make use of this brief time window to tailor the timing of HIV awareness campaigns and public health interventions to maximise their reach and effectiveness. PMID:28922376

  7. News trends and web search query of HIV/AIDS in Hong Kong.

    PubMed

    Chiu, Alice P Y; Lin, Qianying; He, Daihai

    2017-01-01

    The HIV epidemic in Hong Kong has worsened in recent years, with major contributions from the high-risk subgroup of men who have sex with men (MSM). Internet use is prevalent among the majority of the local population, who seek health information online. This study examines the impacts of HIV/AIDS and MSM news coverage on web search query in Hong Kong. Relevant news coverage about HIV/AIDS and MSM from January 1st, 2004 to December 31st, 2014 was obtained from the WiseNews database. News trends were created by computing the number of relevant articles by type, topic, place of origin and sub-populations. We then obtained relevant search volumes from Google and analysed causality between news trends and Google Trends using Granger Causality test and orthogonal impulse function. We found that editorial news has an impact on "HIV" Google searches, with the search term popularity peaking at an average of two weeks after the news is published. Similarly, editorial news has an impact on the frequency of "AIDS" searches two weeks after. MSM-related news trends have a more fluctuating impact on "MSM" Google searches, although the time lag varies anywhere from one week later to ten weeks later. This infodemiological study shows that there is a positive impact of news trends on the online search behavior of HIV/AIDS or MSM-related issues for up to ten weeks after. Health promotional professionals could make use of this brief time window to tailor the timing of HIV awareness campaigns and public health interventions to maximise their reach and effectiveness.

  8. Federated Space-Time Query for Earth Science Data Using OpenSearch Conventions

    NASA Astrophysics Data System (ADS)

    Lynnes, C.; Beaumont, B.; Duerr, R. E.; Hua, H.

    2009-12-01

    The past decade has seen a burgeoning of remote sensing and Earth science data providers, as evidenced in the growth of the Earth Science Information Partner (ESIP) federation. At the same time, the need to combine diverse data sets to enable understanding of the Earth as a system has also grown. While the expansion of data providers is in general a boon to such studies, the diversity presents a challenge to finding useful data for a given study. Locating all the data files with aerosol information for a particular volcanic eruption, for example, may involve learning and using several different search tools to execute the requisite space-time queries. To address this issue, the ESIP federation is developing a federated space-time query framework, based on the OpenSearch convention (www.opensearch.org), with Geo and Time extensions. In this framework, data providers publish OpenSearch Description Documents that describe in a machine-readable form how to execute queries against the provider. The novelty of OpenSearch is that the space-time query interface becomes both machine callable and easy enough to integrate into the web browser's search box. This flexibility, together with a simple REST (HTTP GET) interface, should allow a variety of data providers to participate in the federated search framework, from large institutional data centers to individual scientists. The simple interface enables trivial querying of multiple data sources and participation in recursive-like federated searches, all using the same common OpenSearch interface. This simplicity also makes the construction of clients easy, as do existing OpenSearch client libraries in a variety of languages. Moreover, a number of clients and aggregation services already exist and OpenSearch is already supported by a number of web browsers such as Firefox and Internet Explorer.
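
    The way a client turns an OpenSearch URL template into a concrete space-time query can be sketched as below. The parameter names `{searchTerms}`, `{geo:box}`, `{time:start}` and `{time:end}` follow the OpenSearch core, Geo and Time conventions, with a trailing `?` marking an optional parameter; the endpoint URL and the function name are hypothetical, and a real client would first read the template from the provider's Description Document.

```python
import re
from urllib.parse import quote

# Matches {name} and {name?}; the "?" marks an optional parameter.
PARAM_RE = re.compile(r"\{([^{}?]+)(\?)?\}")


def fill_template(template, values):
    """Substitute template parameters; optional ones default to empty strings."""
    def substitute(match):
        name, optional = match.group(1), match.group(2)
        if name in values:
            return quote(str(values[name]), safe=",:")
        if optional:
            return ""
        raise KeyError(f"missing required parameter: {name}")
    return PARAM_RE.sub(substitute, template)


# Hypothetical provider template, for illustration only.
TEMPLATE = (
    "https://example.org/search?q={searchTerms}"
    "&bbox={geo:box?}&start={time:start?}&end={time:end?}"
)
```

    Because every provider exposes the same template conventions, one such function can serve all of them, which is the federation benefit the abstract points to.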

  9. Internet search query analysis can be used to demonstrate the rapidly increasing public awareness of palliative care in the USA.

    PubMed

    McLean, Sarah; Lennon, Paul; Glare, Paul

    2017-01-27

    A lack of public awareness of palliative care (PC) has been identified as one of the main barriers to appropriate PC access. Internet search query analysis is a novel methodology, which has been effectively used in surveillance of infectious diseases, and can be used to monitor public awareness of health-related topics. We aimed to demonstrate the utility of internet search query analysis to evaluate changes in public awareness of PC in the USA between 2005 and 2015. Google Trends provides a referenced score for the popularity of a search term, for defined regions over defined time periods. The popularity of the search term 'palliative care' was measured monthly between 1/1/2005 and 31/12/2015 in the USA and in the UK. Results were analysed using independent t-tests and joinpoint analysis. The mean monthly popularity of the search term increased between 2008-2009 (p<0.001), 2011-2012 (p<0.001), 2013-2014 (p=0.004) and 2014-2015 (p=0.002) in the USA. Joinpoint analysis was used to evaluate the monthly percentage change (MPC) in the popularity of the search term. In the USA, the MPC increase was 0.6%/month (p<0.05); in the UK the MPC of 0.05% was non-significant. Although internet search query surveillance is a novel methodology, it is freely accessible and has significant potential to monitor health-seeking behaviour among the public. PC is rapidly growing in the USA, and the rapidly increasing public awareness of PC as demonstrated in this study, in comparison with the UK, where PC is relatively well established is encouraging in increasingly ensuring appropriate PC access for all. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/.

  10. Adverse Reactions Associated With Cannabis Consumption as Evident From Search Engine Queries.

    PubMed

    Yom-Tov, Elad; Lev-Ran, Shaul

    2017-10-26

    Cannabis is one of the most widely used psychoactive substances worldwide, but adverse drug reactions (ADRs) associated with its use are difficult to study because of its prohibited status in many countries. Internet search engine queries have been used to investigate ADRs in pharmaceutical drugs. In this proof-of-concept study, we tested whether these queries can be used to detect the adverse reactions of cannabis use. We analyzed anonymized queries from US-based users of Bing, a widely used search engine, made over a period of 6 months and compared the results with the prevalence of cannabis use as reported in the US National Survey on Drug Use in the Household (NSDUH) and with ADRs reported in the Food and Drug Administration's Adverse Drug Reporting System. Predicted prevalence of cannabis use was estimated from the fraction of people making queries about cannabis, marijuana, and 121 additional synonyms. Predicted ADRs were estimated from queries containing layperson descriptions to 195 ICD-10 symptoms list. Our results indicated that the predicted prevalence of cannabis use at the US census regional level reaches an R² of .71 against NSDUH data. Queries for ADRs made by people who also searched for cannabis reveal many of the known adverse effects of cannabis (eg, cough and psychotic symptoms), as well as plausible unknown reactions (eg, pyrexia). These results indicate that search engine queries can serve as an important tool for the study of adverse reactions of illicit drugs, which are difficult to study in other settings. ©Elad Yom-Tov, Shaul Lev-Ran. Originally published in JMIR Public Health and Surveillance (http://publichealth.jmir.org), 26.10.2017.

  11. Adverse Reactions Associated With Cannabis Consumption as Evident From Search Engine Queries

    PubMed Central

    Lev-Ran, Shaul

    2017-01-01

    Background Cannabis is one of the most widely used psychoactive substances worldwide, but adverse drug reactions (ADRs) associated with its use are difficult to study because of its prohibited status in many countries. Objective Internet search engine queries have been used to investigate ADRs in pharmaceutical drugs. In this proof-of-concept study, we tested whether these queries can be used to detect the adverse reactions of cannabis use. Methods We analyzed anonymized queries from US-based users of Bing, a widely used search engine, made over a period of 6 months and compared the results with the prevalence of cannabis use as reported in the US National Survey on Drug Use in the Household (NSDUH) and with ADRs reported in the Food and Drug Administration’s Adverse Drug Reporting System. Predicted prevalence of cannabis use was estimated from the fraction of people making queries about cannabis, marijuana, and 121 additional synonyms. Predicted ADRs were estimated from queries containing layperson descriptions to 195 ICD-10 symptoms list. Results Our results indicated that the predicted prevalence of cannabis use at the US census regional level reaches an R² of .71 against NSDUH data. Queries for ADRs made by people who also searched for cannabis reveal many of the known adverse effects of cannabis (eg, cough and psychotic symptoms), as well as plausible unknown reactions (eg, pyrexia). Conclusions These results indicate that search engine queries can serve as an important tool for the study of adverse reactions of illicit drugs, which are difficult to study in other settings. PMID:29074469

  12. How to improve your PubMed/MEDLINE searches: 2. display settings, complex search queries and topic searching.

    PubMed

    Fatehi, Farhad; Gray, Leonard C; Wootton, Richard

    2014-01-01

    The way that PubMed results are displayed can be changed using the Display Settings drop-down menu in the result screen. There are three groups of options: Format, Items per page and Sort by, which allow a good deal of control. The results from several searches can be temporarily stored on the Clipboard. Records of interest can be selected on the results page using check boxes and can then be combined, for example to form a reference list. The Related Citations is a valuable feature of PubMed that can provide a set of similar articles when you have identified a record of interest among the results. You can easily search for RCTs or reviews using the appropriate filters or field tags. If you are interested in clinical articles, rather than basic science or health service research, then the Clinical Queries tool on the PubMed home page can be used to retrieve them.

  13. Do economic equality and generalized trust inhibit academic dishonesty? Evidence from state-level search-engine queries.

    PubMed

    Neville, Lukas

    2012-04-01

    What effect does economic inequality have on academic integrity? Using data from search-engine queries made between 2003 and 2011 on Google and state-level measures of income inequality and generalized trust, I found that academically dishonest searches (queries seeking term-paper mills and help with cheating) were more likely to come from states with higher income inequality and lower levels of generalized trust. These relations persisted even when controlling for contextual variables, such as average income and the number of colleges per capita. The relation between income inequality and academic dishonesty was fully mediated by generalized trust. When there is higher economic inequality, people are less likely to view one another as trustworthy. This lower generalized trust, in turn, is associated with a greater prevalence of academic dishonesty. These results might explain previous findings on the effectiveness of honor codes.

  14. The Limitations of Term Co-Occurrence Data for Query Expansion in Document Retrieval Systems.

    ERIC Educational Resources Information Center

    Peat, Helen J.; Willett, Peter

    1991-01-01

    Identifies limitations in the use of term co-occurrence data as a basis for automatic query expansion in natural language document retrieval systems. The use of similarity coefficients to calculate the degree of similarity between pairs of terms is explained, and frequency and discriminatory characteristics for nearest neighbors of query terms are…

  15. Query-seeded iterative sequence similarity searching improves selectivity 5–20-fold

    PubMed Central

    Li, Weizhong; Lopez, Rodrigo

    2017-01-01

    Abstract Iterative similarity search programs, like psiblast, jackhmmer, and psisearch, are much more sensitive than pairwise similarity search methods like blast and ssearch because they build a position specific scoring model (a PSSM or HMM) that captures the pattern of sequence conservation characteristic to a protein family. But models are subject to contamination; once an unrelated sequence has been added to the model, homologs of the unrelated sequence will also produce high scores, and the model can diverge from the original protein family. Examination of alignment errors during psiblast PSSM contamination suggested a simple strategy for dramatically reducing PSSM contamination. psiblast PSSMs are built from the query-based multiple sequence alignment (MSA) implied by the pairwise alignments between the query model (PSSM, HMM) and the subject sequences in the library. When the original query sequence residues are inserted into gapped positions in the aligned subject sequence, the resulting PSSM rarely produces alignment over-extensions or alignments to unrelated sequences. This simple step, which tends to anchor the PSSM to the original query sequence and slightly increase target percent identity, can reduce the frequency of false-positive alignments more than 20-fold compared with psiblast and jackhmmer, with little loss in search sensitivity. PMID:27923999

  16. Querying Event Sequences by Exact Match or Similarity Search: Design and Empirical Evaluation

    PubMed Central

    Wongsuphasawat, Krist; Plaisant, Catherine; Taieb-Maimon, Meirav; Shneiderman, Ben

    2012-01-01

    Specifying event sequence queries is challenging even for skilled computer professionals familiar with SQL. Most graphical user interfaces for database search use an exact match approach, which is often effective, but near misses may also be of interest. We describe a new similarity search interface, in which users specify a query by simply placing events on a blank timeline and retrieve a similarity-ranked list of results. Behind this user interface is a new similarity measure for event sequences which the users can customize by four decision criteria, enabling them to adjust the impact of missing, extra, or swapped events or the impact of time shifts. We describe a use case with Electronic Health Records based on our ongoing collaboration with hospital physicians. A controlled experiment with 18 participants compared exact match and similarity search interfaces. We report on the advantages and disadvantages of each interface and suggest a hybrid interface combining the best of both. PMID:22379286

  17. Tracking the rise in popularity of electronic nicotine delivery systems (electronic cigarettes) using search query surveillance.

    PubMed

    Ayers, John W; Ribisl, Kurt M; Brownstein, John S

    2011-04-01

    Public interest in electronic nicotine delivery systems (ENDS) is undocumented. By monitoring search queries, ENDS popularity and correlates of their popularity were assessed in Australia, Canada, the United Kingdom (UK), and the U.S. English-language Google searches conducted from January 2008 through September 2010 were compared to snus, nicotine replacement therapy (NRT), and Chantix® or Champix®. Searches for each week were scaled to the highest weekly search proportion (100), with lower values indicating the relative search proportion compared to the highest-proportion week (e.g., 50=50% of the highest observed proportion). Analyses were performed in 2010. From July 2008 through February 2010, ENDS searches increased in all nations studied except Australia, where an increase occurred more recently. By September 2010, ENDS searches were several-hundred-fold greater than searches for smoking alternatives in the UK and U.S., and were rivaling alternatives in Australia and Canada. Across nations, ENDS searches were highest in the U.S., followed by similar search intensity in Canada and the UK, with Australia having the fewest ENDS searches. Stronger tobacco control, created by clean indoor air laws, cigarette taxes, and anti-smoking populations, was associated with consistently higher levels of ENDS searches. The online popularity of ENDS has surpassed that of snus and NRTs, which have been on the market for far longer, and is quickly outpacing Chantix or Champix. In part, the association between ENDS's popularity and stronger tobacco control suggests ENDS are used to bypass, or quit in response to, smoking restrictions. Search query surveillance is a valuable, real-time, free, and public method to evaluate the diffusion of new health products. This method may be generalized to other behavioral, biological, informational, or psychological outcomes manifested on search engines. Copyright © 2011 American Journal of Preventive Medicine. Published by Elsevier Inc
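    The scaling described in this abstract (each week expressed relative to the highest-proportion week, which is set to 100) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `scale_to_peak` and the weekly proportions are made up for the example.

```python
def scale_to_peak(weekly_proportions):
    """Rescale weekly search proportions so the highest week equals 100."""
    peak = max(weekly_proportions)
    if peak <= 0:
        raise ValueError("need at least one positive weekly proportion")
    # Every other week is expressed as a percentage of the peak week.
    return [100.0 * p / peak for p in weekly_proportions]


# Hypothetical weekly search proportions (illustrative values only).
weekly = [0.25, 0.5, 0.125, 0.375]
scaled = scale_to_peak(weekly)
print(scaled)
```

    With these inputs the second week is the peak, so the first week scales to 50, matching the abstract's reading of "50=50% of the highest observed proportion".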

  18. Privacy-Preserving Location-Based Query Using Location Indexes and Parallel Searching in Distributed Networks

    PubMed Central

    Liu, Lei; Zhao, Jing

    2014-01-01

    An efficient location-based query algorithm for protecting the privacy of the user in distributed networks is given. This algorithm utilizes the location indexes of the users and multiple parallel threads to quickly search and select all the candidate anonymous sets with more users and their location information with more uniform distribution to accelerate the execution of the temporal-spatial anonymous operations, and it allows the users to configure their custom-made privacy-preserving location query requests. The simulated experiment results show that the proposed algorithm can simultaneously offer the location query services for more users, improve the performance of the anonymous server, and satisfy the anonymous location requests of the users. PMID:24790579

  19. Privacy-preserving location-based query using location indexes and parallel searching in distributed networks.

    PubMed

    Zhong, Cheng; Liu, Lei; Zhao, Jing

    2014-01-01

    An efficient location-based query algorithm for protecting the privacy of the user in distributed networks is given. This algorithm utilizes the location indexes of the users and multiple parallel threads to quickly search and select all the candidate anonymous sets with more users and their location information with more uniform distribution to accelerate the execution of the temporal-spatial anonymous operations, and it allows the users to configure their custom-made privacy-preserving location query requests. The simulated experiment results show that the proposed algorithm can simultaneously offer the location query services for more users, improve the performance of the anonymous server, and satisfy the anonymous location requests of the users.

  20. Infodemiology of status epilepticus: A systematic validation of the Google Trends-based search queries.

    PubMed

    Bragazzi, Nicola Luigi; Bacigaluppi, Susanna; Robba, Chiara; Nardone, Raffaele; Trinka, Eugen; Brigo, Francesco

    2016-02-01

    People increasingly use Google looking for health-related information. We previously demonstrated that in English-speaking countries most people use this search engine to obtain information on status epilepticus (SE) definition, types/subtypes, and treatment. Now, we aimed at providing a quantitative analysis of SE-related web queries. This analysis represents an advancement, with respect to what was already previously discussed, in that the Google Trends (GT) algorithm has been further refined and correlational analyses have been carried out to validate the GT-based query volumes. Google Trends-based SE-related query volumes were well correlated with information concerning causes and pharmacological and nonpharmacological treatments. Google Trends can provide both researchers and clinicians with data on realities and contexts that are generally overlooked and underexplored by classic epidemiology. In this way, GT can foster new epidemiological studies in the field and can complement traditional epidemiological tools. Copyright © 2015 Elsevier Inc. All rights reserved.

  1. How popular is waterpipe tobacco smoking? Findings from internet search queries.

    PubMed

    Salloum, Ramzi G; Osman, Amira; Maziak, Wasim; Thrasher, James F

    2015-09-01

    Waterpipe tobacco smoking (WTS), a traditional tobacco consumption practice in the Middle East, is gaining popularity worldwide. Estimates of population-level interest in WTS over time are not documented. We assessed the popularity of WTS using World Wide Web search query results across four English-speaking countries. We analysed trends in Google search queries related to WTS, comparing these trends with those for electronic cigarettes between 2004 and 2013 in Australia, Canada, the UK and the USA. Weekly search volumes were reported as percentages relative to the week with the highest volume of searches. Web-based searches for WTS have increased steadily since 2004 in all four countries. Search volume for WTS was higher than for e-cigarettes in three of the four nations, with the highest volume in the USA. Online searches were primarily targeted at WTS products for home use, followed by searches for WTS cafés/lounges. Online demand for information on WTS-related products and venues is large and increasing. Given the rise in WTS popularity, increasing evidence of exposure-related harms, and relatively lax government regulation, WTS is a serious public health concern and could reach epidemic levels in Western societies. Published by the BMJ Publishing Group Limited.

  2. How popular is waterpipe tobacco smoking? Findings from internet search queries

    PubMed Central

    Salloum, Ramzi G; Osman, Amira; Maziak, Wasim; Thrasher, James F

    2015-01-01

    Objectives Waterpipe tobacco smoking (WTS), a traditional tobacco consumption practice in the Middle East, is gaining popularity worldwide. Estimates of population-level interest in WTS over time are not documented. We assessed the popularity of WTS using World Wide Web search query results across four English-speaking countries. Methods We analysed trends in Google search queries related to WTS, comparing these trends with those for electronic cigarettes between 2004 and 2013 in Australia, Canada, the UK and the USA. Weekly search volumes were reported as percentages relative to the week with the highest volume of searches. Results Web-based searches for WTS have increased steadily since 2004 in all four countries. Search volume for WTS was higher than for e-cigarettes in three of the four nations, with the highest volume in the USA. Online searches were primarily targeted at WTS products for home use, followed by searches for WTS cafés/lounges. Conclusions Online demand for information on WTS-related products and venues is large and increasing. Given the rise in WTS popularity, increasing evidence of exposure-related harms, and relatively lax government regulation, WTS is a serious public health concern and could reach epidemic levels in Western societies. PMID:25052859

  3. Advances in nowcasting influenza-like illness rates using search query logs

    NASA Astrophysics Data System (ADS)

    Lampos, Vasileios; Miller, Andrew C.; Crossan, Steve; Stefansen, Christian

    2015-08-01

    User-generated content can assist epidemiological surveillance in the early detection and prevalence estimation of infectious diseases, such as influenza. Google Flu Trends embodies the first public platform for transforming search queries to indications about the current state of flu in various places all over the world. However, the original model significantly mispredicted influenza-like illness rates in the US during the 2012-13 flu season. In this work, we build on the previous modeling attempt, proposing substantial improvements. Firstly, we investigate the performance of a widely used linear regularized regression solver, known as the Elastic Net. Then, we expand on this model by incorporating the queries selected by the Elastic Net into a nonlinear regression framework, based on a composite Gaussian Process. Finally, we augment the query-only predictions with an autoregressive model, injecting prior knowledge about the disease. We assess predictive performance using five consecutive flu seasons spanning from 2008 to 2013 and qualitatively explain certain shortcomings of the previous approach. Our results indicate that a nonlinear query modeling approach delivers the lowest cumulative nowcasting error, and also suggest that query information significantly improves autoregressive inferences, obtaining state-of-the-art performance.

  4. Advances in nowcasting influenza-like illness rates using search query logs.

    PubMed

    Lampos, Vasileios; Miller, Andrew C; Crossan, Steve; Stefansen, Christian

    2015-08-03

    User-generated content can assist epidemiological surveillance in the early detection and prevalence estimation of infectious diseases, such as influenza. Google Flu Trends embodies the first public platform for transforming search queries to indications about the current state of flu in various places all over the world. However, the original model significantly mispredicted influenza-like illness rates in the US during the 2012-13 flu season. In this work, we build on the previous modeling attempt, proposing substantial improvements. Firstly, we investigate the performance of a widely used linear regularized regression solver, known as the Elastic Net. Then, we expand on this model by incorporating the queries selected by the Elastic Net into a nonlinear regression framework, based on a composite Gaussian Process. Finally, we augment the query-only predictions with an autoregressive model, injecting prior knowledge about the disease. We assess predictive performance using five consecutive flu seasons spanning from 2008 to 2013 and qualitatively explain certain shortcomings of the previous approach. Our results indicate that a nonlinear query modeling approach delivers the lowest cumulative nowcasting error, and also suggest that query information significantly improves autoregressive inferences, obtaining state-of-the-art performance.

  5. Query-Adaptive Hash Code Ranking for Large-Scale Multi-View Visual Search.

    PubMed

    Liu, Xianglong; Huang, Lei; Deng, Cheng; Lang, Bo; Tao, Dacheng

    2016-10-01

    Hash-based nearest neighbor search has become attractive in many applications. However, the quantization in hashing usually degrades the discriminative power when using Hamming distance ranking. Besides, for large-scale visual search, existing hashing methods cannot directly support efficient search over data with multiple sources, even though the literature has shown that adaptively incorporating complementary information from diverse sources or views can significantly boost the search performance. To address these problems, this paper proposes a novel and generic approach to building multiple hash tables with multiple views and generating fine-grained ranking results at bitwise and tablewise levels. For each hash table, a query-adaptive bitwise weighting is introduced to alleviate the quantization loss by simultaneously exploiting the quality of hash functions and their complement for nearest neighbor search. From the tablewise aspect, multiple hash tables are built for different data views as a joint index, over which a query-specific rank fusion is proposed to rerank all results from the bitwise ranking by diffusing in a graph. Comprehensive experiments on image search over three well-known benchmarks show that the proposed method achieves up to 17.11% and 20.28% performance gains on single and multiple table search over the state-of-the-art methods.
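    The limitation of plain Hamming ranking that this abstract mentions, and the general idea of per-bit weighting, can be illustrated with a small sketch. It is a generic weighted Hamming distance under stated assumptions, not the paper's method: the codes, item names and weights below are invented, and the paper derives its query-adaptive weights from hash-function quality, which is not reproduced here.

```python
def hamming(a, b):
    """Plain Hamming distance: the number of differing bits."""
    return sum(x != y for x, y in zip(a, b))


def weighted_hamming(a, b, weights):
    """Sum the weights of the differing bits instead of counting them."""
    return sum(w for x, y, w in zip(a, b, weights) if x != y)


# Hypothetical 4-bit hash codes for a query and three database items.
query = (1, 0, 1, 1)
database = {
    "img_a": (1, 0, 0, 1),
    "img_b": (0, 0, 1, 1),
    "img_c": (1, 1, 1, 0),
}
# Hypothetical per-bit weights, standing in for query-adaptive weights.
weights = (0.9, 0.2, 0.5, 0.4)

plain = {name: hamming(query, code) for name, code in database.items()}
ranked = sorted(
    database, key=lambda name: weighted_hamming(query, database[name], weights)
)
print(plain)
print(ranked)
```

    Plain Hamming distance gives img_a and img_b the same integer distance, so they cannot be ordered; the weighted variant produces a finer-grained ranking.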

  6. SeqWare Query Engine: storing and searching sequence data in the cloud.

    PubMed

    O'Connor, Brian D; Merriman, Barry; Nelson, Stanley F

    2010-12-21

    analytical tools. The range of data types supported, the ease of querying and integrating with existing tools, and the robust scalability of the underlying cloud-based technologies make SeqWare Query Engine a natural fit for storing and searching ever-growing genome sequence datasets.

  7. SeqWare Query Engine: storing and searching sequence data in the cloud

    PubMed Central

    2010-01-01

    interface to simplify development of analytical tools. The range of data types supported, the ease of querying and integrating with existing tools, and the robust scalability of the underlying cloud-based technologies make SeqWare Query Engine a natural fit for storing and searching ever-growing genome sequence datasets. PMID:21210981

  8. Utility of Web search query data in testing theoretical assumptions about mephedrone.

    PubMed

    Kapitány-Fövény, Máté; Demetrovics, Zsolt

    2017-05-01

    With growing access to the Internet, people who use drugs and traffickers started to obtain information about novel psychoactive substances (NPS) via online platforms. This paper aims to analyze whether a decreasing Web interest in formerly banned substances (cocaine, heroin, and MDMA) and the legislative status of mephedrone predict Web interest about this NPS. Google Trends was used to measure changes of Web interest on cocaine, heroin, MDMA, and mephedrone. Google search results for mephedrone within the same time frame were analyzed and categorized. Web interest about classic drugs was found to be more persistent. Regarding geographical distribution, location of Web searches for heroin and cocaine was less centralized. Illicit status of mephedrone was a negative predictor of its Web search query rates. The connection between mephedrone-related Web search rates and legislative status of this substance was significantly mediated by ecstasy-related Web search queries, the number of documentaries, and forum/blog entries about mephedrone. The results might provide support for the hypothesis that mephedrone's popularity was highly correlated with its legal status and that it functioned as a potential substitute for MDMA. Google Trends was found to be a useful tool for testing theoretical assumptions about NPS. Copyright © 2017 John Wiley & Sons, Ltd.

  9. What is the prevalence of health-related searches on the World Wide Web? Qualitative and quantitative analysis of search engine queries on the Internet

    PubMed Central

    Eysenbach, G.; Kohler, Ch.

    2003-01-01

    While health information is often said to be the most sought after information on the web, empirical data on the actual frequency of health-related searches on the web are missing. In the present study we aimed to determine the prevalence of health-related searches on the web by analyzing search terms entered by people into popular search engines. We also made some preliminary attempts in qualitatively describing and classifying these searches. Occasional difficulties in determining what constitutes a “health-related” search led us to propose and validate a simple method to automatically classify a search string as “health-related”. This method is based on determining the proportion of pages on the web containing the search string and the word “health”, as a proportion of the total number of pages with the search string alone. Using human codings as gold standard we plotted a ROC curve and determined empirically that if this “co-occurrence rate” is larger than 35%, the search string can be said to be health-related (sensitivity: 85.2%, specificity 80.4%). The results of our “human” codings of search queries determined that about 4.5% of all searches are “health-related”. We estimate that globally a minimum of 6.75 million health-related searches are being conducted on the web every day, which is roughly the same number of searches that were conducted on the NLM Medlars system in the whole of 1996. PMID:14728167
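    The classification rule described in this abstract (a search string counts as health-related when the share of pages that contain it together with the word "health" exceeds 35% of the pages containing the string alone) can be sketched as below. The function names and page counts are hypothetical; obtaining real page counts from a search engine is outside this sketch.

```python
def co_occurrence_rate(pages_with_term_and_health, pages_with_term):
    """Share of pages containing the search string that also contain 'health'."""
    if pages_with_term <= 0:
        raise ValueError("pages_with_term must be positive")
    return pages_with_term_and_health / pages_with_term


def is_health_related(pages_with_term_and_health, pages_with_term, threshold=0.35):
    """Apply the 35% cut-off reported in the abstract (strictly greater than)."""
    rate = co_occurrence_rate(pages_with_term_and_health, pages_with_term)
    return rate > threshold


# Hypothetical page counts for two search strings (invented numbers).
print(is_health_related(400, 1000))  # rate 0.40, above the cut-off
print(is_health_related(300, 1000))  # rate 0.30, below the cut-off
```

    The threshold is kept as a parameter because the abstract reports it as an empirical choice from a ROC curve rather than a fixed constant.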

  10. A Comparison of Query-by-Example Methods for Spoken Term Detection

    DTIC Science & Technology

    2009-09-01

    consistent “errors” between the index and the query. Few query terms have more than one pronunciation (avg. 1.1 prons. per term), as a result, there is... pron lex. one dict entry (llr) 73.01 47.66 21.11; all dict entries (avg+llr) 73.99 48.16 20.92; all dict entries (max+llr) 74.27 48.26 20.93 Table 1

  11. Using Search Engine Query Data to Explore the Epidemiology of Common Gastrointestinal Symptoms.

    PubMed

    Hassid, Benjamin G; Day, Lukejohn W; Awad, Mohannad A; Sewell, Justin L; Osterberg, E Charles; Breyer, Benjamin N

    2017-03-01

    Internet searches are an increasingly used tool in medical research. To date, no studies have examined Google search data in relation to common gastrointestinal symptoms. The aim of this study was to compare trends in Internet search volume with clinical datasets for common gastrointestinal symptoms. Using Google Trends, we recorded relative changes in volume of searches related to dysphagia, vomiting, and diarrhea in the USA between January 2008 and January 2011. We queried the National Inpatient Sample (NIS) and the National Hospital Ambulatory Medical Care Survey (NHAMCS) during this time period and identified cases related to these symptoms. We assessed the correlation between Google Trends and these two clinical datasets, as well as examined seasonal variation trends. Changes to Google search volume for all three symptoms correlated significantly with changes to NIS output (dysphagia: r = 0.5, P = 0.002; diarrhea: r = 0.79, P < 0.001; vomiting: r = 0.76, P < 0.001). Both Google and NIS data showed that the prevalence of all three symptoms rose during the time period studied. On the other hand, the NHAMCS data trends during this time period did not correlate well with either the NIS or the Google data for any of the three symptoms studied. Both the NIS and Google data showed modest seasonal variation. Changes to the population burden of chronic GI symptoms may be tracked by monitoring changes to Google search engine query volume over time. These data demonstrate that the prevalence of common GI symptoms is rising over time.

  12. Can internet search queries be used for dengue fever surveillance in China?

    PubMed

    Guo, Pi; Wang, Li; Zhang, Yanhong; Luo, Ganfeng; Zhang, Yanting; Deng, Changyu; Zhang, Qin; Zhang, Qingying

    2017-10-01

    China experienced an unprecedented outbreak of dengue fever in 2014, and the number of cases reached the highest level over the past 25 years. Traditional sentinel surveillance systems of dengue fever in China have an obvious drawback that the average delay from receipt to dissemination of dengue case data is roughly 1-2 weeks. To exploit internet search queries for timely monitoring of dengue fever, we analyzed data of dengue incidence and Baidu search query from 31 provinces in mainland China during the period of January 2011 to December 2014. We found that there was a strong correlation between changes in people's online health-seeking behavior and dengue fever incidence. Our study represents the first attempt demonstrating a strong temporal and spatial correlation between internet search trends and dengue epidemics nationwide in China. The findings will help the government to strengthen the capacity of traditional surveillance systems for dengue fever. Copyright © 2017 The Author(s). Published by Elsevier Ltd. All rights reserved.

  13. Using Web Search Query Data to Monitor Dengue Epidemics: A New Model for Neglected Tropical Disease Surveillance

    PubMed Central

    Chan, Emily H.; Sahai, Vikram; Conrad, Corrie; Brownstein, John S.

    2011-01-01

    Background A variety of obstacles including bureaucracy and lack of resources have interfered with timely detection and reporting of dengue cases in many endemic countries. Surveillance efforts have turned to modern data sources, such as Internet search queries, which have been shown to be effective for monitoring influenza-like illnesses. However, few have evaluated the utility of web search query data for other diseases, especially those of high morbidity and mortality or where a vaccine may not exist. In this study, we aimed to assess whether web search queries are a viable data source for the early detection and monitoring of dengue epidemics. Methodology/Principal Findings Bolivia, Brazil, India, Indonesia and Singapore were chosen for analysis based on available data and adequate search volume. For each country, a univariate linear model was then built by fitting a time series of the fraction of Google search query volume for specific dengue-related queries from that country against a time series of official dengue case counts for a time-frame within 2003–2010. The specific combination of queries used was chosen to maximize model fit. Spurious spikes in the data were also removed prior to model fitting. The final models, fit using a training subset of the data, were cross-validated against both the overall dataset and a holdout subset of the data. All models were found to fit the data quite well, with validation correlations ranging from 0.82 to 0.99. Conclusions/Significance Web search query data were found to be capable of tracking dengue activity in Bolivia, Brazil, India, Indonesia and Singapore. Whereas traditional dengue data from official sources are often not available until after some substantial delay, web search query data are available in near real-time. These data represent a valuable complement to assist with traditional dengue surveillance. PMID:21647308

  14. Using web search query data to monitor dengue epidemics: a new model for neglected tropical disease surveillance.

    PubMed

    Chan, Emily H; Sahai, Vikram; Conrad, Corrie; Brownstein, John S

    2011-05-01

    A variety of obstacles including bureaucracy and lack of resources have interfered with timely detection and reporting of dengue cases in many endemic countries. Surveillance efforts have turned to modern data sources, such as Internet search queries, which have been shown to be effective for monitoring influenza-like illnesses. However, few have evaluated the utility of web search query data for other diseases, especially those of high morbidity and mortality or where a vaccine may not exist. In this study, we aimed to assess whether web search queries are a viable data source for the early detection and monitoring of dengue epidemics. Bolivia, Brazil, India, Indonesia and Singapore were chosen for analysis based on available data and adequate search volume. For each country, a univariate linear model was then built by fitting a time series of the fraction of Google search query volume for specific dengue-related queries from that country against a time series of official dengue case counts for a time-frame within 2003-2010. The specific combination of queries used was chosen to maximize model fit. Spurious spikes in the data were also removed prior to model fitting. The final models, fit using a training subset of the data, were cross-validated against both the overall dataset and a holdout subset of the data. All models were found to fit the data quite well, with validation correlations ranging from 0.82 to 0.99. Web search query data were found to be capable of tracking dengue activity in Bolivia, Brazil, India, Indonesia and Singapore. Whereas traditional dengue data from official sources are often not available until after some substantial delay, web search query data are available in near real-time. These data represent a valuable complement to assist with traditional dengue surveillance.
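    The fit-then-validate procedure this abstract describes can be sketched roughly as follows. This is only an illustration, not the authors' code: the numbers are made up, the function names are hypothetical, and the query-selection and spike-removal steps are omitted.

```python
from math import sqrt

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / sqrt(var_a * var_b)

# Made-up illustrative series (NOT data from the study):
# fraction of search volume for dengue queries, and official case counts.
query_fraction = [0.10, 0.12, 0.20, 0.35, 0.30, 0.15, 0.11, 0.25]
case_counts = [110, 135, 190, 360, 310, 140, 120, 260]

# Fit on a training subset, then check the fit on a holdout subset.
train, holdout = slice(0, 5), slice(5, 8)
slope, intercept = fit_line(query_fraction[train], case_counts[train])
predicted = [intercept + slope * q for q in query_fraction[holdout]]
validation_r = pearson(predicted, case_counts[holdout])
```

    Here `validation_r` plays the role of the validation correlation reported in the abstract; a real analysis would use many more time points per country.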

  15. An end user evaluation of query formulation and results review tools in three medical meta-search engines.

    PubMed

    Leroy, Gondy; Xu, Jennifer; Chung, Wingyan; Eggers, Shauna; Chen, Hsinchun

    2007-01-01

    Retrieving sufficient relevant information online is difficult for many people because they use too few keywords to search and search engines do not provide many support tools. To further complicate the search, users often ignore support tools when available. Our goal is to evaluate in a realistic setting when users use support tools and how they perceive these tools. We compared three medical search engines with support tools that require more or less effort from users to form a query and evaluate results. We carried out an end user study with 23 users who were asked to find information, i.e., subtopics and supporting abstracts, for a given theme. We used a balanced within-subjects design and report on the effectiveness, efficiency and usability of the support tools from the end user perspective. We found significant differences in efficiency but did not find significant differences in effectiveness between the three search engines. Dynamic user support tools requiring less effort led to higher efficiency. Fewer searches were needed and more documents were found per search when both query reformulation and result review tools dynamically adjust to the user query. The query reformulation tool that provided a long list of keywords, dynamically adjusted to the user query, was used most often and led to more subtopics. As hypothesized, the dynamic result review tools were used more often and led to more subtopics than static ones. These results were corroborated by the usability questionnaires, which showed that support tools that dynamically optimize output were preferred.

  16. SAM: String-based sequence search algorithm for mitochondrial DNA database queries

    PubMed Central

    Röck, Alexander; Irwin, Jodi; Dür, Arne; Parsons, Thomas; Parson, Walther

    2011-01-01

    The analysis of the haploid mitochondrial (mt) genome has numerous applications in forensic and population genetics, as well as in disease studies. Although mtDNA haplotypes are usually determined by sequencing, they are rarely reported as a nucleotide string. Traditionally they are presented in a difference-coded position-based format relative to the corrected version of the first sequenced mtDNA. This convention requires recommendations for standardized sequence alignment that is known to vary between scientific disciplines, even between laboratories. As a consequence, database searches that are vital for the interpretation of mtDNA data can suffer from biased results when query and database haplotypes are annotated differently. In the forensic context that would usually lead to underestimation of the absolute and relative frequencies. To address this issue we introduce SAM, a string-based search algorithm that converts query and database sequences to position-free nucleotide strings and thus eliminates the possibility that identical sequences will be missed in a database query. The mere application of a BLAST algorithm would not be a sufficient remedy as it uses a heuristic approach and does not address properties specific to mtDNA, such as phylogenetically stable but also rapidly evolving insertion and deletion events. The software presented here provides additional flexibility to incorporate phylogenetic data, site-specific mutation rates, and other biologically relevant information that would refine the interpretation of mitochondrial DNA data. The manuscript is accompanied by freeware and example data sets that can be used to evaluate the new software (http://stringvalidation.org). PMID:21056022
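    The core idea of converting difference-coded haplotypes to position-free strings can be illustrated with a toy sketch. It is not the SAM implementation: the reference string, the simplified notation ("5G" for a substitution, "2del" for a deletion, "3.1C" for an insertion after position 3) and the function name are assumptions made for this example.

```python
def to_nucleotide_string(reference, differences):
    """Apply difference-coded variants to a reference and return the plain string.

    Each reference position keeps a list: its own base first, followed by
    any bases inserted after it.
    """
    bases = {i + 1: [b] for i, b in enumerate(reference)}
    for diff in differences:
        if diff.endswith("del"):
            # Deletion, e.g. "2del": remove the base at position 2.
            bases[int(diff[:-3])][0] = ""
        elif "." in diff:
            # Insertion, e.g. "3.1C": insert C after position 3.
            position, rest = diff.split(".")
            bases[int(position)].append(rest[-1])
        else:
            # Substitution, e.g. "5G": replace the base at position 5 with G.
            bases[int(diff[:-1])][0] = diff[-1]
    return "".join("".join(bases[p]) for p in sorted(bases))

# Toy reference with a homopolymer run (hypothetical, not real mtDNA).
reference = "ACCCT"

# Two laboratories annotate the same deletion at different positions.
lab_a = to_nucleotide_string(reference, ["2del"])
lab_b = to_nucleotide_string(reference, ["4del"])
same_sequence = lab_a == lab_b  # the strings agree although the annotations differ
```

    Comparing the strings rather than the annotations is what lets identical sequences match regardless of the alignment convention used.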

  17. Scoping review on search queries and social media for disease surveillance: a chronology of innovation.

    PubMed

    Bernardo, Theresa Marie; Rajic, Andrijana; Young, Ian; Robiadek, Katie; Pham, Mai T; Funk, Julie A

    2013-07-18

    The threat of a global pandemic posed by outbreaks of influenza H5N1 (1997) and Severe Acute Respiratory Syndrome (SARS, 2002), both diseases of zoonotic origin, provoked interest in improving early warning systems and reinforced the need for combining data from different sources. It led to the use of search query data from search engines such as Google and Yahoo! as an indicator of when and where influenza was occurring. This methodology has subsequently been extended to other diseases and has led to experimentation with new types of social media for disease surveillance. The objective of this scoping review was to formally assess the current state of knowledge regarding the use of search queries and social media for disease surveillance in order to inform future work on early detection and more effective mitigation of the effects of foodborne illness. Structured scoping review methods were used to identify, characterize, and evaluate all published primary research, expert review, and commentary articles regarding the use of social media in surveillance of infectious diseases from 2002-2011. Thirty-two primary research articles and 19 reviews and case studies were identified as relevant. Most relevant citations were peer-reviewed journal articles (29/32, 91%) published in 2010-11 (28/32, 88%) and reported use of a Google program for surveillance of influenza. Only four primary research articles investigated social media in the context of foodborne disease or gastroenteritis. Most authors (21/32 articles, 66%) reported that social media-based surveillance had comparable performance when compared to an existing surveillance program. The most commonly reported strengths of social media surveillance programs included their effectiveness (21/32, 66%) and rapid detection of disease (21/32, 66%). The most commonly reported weaknesses were the potential for false positive (16/32, 50%) and false negative (11/32, 34%) results. Most authors (24/32, 75%) recommended that

  18. Scoping Review on Search Queries and Social Media for Disease Surveillance: A Chronology of Innovation

    PubMed Central

    Bernardo, Theresa Marie; Rajic, Andrijana; Young, Ian; Robiadek, Katie; Pham, Mai T; Funk, Julie A

    2013-01-01

    Background The threat of a global pandemic posed by outbreaks of influenza H5N1 (1997) and Severe Acute Respiratory Syndrome (SARS, 2002), both diseases of zoonotic origin, provoked interest in improving early warning systems and reinforced the need for combining data from different sources. It led to the use of search query data from search engines such as Google and Yahoo! as an indicator of when and where influenza was occurring. This methodology has subsequently been extended to other diseases and has led to experimentation with new types of social media for disease surveillance. Objective The objective of this scoping review was to formally assess the current state of knowledge regarding the use of search queries and social media for disease surveillance in order to inform future work on early detection and more effective mitigation of the effects of foodborne illness. Methods Structured scoping review methods were used to identify, characterize, and evaluate all published primary research, expert review, and commentary articles regarding the use of social media in surveillance of infectious diseases from 2002-2011. Results Thirty-two primary research articles and 19 reviews and case studies were identified as relevant. Most relevant citations were peer-reviewed journal articles (29/32, 91%) published in 2010-11 (28/32, 88%) and reported use of a Google program for surveillance of influenza. Only four primary research articles investigated social media in the context of foodborne disease or gastroenteritis. Most authors (21/32 articles, 66%) reported that social media-based surveillance had comparable performance when compared to an existing surveillance program. The most commonly reported strengths of social media surveillance programs included their effectiveness (21/32, 66%) and rapid detection of disease (21/32, 66%). The most commonly reported weaknesses were the potential for false positive (16/32, 50%) and false negative (11/32, 34%) results. Most

  19. Forecasting influenza in Hong Kong with Google search queries and statistical model fusion

    PubMed Central

    Xu, Qinneng; Gel, Yulia R.; Ramirez Ramirez, L. Leticia; Nezafati, Kusha; Zhang, Qingpeng; Tsui, Kwok-Leung

    2017-01-01

    Background The objective of this study is to investigate predictive utility of online social media and web search queries, particularly, Google search data, to forecast new cases of influenza-like-illness (ILI) in general outpatient clinics (GOPC) in Hong Kong. To mitigate the impact of sensitivity to self-excitement (i.e., fickle media interest) and other artifacts of online social media data, in our approach we fuse multiple offline and online data sources. Methods Four individual models: generalized linear model (GLM), least absolute shrinkage and selection operator (LASSO), autoregressive integrated moving average (ARIMA), and deep learning (DL) with Feedforward Neural Networks (FNN) are employed to forecast ILI-GOPC both one week and two weeks in advance. The covariates include Google search queries, meteorological data, and previously recorded offline ILI. To our knowledge, this is the first study that introduces deep learning methodology into surveillance of infectious diseases and investigates its predictive utility. Furthermore, to exploit the strength from each individual forecasting model, we use statistical model fusion, using Bayesian model averaging (BMA), which allows a systematic integration of multiple forecast scenarios. For each model, an adaptive approach is used to capture the recent relationship between ILI and covariates. Results DL with FNN appears to deliver the most competitive predictive performance among the four considered individual models. Combining all four models in a comprehensive BMA framework further improves predictive evaluation metrics such as root mean squared error (RMSE) and mean absolute predictive error (MAPE). Nevertheless, DL with FNN remains the preferred method for predicting locations of influenza peaks. Conclusions The proposed approach can be viewed as a feasible alternative to forecast ILI in Hong Kong or other countries where ILI has no constant seasonal trend and influenza data resources are limited. The

  20. Forecasting influenza in Hong Kong with Google search queries and statistical model fusion.

    PubMed

    Xu, Qinneng; Gel, Yulia R; Ramirez Ramirez, L Leticia; Nezafati, Kusha; Zhang, Qingpeng; Tsui, Kwok-Leung

    2017-01-01

    The objective of this study is to investigate predictive utility of online social media and web search queries, particularly, Google search data, to forecast new cases of influenza-like-illness (ILI) in general outpatient clinics (GOPC) in Hong Kong. To mitigate the impact of sensitivity to self-excitement (i.e., fickle media interest) and other artifacts of online social media data, in our approach we fuse multiple offline and online data sources. Four individual models: generalized linear model (GLM), least absolute shrinkage and selection operator (LASSO), autoregressive integrated moving average (ARIMA), and deep learning (DL) with Feedforward Neural Networks (FNN) are employed to forecast ILI-GOPC both one week and two weeks in advance. The covariates include Google search queries, meteorological data, and previously recorded offline ILI. To our knowledge, this is the first study that introduces deep learning methodology into surveillance of infectious diseases and investigates its predictive utility. Furthermore, to exploit the strength from each individual forecasting model, we use statistical model fusion, using Bayesian model averaging (BMA), which allows a systematic integration of multiple forecast scenarios. For each model, an adaptive approach is used to capture the recent relationship between ILI and covariates. DL with FNN appears to deliver the most competitive predictive performance among the four considered individual models. Combining all four models in a comprehensive BMA framework further improves predictive evaluation metrics such as root mean squared error (RMSE) and mean absolute predictive error (MAPE). Nevertheless, DL with FNN remains the preferred method for predicting locations of influenza peaks. The proposed approach can be viewed as a feasible alternative to forecast ILI in Hong Kong or other countries where ILI has no constant seasonal trend and influenza data resources are limited. The proposed methodology is easily tractable
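    As a rough illustration of the model-fusion step, Bayesian model averaging weights each model's forecast by its posterior model probability. The sketch below assumes equal prior probabilities for the models and takes each model's log marginal likelihood (or an approximation of it) as given; the function names and numbers are hypothetical, and the study's adaptive fitting is not reproduced.

```python
from math import exp

def bma_weights(log_likelihoods):
    """Posterior model weights under equal priors: w_i proportional to exp(loglik_i)."""
    largest = max(log_likelihoods)
    # Subtract the maximum before exponentiating for numerical stability.
    unnormalized = [exp(value - largest) for value in log_likelihoods]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

def fuse_forecasts(forecasts, weights):
    """BMA point forecast: the weight-averaged forecast of the individual models."""
    return sum(f * w for f, w in zip(forecasts, weights))

# Made-up values for four models (e.g. GLM, LASSO, ARIMA, FNN).
log_likelihoods = [-52.0, -51.0, -53.5, -50.0]
one_week_forecasts = [120.0, 131.0, 115.0, 140.0]

weights = bma_weights(log_likelihoods)
fused = fuse_forecasts(one_week_forecasts, weights)
```

    The fused value always lies between the smallest and largest individual forecasts, and models with higher likelihood pull it toward their own prediction.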

  1. Research on Web Search Behavior: How Online Query Data Inform Social Psychology.

    PubMed

    Lai, Kaisheng; Lee, Yan Xin; Chen, Hao; Yu, Rongjun

    2017-10-01

    The widespread use of web searches in daily life has allowed researchers to study people's online social and psychological behavior. Using web search data has advantages in terms of data objectivity, ecological validity, temporal resolution, and unique application value. This review integrates existing studies on web search data that have explored topics including sexual behavior, suicidal behavior, mental health, social prejudice, social inequality, public responses to policies, and other psychosocial issues. These studies are categorized as descriptive, correlational, inferential, predictive, and policy evaluation research. The integration of theory-based hypothesis testing in future web search research will result in even stronger contributions to social psychology.

  2. Personalized query suggestion based on user behavior

    NASA Astrophysics Data System (ADS)

    Chen, Wanyu; Hao, Zepeng; Shao, Taihua; Chen, Honghui

    Query suggestions help users refine their queries after they input an initial query. Previous work mainly concentrated on similarity-based and context-based query suggestion approaches. However, models that focus on adapting to a specific user (personalization) can help to improve the probability of the user being satisfied. In this paper, we propose a personalized query suggestion model based on users’ search behavior (UB model), where we inject relevance between queries and users’ search behavior into a basic probabilistic model. For the relevance between queries, we consider their semantical similarity and co-occurrence which indicates the behavior information from other users in web search. Regarding the current user’s preference to a query, we combine the user’s short-term and long-term search behavior in a linear fashion and deal with the data sparse problem with Bayesian probabilistic matrix factorization (BPMF). In particular, we also investigate the impact of different personalization strategies (the combination of the user’s short-term and long-term search behavior) on the performance of query suggestion reranking. We quantify the improvement of our proposed UB model against a state-of-the-art baseline using the public AOL query logs and show that it beats the baseline in terms of metrics used in query suggestion reranking. The experimental results show that: (i) for personalized ranking, users’ behavioral information helps to improve query suggestion effectiveness; and (ii) given a query, merging information inferred from the short-term and long-term search behavior of a particular user can result in a better performance than both plain approaches.
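    The linear combination of short-term and long-term behavior mentioned in this abstract can be sketched as a simple reranking rule. This covers only that one ingredient: the probabilistic base model and the BPMF step are not shown, and the scores, the mixing weight and the function name are assumptions for illustration.

```python
def rerank_suggestions(candidates, short_term, long_term, weight=0.5):
    """Order candidate queries by a linear mix of short- and long-term preference.

    `weight` is the share given to short-term (session) behavior;
    the remainder goes to long-term (historical) behavior.
    """
    def score(query):
        return (weight * short_term.get(query, 0.0)
                + (1.0 - weight) * long_term.get(query, 0.0))
    return sorted(candidates, key=score, reverse=True)

# Hypothetical preference scores for one user.
candidates = ["jaguar car", "jaguar animal"]
short_term = {"jaguar car": 0.9, "jaguar animal": 0.1}
long_term = {"jaguar car": 0.1, "jaguar animal": 0.8}

session_driven = rerank_suggestions(candidates, short_term, long_term, weight=1.0)
history_driven = rerank_suggestions(candidates, short_term, long_term, weight=0.0)
```

    Varying `weight` between 0 and 1 corresponds to the different personalization strategies the abstract says were compared.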

  3. Using digital surveillance to examine the impact of public figure pancreatic cancer announcements on media and search query outcomes.

    PubMed

    Noar, Seth M; Ribisl, Kurt M; Althouse, Benjamin M; Willoughby, Jessica Fitts; Ayers, John W

    2013-12-01

    Announcements of cancer diagnoses from public figures may stimulate cancer information seeking and media coverage about cancer. This study used digital surveillance to quantify the effects of pancreatic cancer public figure announcements on online cancer information seeking and cancer media coverage. We compiled a list of public figures (N = 25) who had been diagnosed with or had died from pancreatic cancer between 2006 and 2011. We specified interrupted time series models using data from Google Trends to examine search query shifts for pancreatic cancer and other cancers. Weekly media coverage archived on Google News was also analyzed. Most public figures' pancreatic cancer announcements corresponded with no appreciable change in pancreatic cancer search queries or media coverage. In contrast, Patrick Swayze's diagnosis was associated with a 285% (95% confidence interval [CI]: 212 to 360) increase in pancreatic cancer search queries, though it was only weakly associated with increases in pancreatic cancer media coverage. Steve Jobs's death was associated with a 197% (95% CI: 131 to 266) increase in pancreatic cancer queries and a 3517% (95% CI: 2882 to 4492) increase in pancreatic cancer media coverage. In general, a doubling in pancreatic cancer-specific media coverage corresponded with a 325% increase in pancreatic cancer queries. Digital surveillance is an important tool for future cancer control research and practice. The current application of these methods suggested that pancreatic cancer announcements (diagnosis or death) by particular public figures stimulated media coverage of and online information seeking for pancreatic cancer.

  4. Examining the Relationship Between Past Orientation and US Suicide Rates: An Analysis Using Big Data-Driven Google Search Queries

    PubMed Central

    Lee, Donghyun; Lee, Hojun; Choi, Munkee

    2016-01-01

    Background Internet search query data reflect the attitudes of the users, using which we can measure the past orientation to commit suicide. Examinations of past orientation often highlight certain predispositions of attitude, many of which can be suicide risk factors. Objective To investigate the relationship between past orientation and suicide rate by examining Google search queries. Methods We measured the past orientation using Google search query data by comparing the search volumes of the past year and those of the future year, across the 50 US states and the District of Columbia during the period from 2004 to 2012. We constructed a panel dataset with independent variables as control variables; we then undertook an analysis using multiple ordinary least squares regression and methods that leverage the Akaike information criterion and the Bayesian information criterion. Results It was found that past orientation had a positive relationship with the suicide rate (P≤.001) and that it improves the goodness-of-fit of the model regarding the suicide rate. Unemployment rate (P≤.001 in Models 3 and 4), Gini coefficient (P≤.001), and population growth rate (P≤.001) had a positive relationship with the suicide rate, whereas the gross state product (P≤.001) showed a negative relationship with the suicide rate. Conclusions We empirically identified the positive relationship between the suicide rate and past orientation, which was measured by big data-driven Google search query. PMID:26868917

  5. Examining the Relationship Between Past Orientation and US Suicide Rates: An Analysis Using Big Data-Driven Google Search Queries.

    PubMed

    Lee, Donghyun; Lee, Hojun; Choi, Munkee

    2016-02-11

    Internet search query data reflect the attitudes of the users, using which we can measure the past orientation to commit suicide. Examinations of past orientation often highlight certain predispositions of attitude, many of which can be suicide risk factors. To investigate the relationship between past orientation and suicide rate by examining Google search queries. We measured the past orientation using Google search query data by comparing the search volumes of the past year and those of the future year, across the 50 US states and the District of Columbia during the period from 2004 to 2012. We constructed a panel dataset with independent variables as control variables; we then undertook an analysis using multiple ordinary least squares regression and methods that leverage the Akaike information criterion and the Bayesian information criterion. It was found that past orientation had a positive relationship with the suicide rate (P ≤ .001) and that it improves the goodness-of-fit of the model regarding the suicide rate. Unemployment rate (P ≤ .001 in Models 3 and 4), Gini coefficient (P ≤ .001), and population growth rate (P ≤ .001) had a positive relationship with the suicide rate, whereas the gross state product (P ≤ .001) showed a negative relationship with the suicide rate. We empirically identified the positive relationship between the suicide rate and past orientation, which was measured by big data-driven Google search query.

  6. Automatic Concept-Based Query Expansion Using Term Relational Pathways Built from a Collection-Specific Association Thesaurus

    ERIC Educational Resources Information Center

    Lyall-Wilson, Jennifer Rae

    2013-01-01

    The dissertation research explores an approach to automatic concept-based query expansion to improve search engine performance. It uses a network-based approach for identifying the concept represented by the user's query and is founded on the idea that a collection-specific association thesaurus can be used to create a reasonable representation of…

  7. Google Analytics Reports about Search Terms

    EPA Pesticide Factsheets

    Learn what search terms brought users to choose your page in their search results, and what terms they entered in the EPA search box after visiting your page. Use this information to improve links and content on the page.

  8. Refining search terms for nanotechnology

    NASA Astrophysics Data System (ADS)

    Porter, Alan L.; Youtie, Jan; Shapira, Philip; Schoeneck, David J.

    2008-05-01

    The ability to delineate the boundaries of an emerging technology is central to obtaining an understanding of the technology's research paths and commercialization prospects. Nowhere is this more relevant than in the case of nanotechnology (hereafter identified as "nano") given its current rapid growth and multidisciplinary nature. (Under the rubric of nanotechnology, we also include nanoscience and nanoengineering.) Past efforts have utilized several strategies, including simple term search for the prefix nano, complex lexical and citation-based approaches, and bootstrapping techniques. This research introduces a modularized Boolean approach to defining nanotechnology which has been applied to several research and patenting databases. We explain our approach to downloading and cleaning data, and report initial results. Comparisons of this approach with other nanotechnology search formulations are presented. Implications for search strategy development and profiling of the nanotechnology field are discussed.

  9. Postmarket Drug Surveillance Without Trial Costs: Discovery of Adverse Drug Reactions Through Large-Scale Analysis of Web Search Queries

    PubMed Central

    Gabrilovich, Evgeniy

    2013-01-01

    Background Postmarket drug safety surveillance largely depends on spontaneous reports by patients and health care providers; hence, less common adverse drug reactions—especially those caused by long-term exposure, multidrug treatments, or those specific to special populations—often elude discovery. Objective Here we propose a low cost, fully automated method for continuous monitoring of adverse drug reactions in single drugs and in combinations thereof, and demonstrate the discovery of heretofore-unknown ones. Methods We used aggregated search data of large populations of Internet users to extract information related to drugs and adverse reactions to them, and correlated these data over time. We further extended our method to identify adverse reactions to combinations of drugs. Results We validated our method by showing high correlations of our findings with known adverse drug reactions (ADRs). However, although acute early-onset drug reactions are more likely to be reported to regulatory agencies, we show that less acute later-onset ones are better captured in Web search queries. Conclusions Our method is advantageous in identifying previously unknown adverse drug reactions. These ADRs should be considered as candidates for further scrutiny by medical regulatory authorities, for example, through phase 4 trials. PMID:23778053

  10. Forecasting influenza outbreak dynamics in Melbourne from Internet search query surveillance data.

    PubMed

    Moss, Robert; Zarebski, Alexander; Dawson, Peter; McCaw, James M

    2016-07-01

    Accurate forecasting of seasonal influenza epidemics is of great concern to healthcare providers in temperate climates, as these epidemics vary substantially in their size, timing and duration from year to year, making it a challenge to deliver timely and proportionate responses. Previous studies have shown that Bayesian estimation techniques can accurately predict when an influenza epidemic will peak many weeks in advance, using existing surveillance data, but these methods must be tailored both to the target population and to the surveillance system. Our aim was to evaluate whether forecasts of similar accuracy could be obtained for metropolitan Melbourne (Australia). We used the bootstrap particle filter and a mechanistic infection model to generate epidemic forecasts for metropolitan Melbourne (Australia) from weekly Internet search query surveillance data reported by Google Flu Trends for 2006-14. Optimal observation models were selected from hundreds of candidates using a novel approach that treats forecasts akin to receiver operating characteristic (ROC) curves. We show that the timing of the epidemic peak can be accurately predicted 4-6 weeks in advance, but that the magnitude of the epidemic peak and the overall burden are much harder to predict. We then discuss how the infection and observation models and the filtering process may be refined to improve forecast robustness, thereby improving the utility of these methods for healthcare decision support. © 2016 The Authors. Influenza and Other Respiratory Viruses Published by John Wiley & Sons Ltd.
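    A bootstrap particle filter follows a propagate, weight, resample cycle. The sketch below shows that cycle on a toy problem; it swaps the study's mechanistic infection model for a simple random walk and uses a Gaussian observation model, both of which are assumptions made for illustration, as are all parameter values.

```python
import random
from math import exp

def bootstrap_filter(observations, n_particles=500, process_sd=0.3,
                     obs_sd=0.5, seed=0):
    """Return the filtered mean of a latent random-walk state after each observation."""
    rng = random.Random(seed)
    # Initial particles drawn from a broad uniform prior.
    particles = [rng.uniform(0.0, 10.0) for _ in range(n_particles)]
    estimates = []
    for y in observations:
        # 1. Propagate each particle through the (toy) process model.
        particles = [p + rng.gauss(0.0, process_sd) for p in particles]
        # 2. Weight each particle by the Gaussian likelihood of the observation.
        weights = [exp(-0.5 * ((y - p) / obs_sd) ** 2) for p in particles]
        # 3. Resample with replacement in proportion to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
        estimates.append(sum(particles) / n_particles)
    return estimates

# Toy observation series (not Google Flu Trends data).
estimates = bootstrap_filter([5.0] * 20)
```

    In a forecasting setting the resampled particles would then be run forward through the process model without further observations to produce a distribution over future epidemic trajectories.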

  11. Monitoring hand, foot and mouth disease by combining search engine query data and meteorological factors.

    PubMed

    Huang, Da-Cang; Wang, Jin-Feng

    2018-01-15

    Hand, foot and mouth disease (HFMD) has been recognized as a significant public health threat and poses a tremendous challenge to disease control departments. To date, the relationship between meteorological factors and HFMD has been documented, and public interest in the disease has been proven to be trackable from the Internet. However, no study has explored the combination of these two factors in the monitoring of HFMD. Therefore, the main aim of this study was to develop an effective monitoring model of HFMD in Guangzhou, China by utilizing historical HFMD cases, Internet-based search engine query data and meteorological factors. To this end, a case study was conducted in Guangzhou, using a network-based generalized additive model (GAM) including all factors related to HFMD. Three other models were also constructed using some of the variables for comparison. The results suggested that the model showed the best estimating ability when considering all of the related factors. Copyright © 2017 Elsevier B.V. All rights reserved.

  12. SPLICE: A program to assemble partial query solutions from three-dimensional database searches into novel ligands

    NASA Astrophysics Data System (ADS)

    Ho, Chris M. W.; Marshall, Garland R.

    1993-12-01

    SPLICE is a program that processes partial query solutions retrieved from 3D structural databases to generate novel, aggregate ligands. It is designed to interface with the database searching program FOUNDATION, which retrieves fragments containing any combination of a user-specified minimum number of matching query elements. SPLICE eliminates aspects of structures that are physically incapable of binding within the active site. Then, a systematic rule-based procedure is performed upon the remaining fragments to ensure receptor complementarity. All modifications are automated and remain transparent to the user. Ligands are then assembled by linking components into composite structures through overlapping bonds. As a control experiment, FOUNDATION and SPLICE were used to reconstruct a known HIV-1 protease inhibitor after it had been fragmented, reoriented, and added to a sham database of fifty different small molecules. To illustrate the capabilities of this program, a 3D search query containing the pharmacophoric elements of an aspartic proteinase-inhibitor crystal complex was searched using FOUNDATION against a subset of the Cambridge Structural Database. One hundred thirty-one compounds were retrieved, each containing any combination of at least four query elements. Compounds were automatically screened and edited for receptor complementarity. Numerous combinations of fragments were discovered that could be linked to form novel structures, containing a greater number of pharmacophoric elements than any single retrieved fragment.

  13. Ontobee: A linked ontology data server to support ontology term dereferencing, linkage, query and integration

    PubMed Central

    Ong, Edison; Xiang, Zuoshuang; Zhao, Bin; Liu, Yue; Lin, Yu; Zheng, Jie; Mungall, Chris; Courtot, Mélanie; Ruttenberg, Alan; He, Yongqun

    2017-01-01

    Linked Data (LD) aims to achieve interconnected data by representing entities using Uniform Resource Identifiers (URIs), and sharing information using the Resource Description Framework (RDF) and HTTP. Ontologies, which logically represent entities and relations in specific domains, are the basis of LD. Ontobee (http://www.ontobee.org/) is a linked ontology data server that stores ontology information using RDF triple store technology and supports query, visualization and linkage of ontology terms. Ontobee is also the default linked data server for publishing and browsing biomedical ontologies in the Open Biological Ontology (OBO) Foundry (http://obofoundry.org) library. Ontobee currently hosts more than 180 ontologies (including 131 OBO Foundry Library ontologies) with over four million terms. Ontobee provides a user-friendly web interface for querying and visualizing the details and hierarchy of a specific ontology term. Using the eXtensible Stylesheet Language Transformation (XSLT) technology, Ontobee is able to dereference a single ontology term URI, and then output RDF/eXtensible Markup Language (XML) for computer processing or display the HTML information on a web browser for human users. Statistics and detailed information are generated and displayed for each ontology listed in Ontobee. In addition, a SPARQL web interface is provided for custom advanced SPARQL queries of one or multiple ontologies. PMID:27733503
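    A custom SPARQL query of the kind this abstract mentions might be assembled as in the sketch below. It only builds the request URL and sends nothing. The endpoint address is an assumption that should be checked against the Ontobee site, the term URI is just an example OBO-style identifier, and the helper names are hypothetical; only the `query` parameter of the standard SPARQL protocol is used.

```python
from urllib.parse import urlencode

# Assumed endpoint address; verify on the Ontobee website before use.
ENDPOINT = "http://sparql.hegroup.org/sparql"

def build_label_query(term_uri):
    """SPARQL query asking for the rdfs:label values of one ontology term."""
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?label WHERE { <%s> rdfs:label ?label } LIMIT 10" % term_uri
    )

def build_request_url(term_uri, endpoint=ENDPOINT):
    """URL for a SPARQL-protocol GET request carrying the query text."""
    return endpoint + "?" + urlencode({"query": build_label_query(term_uri)})

# Example term URI in the OBO PURL style.
url = build_request_url("http://purl.obolibrary.org/obo/GO_0008150")
```

    The resulting URL could be fetched with any HTTP client; the response format depends on the server's configuration and on the `Accept` header sent.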

  14. Ontobee: A linked ontology data server to support ontology term dereferencing, linkage, query and integration.

    PubMed

    Ong, Edison; Xiang, Zuoshuang; Zhao, Bin; Liu, Yue; Lin, Yu; Zheng, Jie; Mungall, Chris; Courtot, Mélanie; Ruttenberg, Alan; He, Yongqun

    2017-01-04

    Linked Data (LD) aims to achieve interconnected data by representing entities using Uniform Resource Identifiers (URIs), and sharing information using Resource Description Frameworks (RDFs) and HTTP. Ontologies, which logically represent entities and relations in specific domains, are the basis of LD. Ontobee (http://www.ontobee.org/) is a linked ontology data server that stores ontology information using RDF triple store technology and supports query, visualization and linkage of ontology terms. Ontobee is also the default linked data server for publishing and browsing biomedical ontologies in the Open Biological Ontology (OBO) Foundry (http://obofoundry.org) library. Ontobee currently hosts more than 180 ontologies (including 131 OBO Foundry Library ontologies) with over four million terms. Ontobee provides a user-friendly web interface for querying and visualizing the details and hierarchy of a specific ontology term. Using the eXtensible Stylesheet Language Transformation (XSLT) technology, Ontobee is able to dereference a single ontology term URI, and then output RDF/eXtensible Markup Language (XML) for computer processing or display the HTML information on a web browser for human users. Statistics and detailed information are generated and displayed for each ontology listed in Ontobee. In addition, a SPARQL web interface is provided for custom advanced SPARQL queries of one or multiple ontologies. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  15. Detecting internet activity for erectile dysfunction using search engine query data in the Republic of Ireland.

    PubMed

    Davis, Niall F; Smyth, Lisa G; Flood, Hugh D

    2012-12-01

    What's known on the subject? and What does the study add? Despite the increasing prevalence of erectile dysfunction (ED), there is reluctance among symptomatic patients to present to healthcare providers for appropriate advice and treatment. A number of Internet campaigns have been launched by the Irish healthcare media since 2007 aiming to provide easily accessible advice on ED. Novel online technologies appear to provide a useful tool for educating the general public on the symptoms of ED because there has been a significant increase in overall Internet search activity for this term since 2007. • To assess Internet search trends for erectile dysfunction (ED) subsequent to public awareness campaigns being launched within the Republic of Ireland • To assess whether the advent of such campaigns correlates with increased Internet search activity for ED. • Google insights for search was utilized to examine Internet search trends for the term 'erectile dysfunction' across all categories between January 2005 and December 2011. • Search activity was limited to users from the Republic of Ireland within this timeframe. • Additionally, the number of Irish Internet media campaigns and Irish web pages providing information on ED was assessed between January 2005 and December 2011. • Statistical analysis of the data was performed using analysis of variance and Student's t-tests for pairwise comparisons. • There has been a significant increase in mean search activity for ED on an annual basis since 2007 (P < 0.001). • The number of Irish web pages associated with information on ED has also increased significantly on an annual basis since 2007 (P < 0.001). • There have been seven different Irish Internet media campaigns on ED since 2007 compared to two from 2005 to 2007 (P < 0.001). • There was no significant change in mean search activity for ED from 2005 to 2007 • The advent of recent Internet media campaigns and increasing number of Irish web pages is

  16. Predicting the hand, foot, and mouth disease incidence using search engine query data and climate variables: an ecological study in Guangdong, China.

    PubMed

    Du, Zhicheng; Xu, Lin; Zhang, Wangjian; Zhang, Dingmei; Yu, Shicheng; Hao, Yuantao

    2017-10-06

    Hand, foot, and mouth disease (HFMD) has caused a substantial burden in China, especially in Guangdong Province. Based on the enhanced surveillance system, we aimed to explore whether the addition of temperature and search engine query data improves the risk prediction of HFMD. Ecological study. Information on the confirmed cases of HFMD, climate parameters and search engine query logs was collected. A total of 1.36 million HFMD cases were identified from the surveillance system during 2011-2014. Analyses were conducted at aggregate level and no confidential information was involved. A seasonal autoregressive integrated moving average (ARIMA) model with external variables (ARIMAX) was used to predict the HFMD incidence from 2011 to 2014, taking into account temperature and search engine query data (Baidu Index, BDI). Statistics of goodness-of-fit and precision of prediction were used to compare models (1) based on surveillance data only, and with the addition of (2) temperature, (3) BDI, and (4) both temperature and BDI. A high correlation between HFMD incidence and BDI (r=0.794, p<0.001) or temperature (r=0.657, p<0.001) was observed using both time series plot and correlation matrix. A linear effect of BDI (without lag) and non-linear effect of temperature (1 week lag) on HFMD incidence were found in a distributed lag non-linear model. Compared with the model based on surveillance data only, the ARIMAX model including BDI reached the best goodness-of-fit with an Akaike information criterion (AIC) value of -345.332, whereas the model including both BDI and temperature had the most accurate prediction in terms of the mean absolute percentage error (MAPE) of 101.745%. An ARIMAX model incorporating search engine query data significantly improved the prediction of HFMD. Further studies are warranted to examine whether including search engine query data also improves the prediction of other infectious diseases in other settings.
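    The mean absolute percentage error (MAPE) named in the abstract above as the measure of prediction accuracy can be sketched as follows. This is a minimal illustration only: the function name `mape` and the sample values are hypothetical and are not data from the study.

```python
def mape(observed, predicted):
    """Mean absolute percentage error, expressed in percent.

    Assumes every observed value is non-zero; a zero would cause a
    division by zero, so real incidence series may need special handling.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((o - p) / o) for o, p in zip(observed, predicted))
    return 100.0 * total / len(observed)


# Made-up weekly case counts and forecasts, for illustration only.
observed = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]
print(round(mape(observed, predicted), 2))
```

    A lower MAPE indicates forecasts that sit closer, in relative terms, to the observed series.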

  17. Predicting the hand, foot, and mouth disease incidence using search engine query data and climate variables: an ecological study in Guangdong, China

    PubMed Central

    Du, Zhicheng; Xu, Lin; Zhang, Wangjian; Zhang, Dingmei; Yu, Shicheng; Hao, Yuantao

    2017-01-01

    Objectives Hand, foot, and mouth disease (HFMD) has caused a substantial burden in China, especially in Guangdong Province. Based on the enhanced surveillance system, we aimed to explore whether the addition of temperature and search engine query data improves the risk prediction of HFMD. Design Ecological study. Setting and participants Information on the confirmed cases of HFMD, climate parameters and search engine query logs was collected. A total of 1.36 million HFMD cases were identified from the surveillance system during 2011–2014. Analyses were conducted at aggregate level and no confidential information was involved. Outcome measures A seasonal autoregressive integrated moving average (ARIMA) model with external variables (ARIMAX) was used to predict the HFMD incidence from 2011 to 2014, taking into account temperature and search engine query data (Baidu Index, BDI). Statistics of goodness-of-fit and precision of prediction were used to compare models (1) based on surveillance data only, and with the addition of (2) temperature, (3) BDI, and (4) both temperature and BDI. Results A high correlation between HFMD incidence and BDI (r=0.794, p<0.001) or temperature (r=0.657, p<0.001) was observed using both time series plot and correlation matrix. A linear effect of BDI (without lag) and non-linear effect of temperature (1 week lag) on HFMD incidence were found in a distributed lag non-linear model. Compared with the model based on surveillance data only, the ARIMAX model including BDI reached the best goodness-of-fit with an Akaike information criterion (AIC) value of −345.332, whereas the model including both BDI and temperature had the most accurate prediction in terms of the mean absolute percentage error (MAPE) of 101.745%. Conclusions An ARIMAX model incorporating search engine query data significantly improved the prediction of HFMD. Further studies are warranted to examine whether including search engine query data also improves the prediction of

  18. Query Transformations for Result Merging

    DTIC Science & Technology

    2014-11-01

    tors, term dependence, query expansion 1. INTRODUCTION Federated search deals with the problem of aggregating results from multiple search engines. The...individual search engines are (i) typically focused on a particular domain or a particular corpus, (ii) employ diverse retrieval models, and (iii...determine which search engines are appropriate for addressing the information need (resource selection), and (ii) merging the results returned by

  19. Influence of legislations and news on Indian internet search query patterns of e-cigarettes.

    PubMed

    Thavarajah, Rooban; Mohandoss, Anusa Arunachalam; Ranganathan, Kannan; Kondalsamy-Chennakesavan, Srinivas

    2017-01-01

    There is a paucity of data on the use of electronic nicotine delivery systems (ENDS) in India. In addition, the Indian internet search pattern for ENDS has not been studied. We aimed to address this lacuna. Moreover, the influence of the tobacco legislations and news pieces on such search volume is not known. Given the fact that ENDS could cause oral lesions, these data are pertinent to dentists. Using a time series analysis, we examined the effect of tobacco-related legislations and news pieces on total search volume (TSV) from September 1, 2012, to August 31, 2016. TSV data were seasonally adjusted and analyzed using time series modeling. The TSV clocked during the month of legislations and news pieces were analyzed for their influence on search pattern of ENDS. The overall mean ± standard deviation (range) TSV was 22273.75 ± 6784.01 (12310-40510) during the study with seasonal variations. Individually, the best model for TSV-legislation and news pieces was autoregressive integrated moving average model, and when influence of legislations and news events were combined, it was the Winter's additive model. In the legislation alone model, the pre-event, event and post-event month TSV was not a better indicator of the effect, barring for post-event month of 2nd legislation, which involved pictorial warnings on packages in the study period. Similarly, a news piece on Pan-India ban on ENDS influenced the model in the news piece model. When combined, no "events" emerged significant. These findings suggest that search for information on ENDS is increasing and that these tobacco control policies and news items, targeting tobacco usage reduction, have only a short-term effect on the rate of searching for information on ENDS.

  20. Using internet search queries for infectious disease surveillance: screening diseases for suitability.

    PubMed

    Milinovich, Gabriel J; Avril, Simon M R; Clements, Archie C A; Brownstein, John S; Tong, Shilu; Hu, Wenbiao

    2014-12-31

    Internet-based surveillance systems provide a novel approach to monitoring infectious diseases. Surveillance systems built on internet data are economically, logistically and epidemiologically appealing and have shown significant promise. The potential for these systems has increased with increased internet availability and shifts in health-related information seeking behaviour. This approach to monitoring infectious diseases has, however, only been applied to single or small groups of select diseases. This study aims to systematically investigate the potential for developing surveillance and early warning systems using internet search data, for a wide range of infectious diseases. Official notifications for 64 infectious diseases in Australia were downloaded and correlated with frequencies for 164 internet search terms for the period 2009-13 using Spearman's rank correlations. Time series cross correlations were performed to assess the potential for search terms to be used in construction of early warning systems. Notifications for 17 infectious diseases (26.6%) were found to be significantly correlated with a selected search term. The use of internet metrics as a means of surveillance has not previously been described for 12 (70.6%) of these diseases. The majority of diseases identified were vaccine-preventable, vector-borne or sexually transmissible; cross correlations, however, indicated that vector-borne and vaccine preventable diseases are best suited for development of early warning systems. The findings of this study suggest that internet-based surveillance systems have broader applicability to monitoring infectious diseases than has previously been recognised. Furthermore, internet-based surveillance systems have a potential role in forecasting emerging infectious disease events, especially for vaccine-preventable and vector-borne diseases.
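    The Spearman's rank correlation used above to relate disease notifications to search-term frequencies can be sketched in plain Python. This is a minimal sketch, not the study's code: the function names are hypothetical, ties receive average ranks, and the sample series are invented for illustration.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed series."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented monthly notification counts and search frequencies.
notifications = [12, 30, 45, 80]
search_frequency = [3, 5, 9, 40]
print(spearman(notifications, search_frequency))
```

    Because only ranks are compared, the coefficient captures any monotonic relationship, not just a linear one.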

  1. A Systematic Assessment of Google Search Queries and Readability of Online Gynecologic Oncology Patient Education Materials.

    PubMed

    Martin, Alexandra; Stewart, J Ryan; Gaskins, Jeremy; Medlin, Erin

    2018-01-20

    The Internet is a major source of health information for gynecologic cancer patients. In this study, we systematically explore common Google search terms related to gynecologic cancer and calculate readability of top resulting websites. We used Google AdWords Keyword Planner to generate a list of commonly searched keywords related to gynecologic oncology, which were sorted into five groups (cervical cancer, ovarian cancer, uterine cancer, vulvar cancer, vaginal cancer) using five patient education websites from sgo.org. Each keyword was Google searched to create a list of top websites. The Python programming language (version 3.5.1) was used to describe frequencies of keywords, top-level domains (TLDs), domains, and readability of top websites using four validated formulae. Of the estimated 1,846,950 monthly searches resulting in 62,227 websites, the most common was cancer.org. The most common TLD was *.com. Most websites were above the eighth-grade reading level recommended by the American Medical Association (AMA) and the National Institutes of Health (NIH). The SMOG Index was the most reliable formula. The mean grade level readability for all sites using SMOG was 9.4 ± 2.3, with 23.9% of sites falling at or below the eighth-grade reading level. The first ten results for each Google keyword were easiest to read with results beyond the first page of Google being consistently more difficult. Keywords related to gynecologic malignancies are Google-searched frequently. Most websites are difficult to read without a high school education. This knowledge may help gynecologic oncology providers adequately meet the needs of their patients.
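    The SMOG Index mentioned above is commonly stated as grade = 1.0430 × sqrt(polysyllables × 30 / sentences) + 3.1291, where polysyllables counts words of three or more syllables. The sketch below applies that published formula to pre-counted inputs; the function name is hypothetical, and how the study's authors counted syllables and sentences is not stated in the abstract.

```python
from math import sqrt


def smog_grade(polysyllable_count, sentence_count):
    """SMOG grade from counts of 3+ syllable words and of sentences.

    Counting syllables and splitting sentences is left to the caller,
    since those steps depend on the tokenizer and are only heuristics.
    """
    if sentence_count <= 0:
        raise ValueError("sentence_count must be positive")
    return 1.0430 * sqrt(polysyllable_count * 30 / sentence_count) + 3.1291


# Hypothetical sample: 30 sentences containing 30 polysyllabic words.
print(round(smog_grade(30, 30), 2))
```

    The formula was calibrated on 30-sentence samples, which is why shorter texts are scaled up by the factor 30 / sentences.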

  2. Assessing Ebola-related web search behaviour: insights and implications from an analytical study of Google Trends-based query volumes.

    PubMed

    Alicino, Cristiano; Bragazzi, Nicola Luigi; Faccio, Valeria; Amicizia, Daniela; Panatto, Donatella; Gasparini, Roberto; Icardi, Giancarlo; Orsi, Andrea

    2015-12-10

    The 2014 Ebola epidemic in West Africa has attracted public interest worldwide, leading to millions of Ebola-related Internet searches being performed during the period of the epidemic. This study aimed to evaluate and interpret Google search queries for terms related to the Ebola outbreak both at the global level and in all countries where primary cases of Ebola occurred. The study also endeavoured to look at the correlation between the number of overall and weekly web searches and the number of overall and weekly new cases of Ebola. Google Trends (GT) was used to explore Internet activity related to Ebola. The study period was from 29 December 2013 to 14 June 2015. Pearson's correlation was performed to correlate Ebola-related relative search volumes (RSVs) with the number of weekly and overall Ebola cases. Multivariate regression was performed using Ebola-related RSV as a dependent variable, and the overall number of Ebola cases and the Human Development Index were used as predictor variables. The greatest RSV was registered in the three West African countries mainly affected by the Ebola epidemic. The queries varied in the different countries. Both quantitative and qualitative differences between the affected African countries and other Western countries with primary cases were noted, in relation to the different flux volumes and different time courses. In the affected African countries, web query search volumes were mostly concentrated in the capital areas. However, in Western countries, web queries were uniformly distributed over the national territory. In terms of the three countries mainly affected by the Ebola epidemic, the correlation between the number of new weekly cases of Ebola and the weekly GT index varied from weak to moderate. The correlation between the number of Ebola cases registered in all countries during the study period and the GT index was very high. Google Trends showed a coarse-grained nature, strongly correlating with global

  3. Examining the themes of STD-related Internet searches to increase specificity of disease forecasting using Internet search terms.

    PubMed

    Johnson, Amy K; Mikati, Tarek; Mehta, Supriya D

    2016-11-09

    US surveillance of sexually transmitted diseases (STDs) is often delayed and incomplete which creates missed opportunities to identify and respond to trends in disease. Internet search engine data has the potential to be an efficient, economical and representative enhancement to the established surveillance system. Google Trends allows the download of de-identified search engine data, which has been used to demonstrate the positive and statistically significant association between STD-related search terms and STD rates. In this study, search engine user content was identified by surveying specific exposure groups of individuals (STD clinic patients and university students) aged 18-35. Participants were asked to list the terms they use to search for STD-related information. Google Correlate was used to validate search term content. On average STD clinic participant queries were longer compared to student queries. STD clinic participants were more likely to report using search terms that were related to symptomatology such as describing symptoms of STDs, while students were more likely to report searching for general information. These differences in search terms by subpopulation have implications for STD surveillance in populations at most risk for disease acquisition.

  4. Are cannabis prevalence estimates comparable across countries and regions? A cross-cultural validation using search engine query data.

    PubMed

    Steppan, Martin; Kraus, Ludwig; Piontek, Daniela; Siciliano, Valeria

    2013-01-01

    Prevalence estimation of cannabis use is usually based on self-report data. Although there is evidence on the reliability of this data source, its cross-cultural validity is still a major concern. External objective criteria are needed for this purpose. In this study, cannabis-related search engine query data are used as an external criterion. Data on cannabis use were taken from the 2007 European School Survey Project on Alcohol and Other Drugs (ESPAD). Provincial data came from three Italian nation-wide studies using the same methodology (2006-2008; ESPAD-Italia). Information on cannabis-related search engine query data was based on Google search volume indices (GSI). (1) Reliability analysis was conducted for GSI. (2) Latent measurement models of "true" cannabis prevalence were tested using perceived availability, web-based cannabis searches and self-reported prevalence as indicators. (3) Structure models were set up to test the influences of response tendencies and geographical position (latitude, longitude). In order to test the stability of the models, analyses were conducted on country level (Europe, US) and on provincial level in Italy. Cannabis-related GSI were found to be highly reliable and constant over time. The overall measurement model was highly significant in both data sets. On country level, no significant effects of response bias indicators and geographical position on perceived availability, web-based cannabis searches and self-reported prevalence were found. On provincial level, latitude had a significant positive effect on availability indicating that perceived availability of cannabis in northern Italy was higher than expected from the other indicators. Although GSI showed weaker associations with cannabis use than perceived availability, the findings underline the external validity and usefulness of search engine query data as external criteria. The findings suggest an acceptable relative comparability of national (provincial) prevalence

  5. On Relevance Weight Estimation and Query Expansion.

    ERIC Educational Resources Information Center

    Robertson, S. E.

    1986-01-01

    A Bayesian argument is used to suggest modifications to the Robertson and Jones relevance weighting formula to accommodate the addition to the query of terms taken from the relevant documents identified during the search. (Author)
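    For orientation, the Robertson and Jones relevance weight that the article sets out to modify is usually cited with a 0.5 correction in each cell. The sketch below implements that commonly cited baseline form, not the modified version the article proposes; the function and parameter names are hypothetical.

```python
from math import log


def rsj_weight(N, n, R, r):
    """Commonly cited Robertson/Sparck Jones relevance weight for one term.

    N: documents in the collection      n: documents containing the term
    R: known relevant documents         r: relevant documents containing the term
    The 0.5 added to each cell keeps the weight finite when a cell is zero.
    """
    relevant_odds = (r + 0.5) / (R - r + 0.5)
    non_relevant_odds = (n - r + 0.5) / (N - n - R + r + 0.5)
    return log(relevant_odds / non_relevant_odds)


# Hypothetical counts: a term found in all 5 relevant documents
# and in 10 of the 100 documents overall.
print(round(rsj_weight(N=100, n=10, R=5, r=5), 4))
```

    A term concentrated in the relevant documents receives a large positive weight, which is what makes it a candidate for query expansion.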

  6. Quantum Private Queries

    NASA Astrophysics Data System (ADS)

    Giovannetti, Vittorio; Lloyd, Seth; Maccone, Lorenzo

    2008-06-01

    We propose a cheat sensitive quantum protocol to perform a private search on a classical database which is efficient in terms of communication complexity. It allows a user to retrieve an item from the database provider without revealing which item he or she retrieved: if the provider tries to obtain information on the query, the person querying the database can find it out. The protocol ensures also perfect data privacy of the database: the information that the user can retrieve in a single query is bounded and does not depend on the size of the database. With respect to the known (quantum and classical) strategies for private information retrieval, our protocol displays an exponential reduction in communication complexity and in running-time computational complexity.

  7. Meshable: searching PubMed abstracts by utilizing MeSH and MeSH-derived topical terms.

    PubMed

    Kim, Sun; Yeganova, Lana; Wilbur, W John

    2016-10-01

    Medical Subject Headings (MeSH®) is a controlled vocabulary for indexing and searching biomedical literature. MeSH terms and subheadings are organized in a hierarchical structure and are used to indicate the topics of an article. Biologists can use either MeSH terms as queries or the MeSH interface provided in PubMed® for searching PubMed abstracts. However, these are rarely used, and there is no convenient way to link standardized MeSH terms to user queries. Here, we introduce a web interface which allows users to enter queries to find MeSH terms closely related to the queries. Our method relies on co-occurrence of text words and MeSH terms to find keywords that are related to each MeSH term. A query is then matched with the keywords for MeSH terms, and candidate MeSH terms are ranked based on their relatedness to the query. The experimental results show that our method achieves the best performance among several term extraction approaches in terms of topic coherence. Moreover, the interface can be effectively used to find full names of abbreviations and to disambiguate user queries. Availability: https://www.ncbi.nlm.nih.gov/IRET/MESHABLE/ Contact: sun.kim@nih.gov. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  8. Using search query surveillance to monitor tax avoidance and smoking cessation following the United States' 2009 "SCHIP" cigarette tax increase.

    PubMed

    Ayers, John W; Ribisl, Kurt; Brownstein, John S

    2011-03-16

    Smokers can use the web to continue or quit their habit. Online vendors sell reduced or tax-free cigarettes lowering smoking costs, while health advocates use the web to promote cessation. We examined how smokers' tax avoidance and smoking cessation Internet search queries were motivated by the United States' (US) 2009 State Children's Health Insurance Program (SCHIP) federal cigarette excise tax increase and two other state specific tax increases. Google keyword searches among residents in a taxed geography (US or US state) were compared to an untaxed geography (Canada) for two years around each tax increase. Search data were normalized to a relative search volume (RSV) scale, where the highest search proportion was labeled 100 with lesser proportions scaled by how they relatively compared to the highest proportion. Changes in RSV were estimated by comparing means during and after the tax increase to means before the tax increase, across taxed and untaxed geographies. The SCHIP tax was associated with an 11.8% (95% confidence interval [95%CI], 5.7 to 17.9; p<.001) immediate increase in cessation searches; however, searches quickly abated and approximated differences from pre-tax levels in Canada during the months after the tax. Tax avoidance searches increased 27.9% (95%CI, 15.9 to 39.9; p<.001) and 5.3% (95%CI, 3.6 to 7.1; p<.001) during and in the months after the tax compared to Canada, respectively, suggesting avoidance is the more pronounced and durable response. Trends were similar for state-specific tax increases but suggest strong interactive processes across taxes. When the SCHIP tax followed Florida's tax, versus not, it promoted more cessation and avoidance searches. Efforts to combat tax avoidance and increase cessation may be enhanced by using interventions targeted and tailored to smokers' searches. Search query surveillance is a valuable real-time, free and public method, that may be generalized to other behavioral, biological, informational or
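    The relative search volume (RSV) scaling described above, in which the highest search proportion is labeled 100 and the rest are scaled against it, can be sketched as follows. The function name and the sample proportions are hypothetical.

```python
def relative_search_volume(proportions):
    """Scale search proportions so that the largest one becomes 100."""
    peak = max(proportions)
    if peak <= 0:
        raise ValueError("at least one proportion must be positive")
    return [100.0 * p / peak for p in proportions]


# Invented search proportions (e.g. searches per 10,000 queries) by month.
monthly = [2.0, 4.0, 1.0]
print(relative_search_volume(monthly))
```

    Because the scale is anchored to the series' own peak, RSV values are comparable within a series but not across series that were normalized separately.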

  9. Multimedia Web Searching Trends.

    ERIC Educational Resources Information Center

    Ozmutlu, Seda; Spink, Amanda; Ozmutlu, H. Cenk

    2002-01-01

    Examines and compares multimedia Web searching by Excite and FAST search engine users in 2001. Highlights include audio and video queries; time spent on searches; terms per query; ranking of the most frequently used terms; and differences in Web search behaviors of U.S. and European Web users. (Author/LRW)

  10. They’re heating up: Internet search query trends reveal significant public interest in heat-not-burn tobacco products

    PubMed Central

    Caputi, Theodore L.; Leas, Eric; Dredze, Mark; Cohen, Joanna E.; Ayers, John W.

    2017-01-01

    Heat-not-burn tobacco products, battery powered devices that heat leaf tobacco to approximately 500 degrees Fahrenheit to produce an inhalable aerosol, are being introduced in markets around the world. Japan, where manufacturers have marketed several heat-not-burn brands since 2014, has been the focal national test market, with the intention of developing global marketing strategies. We used Google search query data to estimate, for the first time, the scale and growth potential of heat-not-burn tobacco products. Average monthly searches for heat-not-burn products rose 1,426% (95%CI: 746, 3574) between their first (2015) and second (2016) complete years on the market and an additional 100% (95%CI: 60, 173) between the products’ second (2016) and third years on the market (Jan-Sep 2017). There are now between 5.9 and 7.5 million heat-not-burn related Google searches in Japan each month based on September 2017 estimates. Moreover, forecasts relying on the historical trends suggest heat-not-burn searches will increase an additional 32% (95%CI: -4 to 79) during 2018, compared to current estimates for 2017 (Jan-Sep), with continued growth thereafter expected. Contrasting heat-not-burn’s rise in Japan to electronic cigarettes’ rise in the United States we find searches for heat-not-burn eclipsed electronic cigarette searches during April 2016. Moreover, the change in average monthly queries for heat-not-burn in Japan between 2015 and 2017 was 399 (95% CI: 184, 1490) times larger than the change in average monthly queries for electronic cigarettes in the United States over the same time period, increasing by 2,956% (95% CI: 1729, 7304) compared to only 7% (95% CI: 3, 13). Our findings are a clarion call for tobacco control leaders to ready themselves as heat-not-burn tobacco products will likely garner substantial interest as they are introduced into new markets. Public health practitioners should expand heat-not-burn tobacco product surveillance, adjust existing tobacco

  11. They're heating up: Internet search query trends reveal significant public interest in heat-not-burn tobacco products.

    PubMed

    Caputi, Theodore L; Leas, Eric; Dredze, Mark; Cohen, Joanna E; Ayers, John W

    2017-01-01

    Heat-not-burn tobacco products, battery powered devices that heat leaf tobacco to approximately 500 degrees Fahrenheit to produce an inhalable aerosol, are being introduced in markets around the world. Japan, where manufacturers have marketed several heat-not-burn brands since 2014, has been the focal national test market, with the intention of developing global marketing strategies. We used Google search query data to estimate, for the first time, the scale and growth potential of heat-not-burn tobacco products. Average monthly searches for heat-not-burn products rose 1,426% (95%CI: 746, 3574) between their first (2015) and second (2016) complete years on the market and an additional 100% (95%CI: 60, 173) between the products' second (2016) and third years on the market (Jan-Sep 2017). There are now between 5.9 and 7.5 million heat-not-burn related Google searches in Japan each month based on September 2017 estimates. Moreover, forecasts relying on the historical trends suggest heat-not-burn searches will increase an additional 32% (95%CI: -4 to 79) during 2018, compared to current estimates for 2017 (Jan-Sep), with continued growth thereafter expected. Contrasting heat-not-burn's rise in Japan to electronic cigarettes' rise in the United States we find searches for heat-not-burn eclipsed electronic cigarette searches during April 2016. Moreover, the change in average monthly queries for heat-not-burn in Japan between 2015 and 2017 was 399 (95% CI: 184, 1490) times larger than the change in average monthly queries for electronic cigarettes in the United States over the same time period, increasing by 2,956% (95% CI: 1729, 7304) compared to only 7% (95% CI: 3, 13). Our findings are a clarion call for tobacco control leaders to ready themselves as heat-not-burn tobacco products will likely garner substantial interest as they are introduced into new markets. Public health practitioners should expand heat-not-burn tobacco product surveillance, adjust existing tobacco

  12. SymDex: increasing the efficiency of chemical fingerprint similarity searches for comparing large chemical libraries by using query set indexing.

    PubMed

    Tai, David; Fang, Jianwen

    2012-08-27

    The large sizes of today's chemical databases require efficient algorithms to perform similarity searches. It can be very time consuming to compare two large chemical databases. This paper seeks to build upon existing research efforts by describing a novel strategy for accelerating existing search algorithms for comparing large chemical collections. The quest for efficiency has focused on developing better indexing algorithms by creating heuristics for searching an individual chemical against a chemical library by detecting and eliminating needless similarity calculations. For comparing two chemical collections, these algorithms simply execute searches for each chemical in the query set sequentially. The strategy presented in this paper achieves a speedup upon these algorithms by indexing the set of all query chemicals so redundant calculations that arise in the case of sequential searches are eliminated. We implement this novel algorithm by developing a similarity search program called Symmetric inDexing or SymDex. SymDex shows over a 232% maximum speedup compared to the state-of-the-art single query search algorithm over real data for various fingerprint lengths. Considerable speedup is even seen for batch searches where query set sizes are relatively small compared to typical database sizes. To the best of our knowledge, SymDex is the first search algorithm designed specifically for comparing chemical libraries. It can be adapted to most, if not all, existing indexing algorithms and shows potential for accelerating future similarity search algorithms for comparing chemical databases.
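    The sequential baseline described above, in which each query chemical is searched against the library one at a time, can be sketched as follows. This is not SymDex itself. The abstract does not name a similarity measure, so the Tanimoto coefficient on bit-vector fingerprints is an assumption here, as are the function names and the toy fingerprints (stored as Python integers).

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints stored as integer bitsets."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return bin(a & b).count("1") / union


def sequential_search(queries, library, threshold):
    """Compare every query with every library entry, one query at a time.

    Returns (query_index, library_index) pairs whose similarity meets the
    threshold. No indexing or pruning is done, so the cost is
    len(queries) * len(library) similarity calculations.
    """
    hits = []
    for qi, q in enumerate(queries):
        for li, m in enumerate(library):
            if tanimoto(q, m) >= threshold:
                hits.append((qi, li))
    return hits


# Toy 4-bit fingerprints, invented for illustration.
queries = [0b1111]
library = [0b1111, 0b0001]
print(sequential_search(queries, library, threshold=0.5))
```

    Indexing strategies of the kind the abstract discusses aim to skip many of these pairwise calculations rather than change the similarity measure.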

  13. Boolean logic tree of graphene-based chemical system for molecular computation and intelligent molecular search query.

    PubMed

    Huang, Wei Tao; Luo, Hong Qun; Li, Nian Bing

    2014-05-06

    The most serious, and yet unsolved, problem of constructing molecular computing devices consists in connecting all of these molecular events into a usable device. This report demonstrates the use of a Boolean logic tree for analyzing the chemical event network based on graphene, organic dye, thrombin aptamer, and Fenton reaction, organizing and connecting these basic chemical events. This chemical event network can be utilized to implement fluorescent combinatorial logic (including basic logic gates and complex integrated logic circuits) and fuzzy logic computing. On the basis of the Boolean logic tree analysis and logic computing, these basic chemical events can be considered as programmable "words" and chemical interactions as "syntax" logic rules to construct a molecular search engine for performing intelligent molecular search queries. Our approach is helpful in developing advanced logic programs based on molecules for application in biosensing, nanotechnology, and drug delivery.

  14. Tracking search engine queries for suicide in the United Kingdom, 2004-2013.

    PubMed

    Arora, V S; Stuckler, D; McKee, M

    2016-08-01

    First, to determine if a cyclical trend is observed for search activity of suicide and three common suicide risk factors in the United Kingdom: depression, unemployment, and marital strain. Second, to test the validity of suicide search data as a potential marker of suicide risk by evaluating whether web searches for suicide associate with suicide rates among those of different ages and genders in the United Kingdom. Cross-sectional. Search engine data was obtained from Google Trends, a publicly available repository of information of trends and patterns of user searches on Google. The following phrases were entered into Google Trends to analyse relative search volume for suicide, depression, job loss, and divorce, respectively: 'suicide'; 'depression + depressed + hopeless'; 'unemployed + lost job'; 'divorce'. Spearman's rank correlation coefficient was employed to test bivariate associations between suicide search activity and official suicide rates from the Office of National Statistics (ONS). Cyclical trends were observed in search activity for suicide and depression-related search activity, with peaks in autumn and winter months, and a trough in summer months. A positive, non-significant association was found between suicide-related search activity and suicide rates in the general working-age population (15-64 years) (ρ = 0.164; P = 0.652). This association is stronger in younger age groups, particularly for those 25-34 years of age (ρ = 0.848; P = 0.002). We give credence to a link between search activity for suicide and suicide rates in the United Kingdom from 2004 to 2013 for high risk sub-populations (i.e. male youth and young professionals). There remains a need for further research on how Google Trends can be used in other areas of disease surveillance and for work to provide greater geographical precision, as well as research on ways of mitigating the risk of internet use leading to suicide ideation in youth. Copyright © 2015 The Royal
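
    The bivariate test named above, Spearman's rank correlation, amounts to a Pearson correlation computed on ranks. The sketch below is a plain-Python illustration under that definition; the helper names are hypothetical, it does not compute P values, and it is not the authors' analysis code.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

    In this setting `x` would hold yearly relative search volumes and `y` the matching official suicide rates; a rho near 1 means the two series rise and fall together in rank order.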

  15. An Alternative to QUERY: Batch-Searching of the ERIC Information Collections.

    ERIC Educational Resources Information Center

    Krahmer, Edward; Horne, Kent

    A manual describing the RIC computer search program for retrieval of information from ERIC, CIJE, and other collections is presented. It is pointed out that two versions of this program have been developed. The first is for an IBM 360/370 computer. This version has been operational on a production basis for nearly a year. Four installations of…

  16. Automatic Query Formulations in Information Retrieval.

    ERIC Educational Resources Information Center

    Salton, G.; And Others

    1983-01-01

    Introduces methods designed to reduce role of search intermediaries by generating Boolean search formulations automatically using term frequency considerations from natural language statements provided by system patrons. Experimental results are supplied and methods are described for applying automatic query formulation process in practice.…

  17. Query Auto-Completion Based on Word2vec Semantic Similarity

    NASA Astrophysics Data System (ADS)

    Shao, Taihua; Chen, Honghui; Chen, Wanyu

    2018-04-01

    Query auto-completion (QAC) is the first step of information retrieval, which helps users formulate the entire query after inputting only a few prefixes. Regarding the models of QAC, the traditional method ignores the contribution from the semantic relevance between queries. However, similar queries always express extremely similar search intention. In this paper, we propose a hybrid model FS-QAC based on query semantic similarity as well as the query frequency. We choose word2vec method to measure the semantic similarity between intended queries and pre-submitted queries. By combining both features, our experiments show that FS-QAC model improves the performance when predicting the user’s query intention and helping formulate the right query. Our experimental results show that the optimal hybrid model contributes to a 7.54% improvement in terms of MRR against a state-of-the-art baseline using the public AOL query logs.
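
    A hybrid ranker of the kind described above, and the MRR metric used to evaluate it, can be sketched as follows. The linear weighting with `lam` is an assumption made for illustration (the abstract does not state how FS-QAC combines the two features), and the similarity scores are taken as given inputs rather than computed with word2vec.

```python
def hybrid_rank(candidates, freq, sim, lam=0.5):
    """Order completion candidates by a weighted mix of two features.

    freq: past query frequency per candidate; sim: semantic similarity
    in [0, 1] per candidate. lam is a hypothetical mixing weight.
    """
    max_freq = max(freq[c] for c in candidates) or 1  # avoid dividing by 0

    def score(c):
        return lam * freq[c] / max_freq + (1 - lam) * sim[c]

    return sorted(candidates, key=score, reverse=True)


def mean_reciprocal_rank(ranked_lists, targets):
    """MRR: mean of 1/rank of the intended query, 0 when it is absent."""
    if not targets:
        raise ValueError("need at least one target query")
    total = 0.0
    for candidates, target in zip(ranked_lists, targets):
        if target in candidates:
            total += 1.0 / (candidates.index(target) + 1)
    return total / len(targets)
```

    A higher MRR means the intended query tends to appear nearer the top of the suggestion list, which is the sense in which the abstract reports its 7.54% improvement.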

  18. Semantic Features for Classifying Referring Search Terms

    SciTech Connect

    May, Chandler J.; Henry, Michael J.; McGrath, Liam R.

    2012-05-11

    When an internet user clicks on a result in a search engine, a request is submitted to the destination web server that includes a referrer field containing the search terms given by the user. Using this information, website owners can analyze the search terms leading to their websites to better understand their visitors' needs. This work explores some of the features that can be used for classification-based analysis of such referring search terms. We present initial results for the example task of classifying HTTP requests' countries of origin. A system that can accurately predict the country of origin from query text may be a valuable complement to IP lookup methods which are susceptible to the obfuscation of dereferrers or proxies. We suggest that the addition of semantic features improves classifier performance in this example application. We begin by looking at related work and presenting our approach. After describing initial experiments and results, we discuss paths forward for this work.

  19. Improving 3d Spatial Queries Search: Newfangled Technique of Space Filling Curves in 3d City Modeling

    NASA Astrophysics Data System (ADS)

    Uznir, U.; Anton, F.; Suhaibah, A.; Rahman, A. A.; Mioc, D.

    2013-09-01

    The advantages of three dimensional (3D) city models can be seen in various applications including photogrammetry, urban and regional planning, computer games, etc. They expand the visualization and analysis capabilities of Geographic Information Systems on cities, and they can be developed using web standards. However, these 3D city models consume much more storage compared to two dimensional (2D) spatial data. They involve extra geometrical and topological information together with semantic data. Without a proper spatial data clustering method and its corresponding spatial data access method, retrieving portions of, and especially searching, these 3D city models will not be done optimally. Even though current developments are based on an open data model allotted by the Open Geospatial Consortium (OGC) called CityGML, its XML-based structure makes it challenging to cluster the 3D urban objects. In this research, we propose an opponent data constellation technique of space-filling curves (3D Hilbert curves) for 3D city model data representation. Unlike previous methods that try to project 3D or n-dimensional data down to 2D or 3D using Principal Component Analysis (PCA) or Hilbert mappings, in this research we extend the Hilbert space-filling curve to one higher dimension for 3D city model data implementations. The query performance was tested using a CityGML dataset of 1,000 building blocks and the results are presented in this paper. The advantages of implementing space-filling curves in 3D city modeling will improve data retrieval time by means of optimized 3D adjacency, nearest neighbor information and 3D indexing. The Hilbert mapping, which maps a subinterval of the [0, 1] interval to the corresponding portion of the d-dimensional Hilbert's curve, preserves the Lebesgue measure and is Lipschitz continuous. Depending on the applications, several alternatives are possible in order to cluster spatial data together in the third dimension compared to its

  20. Querying Proofs

    NASA Technical Reports Server (NTRS)

    Aspinall, David; Denney, Ewen; Lueth, Christoph

    2012-01-01

    We motivate and introduce a query language PrQL designed for inspecting machine representations of proofs. PrQL natively supports hiproofs which express proof structure using hierarchical nested labelled trees. The core language presented in this paper is locally structured (first-order), with queries built using recursion and patterns over proof structure and rule names. We define the syntax and semantics of locally structured queries, demonstrate their power, and sketch some implementation experiments.

  1. Querying and Ranking XML Documents.

    ERIC Educational Resources Information Center

    Schlieder, Torsten; Meuss, Holger

    2002-01-01

    Discussion of XML, information retrieval, precision, and recall focuses on a retrieval technique that adopts the similarity measure of the vector space model, incorporates the document structure, and supports structured queries. Topics include a query model based on tree matching; structured queries and term-based ranking; and term frequency and…

  2. Automatic query formulations in information retrieval.

    PubMed

    Salton, G; Buckley, C; Fox, E A

    1983-07-01

    Modern information retrieval systems are designed to supply relevant information in response to requests received from the user population. In most retrieval environments the search requests consist of keywords, or index terms, interrelated by appropriate Boolean operators. Since it is difficult for untrained users to generate effective Boolean search requests, trained search intermediaries are normally used to translate original statements of user need into useful Boolean search formulations. Methods are introduced in this study which reduce the role of the search intermediaries by making it possible to generate Boolean search formulations completely automatically from natural language statements provided by the system patrons. Frequency considerations are used automatically to generate appropriate term combinations as well as Boolean connectives relating the terms. Methods are covered to produce automatic query formulations both in a standard Boolean logic system, as well as in an extended Boolean system in which the strict interpretation of the connectives is relaxed. Experimental results are supplied to evaluate the effectiveness of the automatic query formulation process, and methods are described for applying the automatic query formulation process in practice.

  3. Evidential significance of automotive paint trace evidence using a pattern recognition based infrared library search engine for the Paint Data Query Forensic Database.

    PubMed

    Lavine, Barry K; White, Collin G; Allen, Matthew D; Fasasi, Ayuba; Weakley, Andrew

    2016-10-01

    A prototype library search engine has been further developed to search the infrared spectral libraries of the paint data query database to identify the line and model of a vehicle from the clear coat, surfacer-primer, and e-coat layers of an intact paint chip. For this study, search prefilters were developed from 1181 automotive paint systems spanning 3 manufacturers: General Motors, Chrysler, and Ford. The best match between each unknown and the spectra in the hit list generated by the search prefilters was identified using a cross-correlation library search algorithm that performed both a forward and backward search. In the forward search, spectra were divided into intervals and further subdivided into windows (which corresponds to the time lag for the comparison) within those intervals. The top five hits identified in each search window were compiled; a histogram was computed that summarized the frequency of occurrence for each library sample, with the IR spectra most similar to the unknown flagged. The backward search computed the frequency and occurrence of each line and model without regard to the identity of the individual spectra. Only those lines and models with a frequency of occurrence greater than or equal to 20% were included in the final hit list. If there was agreement between the forward and backward search results, the specific line and model common to both hit lists was always the correct assignment. Samples assigned to the same line and model by both searches are always well represented in the library and correlate well on an individual basis to specific library samples. For these samples, one can have confidence in the accuracy of the match. This was not the case for the results obtained using commercial library search algorithms, as the hit quality index scores for the top twenty hits were always greater than 99%. Copyright © 2016 Elsevier B.V. All rights reserved.

  4. Using Search Query Surveillance to Monitor Tax Avoidance and Smoking Cessation following the United States' 2009 “SCHIP” Cigarette Tax Increase

    PubMed Central

    Ayers, John W.; Ribisl, Kurt; Brownstein, John S.

    2011-01-01

    Smokers can use the web to continue or quit their habit. Online vendors sell reduced or tax-free cigarettes lowering smoking costs, while health advocates use the web to promote cessation. We examined how smokers' tax avoidance and smoking cessation Internet search queries were motivated by the United States' (US) 2009 State Children's Health Insurance Program (SCHIP) federal cigarette excise tax increase and two other state specific tax increases. Google keyword searches among residents in a taxed geography (US or US state) were compared to an untaxed geography (Canada) for two years around each tax increase. Search data were normalized to a relative search volume (RSV) scale, where the highest search proportion was labeled 100 with lesser proportions scaled by how they relatively compared to the highest proportion. Changes in RSV were estimated by comparing means during and after the tax increase to means before the tax increase, across taxed and untaxed geographies. The SCHIP tax was associated with an 11.8% (95% confidence interval [95%CI], 5.7 to 17.9; p<.001) immediate increase in cessation searches; however, searches quickly abated and approximated differences from pre-tax levels in Canada during the months after the tax. Tax avoidance searches increased 27.9% (95%CI, 15.9 to 39.9; p<.001) and 5.3% (95%CI, 3.6 to 7.1; p<.001) during and in the months after the tax compared to Canada, respectively, suggesting avoidance is the more pronounced and durable response. Trends were similar for state-specific tax increases but suggest strong interactive processes across taxes. When the SCHIP tax followed Florida's tax, versus not, it promoted more cessation and avoidance searches. Efforts to combat tax avoidance and increase cessation may be enhanced by using interventions targeted and tailored to smokers' searches. Search query surveillance is a valuable real-time, free and public method, that may be generalized to other behavioral, biological, informational or
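
    The relative search volume (RSV) scaling described above, where the highest search proportion is labeled 100 and the rest are scaled against it, can be sketched as below. The function name and the zero-peak behavior are assumptions made for illustration.

```python
def relative_search_volume(proportions):
    """Rescale search proportions so that the largest one equals 100."""
    peak = max(proportions)
    if peak == 0:
        # Assumed convention: with no searches at all, every RSV is 0.
        return [0.0 for _ in proportions]
    return [100.0 * p / peak for p in proportions]
```

    Because each series is scaled to its own peak, RSV values show relative change within a series and do not by themselves give absolute search counts.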

  5. Searching PubMed for studies on bacteremia, bloodstream infection, septicemia, or whatever the best term is: a note of caution.

    PubMed

    Søgaard, Mette; Andersen, Jens P; Schønheyder, Henrik C

    2012-04-01

    There is inconsistency in the terminology used to describe bacteremia. To demonstrate the impact on information retrieval, we compared the yield of articles from PubMed MEDLINE using the terms "bacteremia," "bloodstream infection," and "septicemia." We searched for articles published between 1966 and 2009, and depicted the relationships among queries graphically. To examine the content of the retrieved articles, we extracted all Medical Subject Headings (MeSH) terms and compared topic similarity using a cosine measure. The recovered articles differed greatly by term, and only 53 articles were captured by all terms. Of the articles retrieved by the "bacteremia" query, 21,438 (84.1%) were not captured when searching for "bloodstream infection" or "septicemia." Likewise, only 2,243 of the 11,796 articles recovered by free-text query for "bloodstream infection" were retrieved by the "bacteremia" query (19%). Entering "bloodstream infection" as a phrase, 46.1% of the records overlapped with the "bacteremia" query. Similarity measures ranged from 0.52 to 0.78 and were lowest for "bloodstream infection" as a phrase compared with "septicemia." Inconsistent terminology has a major impact on the yield of queries. Agreement on terminology should be sought and promoted by scientific journals. An immediate solution is to add "bloodstream infection" as entry term for bacteremia in the MeSH vocabulary. Copyright © 2012 Association for Professionals in Infection Control and Epidemiology, Inc. Published by Mosby, Inc. All rights reserved.
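
    The two comparisons reported above, the share of one query's records also retrieved by another and the cosine similarity of MeSH term profiles, can be sketched as follows. The abstract does not say how terms were weighted, so raw term counts are assumed, and the function names are hypothetical.

```python
import math
from collections import Counter


def overlap_percent(shared, total):
    """Percentage of one query's records that another query also retrieved."""
    return 100.0 * shared / total


def cosine_similarity(terms_a, terms_b):
    """Cosine of the angle between two term-count vectors."""
    a, b = Counter(terms_a), Counter(terms_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty term list shares nothing with any other
    return dot / (norm_a * norm_b)
```

    With the figures from the abstract, `overlap_percent(2243, 11796)` rounds to the reported 19%.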

  6. An Index to All "Query" Computer Searches Completed from July 1973 to June 1974. Search Number 0403-0619. Information Series No. 24.

    ERIC Educational Resources Information Center

    Wilder, Dolores J., Comp.; Hines, Rella, Comp.

    The Tennessee Research Coordinating Unit (RCU) has implemented a computerized information retrieval system known as "Query," which allows for the retrieval of documents indexed in Research in Education (RIE), Current Index to Journals in Education (CIJE), and Abstracts of Instructional and Research Materials (AIM/ARM). The document…

  7. Do Seasons Have an Influence on the Incidence of Depression? The Use of an Internet Search Engine Query Data as a Proxy of Human Affect

    PubMed Central

    Yang, Albert C.; Huang, Norden E.; Peng, Chung-Kang; Tsai, Shih-Jen

    2010-01-01

    Background Seasonal depression has generated considerable clinical interest in recent years. Despite a common belief that people in higher latitudes are more vulnerable to low mood during the winter, it has never been demonstrated that human's moods are subject to seasonal change on a global scale. The aim of this study was to investigate large-scale seasonal patterns of depression using Internet search query data as a signature and proxy of human affect. Methodology/Principal Findings Our study was based on a publicly available search engine database, Google Insights for Search, which provides time series data of weekly search trends from January 1, 2004 to June 30, 2009. We applied an empirical mode decomposition method to isolate seasonal components of health-related search trends of depression in 54 geographic areas worldwide. We identified a seasonal trend of depression that was opposite between the northern and southern hemispheres; this trend was significantly correlated with seasonal oscillations of temperature (USA: r = −0.872, p<0.001; Australia: r = −0.656, p<0.001). Based on analyses of search trends over 54 geological locations worldwide, we found that the degree of correlation between searching for depression and temperature was latitude-dependent (northern hemisphere: r = −0.686; p<0.001; southern hemisphere: r = 0.871; p<0.0001). Conclusions/Significance Our findings indicate that Internet searches for depression from people in higher latitudes are more vulnerable to seasonal change, whereas this phenomenon is obscured in tropical areas. This phenomenon exists universally across countries, regardless of language. This study provides novel, Internet-based evidence for the epidemiology of seasonal depression. PMID:21060851

  8. A model for the determination of pollen count using google search queries for patients suffering from allergic rhinitis.

    PubMed

    König, Volker; Mösges, Ralph

    2014-01-01

    Background. The transregional increase in pollen-associated allergies and their diversity have been scientifically proven. However, patchy pollen count measurement in many regions is a worldwide problem with few exceptions. Methods. This paper used data gathered from pollen count stations in Germany, Google queries using relevant allergological/biological keywords, and patient data from three German study centres collected in a prospective, double-blind, randomised, placebo-controlled, multicentre immunotherapy study to analyse a possible correlation between these data pools. Results. Overall, correlations between the patient-based, combined symptom medication score and Google data were stronger than those with the regionally measured pollen count data. The correlation of the Google data was especially strong in the groups of severe allergy sufferers. The results of the three-centre analyses show moderate to strong correlations with the Google keywords (up to >0.8 cross-correlation coefficient, P < 0.001) in 10 out of 11 groups (three averaged patient cohorts and eight subgroups of severe allergy sufferers: high IgE class, high combined symptom medication score, and asthma). Conclusion. For countries with a good Internet infrastructure but no dense network of pollen traps, this could represent an alternative for determining pollen levels and forecasting the pollen count for the next day.

  9. Pattern Recognition-Assisted Infrared Library Searching of the Paint Data Query Database to Enhance Lead Information from Automotive Paint Trace Evidence.

    PubMed

    Lavine, Barry K; White, Collin G; Allen, Matthew D; Weakley, Andrew

    2017-03-01

    Multilayered automotive paint fragments, which are one of the most complex materials encountered in the forensic science laboratory, provide crucial links in criminal investigations and prosecutions. To determine the origin of these paint fragments, forensic automotive paint examiners have turned to the paint data query (PDQ) database, which allows the forensic examiner to compare the layer sequence and color, texture, and composition of the sample to paint systems of the original equipment manufacturer (OEM). However, modern automotive paints have a thin color coat and this layer on a microscopic fragment is often too thin to obtain accurate chemical and topcoat color information. A search engine has been developed for the infrared (IR) spectral libraries of the PDQ database in an effort to improve discrimination capability and permit quantification of discrimination power for OEM automotive paint comparisons. The similarity of IR spectra of the corresponding layers of various records for original finishes in the PDQ database often results in poor discrimination using commercial library search algorithms. A pattern recognition approach employing pre-filters and a cross-correlation library search algorithm that performs both a forward and backward search has been used to significantly improve the discrimination of IR spectra in the PDQ database and thus improve the accuracy of the search. This improvement permits inter-comparison of OEM automotive paint layer systems using the IR spectra alone. Such information can serve to quantify the discrimination power of the original automotive paint encountered in casework and further efforts to succinctly communicate trace evidence to the courts.

  10. Multitasking Web Searching and Implications for Design.

    ERIC Educational Resources Information Center

    Ozmutlu, Seda; Ozmutlu, H. C.; Spink, Amanda

    2003-01-01

    Findings from a study of users' multitasking searches on Web search engines include: multitasking searches are a noticeable user behavior; multitasking search sessions are longer than regular search sessions in terms of queries per session and duration; both Excite and AlltheWeb.com users search for about three topics per multitasking session and…

  11. Does query expansion limit our learning? A comparison of social-based expansion to content-based expansion for medical queries on the internet.

    PubMed

    Pentoney, Christopher; Harwell, Jeff; Leroy, Gondy

    2014-01-01

    Searching for medical information online is a common activity. While it has been shown that forming good queries is difficult, Google's query suggestion tool, a type of query expansion, aims to facilitate query formation. However, it is unknown how this expansion, which is based on what others searched for, affects the information gathering of the online community. To measure the impact of social-based query expansion, this study compared it with content-based expansion, i.e., what is really in the text. We used 138,906 medical queries from the AOL User Session Collection and expanded them using Google's Autocomplete method (social-based) and the content of the Google Web Corpus (content-based). We evaluated the specificity and ambiguity of the expansion terms for trigram queries. We also looked at the impact on the actual results using domain diversity and expansion edit distance. Results showed that the social-based method provided more precise expansion terms as well as terms that were less ambiguous. Expanded queries do not differ significantly in diversity when expanded using the social-based method (6.72 different domains returned in the first ten results, on average) vs. content-based method (6.73 different domains, on average).

  12. The Profile-Query Relationship.

    ERIC Educational Resources Information Center

    Shepherd, Michael A.; Phillips, W. J.

    1986-01-01

    Defines relationship between user profile and user query in terms of relationship between clusters of documents retrieved by each, and explores the expression of cluster similarity and cluster overlap as linear functions of similarity existing between original pairs of profiles and queries, given the desired retrieval threshold. (23 references)…

  13. Short-term Internet search using makes people rely on search engines when facing unknown issues.

    PubMed

    Wang, Yifan; Wu, Lingdan; Luo, Liang; Zhang, Yifen; Dong, Guangheng

    2017-01-01

    The Internet search engines, which have powerful search/sort functions and ease of use features, have become an indispensable tool for many individuals. The current study is to test whether the short-term Internet search training can make people more dependent on it. Thirty-one subjects out of forty subjects completed the search training study which included a pre-test, a six-day's training of Internet search, and a post-test. During the pre- and post- tests, subjects were asked to search online the answers to 40 unusual questions, remember the answers and recall them in the scanner. Un-learned questions were randomly presented at the recalling stage in order to elicit search impulse. Compared to the pre-test, subjects in the post-test reported higher impulse to use search engines to answer un-learned questions. Consistently, subjects showed higher brain activations in dorsolateral prefrontal cortex and anterior cingulate cortex in the post-test than in the pre-test. In addition, there were significant positive correlations between self-reported search impulse and brain responses in the frontal areas. The results suggest that a simple six-day's Internet search training can make people dependent on the search tools when facing unknown issues. People are easily dependent on the Internet search engines.

  14. Short-term Internet search using makes people rely on search engines when facing unknown issues

    PubMed Central

    Wang, Yifan; Wu, Lingdan; Luo, Liang; Zhang, Yifen

    2017-01-01

    The Internet search engines, which have powerful search/sort functions and ease of use features, have become an indispensable tool for many individuals. The current study is to test whether the short-term Internet search training can make people more dependent on it. Thirty-one subjects out of forty subjects completed the search training study which included a pre-test, a six-day’s training of Internet search, and a post-test. During the pre- and post- tests, subjects were asked to search online the answers to 40 unusual questions, remember the answers and recall them in the scanner. Un-learned questions were randomly presented at the recalling stage in order to elicit search impulse. Compared to the pre-test, subjects in the post-test reported higher impulse to use search engines to answer un-learned questions. Consistently, subjects showed higher brain activations in dorsolateral prefrontal cortex and anterior cingulate cortex in the post-test than in the pre-test. In addition, there were significant positive correlations between self-reported search impulse and brain responses in the frontal areas. The results suggest that a simple six-day’s Internet search training can make people dependent on the search tools when facing unknown issues. People are easily dependent on the Internet search engines. PMID:28441408

  15. Enabling Incremental Query Re-Optimization

    PubMed Central

    Liu, Mengmeng; Ives, Zachary G.; Loo, Boon Thau

    2017-01-01

    As declarative query processing techniques expand to the Web, data streams, network routers, and cloud platforms, there is an increasing need to re-plan execution in the presence of unanticipated performance changes. New runtime information may affect which query plan we prefer to run. Adaptive techniques require innovation both in terms of the algorithms used to estimate costs, and in terms of the search algorithm that finds the best plan. We investigate how to build a cost-based optimizer that recomputes the optimal plan incrementally given new cost information, much as a stream engine constantly updates its outputs given new data. Our implementation especially shows benefits for stream processing workloads. It lays the foundations upon which a variety of novel adaptive optimization algorithms can be built. We start by leveraging the recently proposed approach of formulating query plan enumeration as a set of recursive datalog queries; we develop a variety of novel optimization approaches to ensure effective pruning in both static and incremental cases. We further show that the lessons learned in the declarative implementation can be equally applied to more traditional optimizer implementations. PMID:28659658

  16. Enabling Incremental Query Re-Optimization.

    PubMed

    Liu, Mengmeng; Ives, Zachary G; Loo, Boon Thau

    2016-01-01

    As declarative query processing techniques expand to the Web, data streams, network routers, and cloud platforms, there is an increasing need to re-plan execution in the presence of unanticipated performance changes. New runtime information may affect which query plan we prefer to run. Adaptive techniques require innovation both in terms of the algorithms used to estimate costs, and in terms of the search algorithm that finds the best plan. We investigate how to build a cost-based optimizer that recomputes the optimal plan incrementally given new cost information, much as a stream engine constantly updates its outputs given new data. Our implementation especially shows benefits for stream processing workloads. It lays the foundations upon which a variety of novel adaptive optimization algorithms can be built. We start by leveraging the recently proposed approach of formulating query plan enumeration as a set of recursive datalog queries; we develop a variety of novel optimization approaches to ensure effective pruning in both static and incremental cases. We further show that the lessons learned in the declarative implementation can be equally applied to more traditional optimizer implementations.

  17. Short-term perceptual learning in visual conjunction search.

    PubMed

    Su, Yuling; Lai, Yunpeng; Huang, Wanyi; Tan, Wei; Qu, Zhe; Ding, Yulong

    2014-08-01

    Although some studies showed that training can improve the ability of cross-dimension conjunction search, less is known about the underlying mechanism. Specifically, it remains unclear whether training of visual conjunction search can successfully bind different features of separated dimensions into a new function unit at early stages of visual processing. In the present study, we utilized stimulus specificity and generalization to provide a new approach to investigate the mechanisms underlying perceptual learning (PL) in visual conjunction search. Five experiments consistently showed that after 40 to 50 min of training of color-shape/orientation conjunction search, the ability to search for a certain conjunction target improved significantly and the learning effects did not transfer to a new target that differed from the trained target in both color and shape/orientation features. However, the learning effects were not strictly specific. In color-shape conjunction search, although the learning effect could not transfer to a same-shape different-color target, it almost completely transferred to a same-color different-shape target. In color-orientation conjunction search, the learning effect partly transferred to a new target that shared same color or same orientation with the trained target. Moreover, the sum of transfer effects for the same color target and the same orientation target in color-orientation conjunction search was algebraically equivalent to the learning effect for trained target, showing an additive transfer effect. The different transfer patterns in color-shape and color-orientation conjunction search learning might reflect the different complexity and discriminability between feature dimensions. These results suggested a feature-based attention enhancement mechanism rather than a unitization mechanism underlying the short-term PL of color-shape/orientation conjunction search.

  18. Evolution of Query Optimization Methods

    NASA Astrophysics Data System (ADS)

    Hameurlain, Abdelkader; Morvan, Franck

    Query optimization is the most critical phase in query processing. In this paper, we try to describe synthetically the evolution of query optimization methods from uniprocessor relational database systems to data Grid systems through parallel, distributed and data integration systems. We point out a set of parameters to characterize and compare query optimization methods, mainly: (i) size of the search space, (ii) type of method (static or dynamic), (iii) modification types of execution plans (re-optimization or re-scheduling), (iv) level of modification (intra-operator and/or inter-operator), (v) type of event (estimation errors, delay, user preferences), and (vi) nature of decision-making (centralized or decentralized control).

  19. Improving accuracy for identifying related PubMed queries by an integrated approach.

    PubMed

    Lu, Zhiyong; Wilbur, W John

    2009-10-01

    PubMed is the most widely used tool for searching biomedical literature online. As with many other online search tools, a user often types a series of multiple related queries before retrieving satisfactory results to fulfill a single information need. Meanwhile, it is also a common phenomenon to see a user type queries on unrelated topics in a single session. In order to study PubMed users' search strategies, it is necessary to be able to automatically separate unrelated queries and group together related queries. Here, we report a novel approach combining both lexical and contextual analyses for segmenting PubMed query sessions and identifying related queries and compare its performance with the previous approach based solely on concept mapping. We experimented with our integrated approach on sample data consisting of 1539 pairs of consecutive user queries in 351 user sessions. The prediction results of 1396 pairs agreed with the gold-standard annotations, achieving an overall accuracy of 90.7%. This demonstrates that our approach is significantly better than the previously published method. By applying this approach to a one day query log of PubMed, we found that a significant proportion of information needs involved more than one PubMed query, and that most of the consecutive queries for the same information need are lexically related. Finally, the proposed PubMed distance is shown to be an accurate and meaningful measure for determining the contextual similarity between biological terms. The integrated approach can play a critical role in handling real-world PubMed query log data as is demonstrated in our experiments.

  20. Improving accuracy for identifying related PubMed queries by an integrated approach

    PubMed Central

    Lu, Zhiyong; Wilbur, W. John

    2009-01-01

    PubMed is the most widely used tool for searching biomedical literature online. As with many other online search tools, a user often types a series of multiple related queries before retrieving satisfactory results to fulfill a single information need. Meanwhile, it is also a common phenomenon to see a user type queries on unrelated topics in a single session. In order to study PubMed users’ search strategies, it is necessary to be able to automatically separate unrelated queries and group together related queries. Here, we report a novel approach combining both lexical and contextual analyses for segmenting PubMed query sessions and identifying related queries and compare its performance with the previous approach based solely on concept mapping. We experimented with our integrated approach on sample data consisting of 1,539 pairs of consecutive user queries in 351 user sessions. The prediction results of 1,396 pairs agreed with the gold-standard annotations, achieving an overall accuracy of 90.7%. This demonstrates that our approach is significantly better than the previously published method. By applying this approach to a one day query log of PubMed, we found that a significant proportion of information needs involved more than one PubMed query, and that most of the consecutive queries for the same information need are lexically related. Finally, the proposed PubMed distance is shown to be an accurate and meaningful measure for determining the contextual similarity between biological terms. The integrated approach can play a critical role in handling real-world PubMed query log data as is demonstrated in our experiments. PMID:19162232

  1. Relativistic quantum private database queries

    NASA Astrophysics Data System (ADS)

    Sun, Si-Jia; Yang, Yu-Guang; Zhang, Ming-Ou

    2015-04-01

    Recently, Jakobi et al. (Phys Rev A 83, 022301, 2011) suggested the first practical private database query protocol (J-protocol) based on the Scarani et al. (Phys Rev Lett 92, 057901, 2004) quantum key distribution protocol. Unfortunately, the J-protocol is just a cheat-sensitive private database query protocol. In this paper, we present an idealized relativistic quantum private database query protocol based on Minkowski causality and the properties of quantum information. Also, we prove that the protocol is secure in terms of the user security and the database security.

  2. Thesaurus-Enhanced Search Interfaces.

    ERIC Educational Resources Information Center

    Shiri, Ali Asghar; Revie, Crawford; Chowdhury, Gobinda

    2002-01-01

    Discussion of user interfaces to information retrieval systems focuses on interfaces that incorporate thesauri as part of their searching and browsing facilities. Discusses research literature related to information searching behavior, information retrieval interface evaluation, search term selection, and query expansion; and compares thesaurus…

  3. Hybrid Filtering in Semantic Query Processing

    ERIC Educational Resources Information Center

    Jeong, Hanjo

    2011-01-01

    This dissertation presents a hybrid filtering method and a case-based reasoning framework for enhancing the effectiveness of Web search. Web search may not reflect user needs, intent, context, and preferences, because today's keyword-based search is lacking semantic information to capture the user's context and intent in posing the search query.…

  4. System, method and apparatus for conducting a phrase search

    NASA Technical Reports Server (NTRS)

    McGreevy, Michael W. (Inventor)

    2004-01-01

    A phrase search is a method of searching a database for subsets of the database that are relevant to an input query. First, a number of relational models of subsets of a database are provided. A query is then input. The query can include one or more sequences of terms. Next, a relational model of the query is created. The relational model of the query is then compared to each one of the relational models of subsets of the database. The identifiers of the relevant subsets are then output.

  5. A New Publicly Available Chemical Query Language, CSRML ...

    EPA Pesticide Factsheets

    A new XML-based query language, CSRML, has been developed for representing chemical substructures, molecules, reaction rules, and reactions. CSRML queries are capable of integrating additional forms of information beyond the simple substructure (e.g., SMARTS) or reaction transformation (e.g., SMIRKS, reaction SMILES) queries currently in use. Chemotypes, a term used to represent advanced CSRML queries for repeated application can be encoded not only with connectivity and topology, but also with properties of atoms, bonds, electronic systems, or molecules. The CSRML language has been developed in parallel with a public set of chemotypes, i.e., the ToxPrint chemotypes, which are designed to provide excellent coverage of environmental, regulatory and commercial use chemical space, as well as to represent features and frameworks believed to be especially relevant to toxicity concerns. A software application, ChemoTyper, has also been developed and made publicly available to enable chemotype searching and fingerprinting against a target structure set. The public ChemoTyper houses the ToxPrint chemotype CSRML dictionary, as well as reference implementation so that the query specifications may be adopted by other chemical structure knowledge systems. The full specifications of the XML standard used in CSRML-based chemotypes are publicly available to facilitate and encourage the exchange of structural knowledge. Paper details specifications for a new XML-based query lan

  6. Fuzzy queries above relational database

    NASA Astrophysics Data System (ADS)

    Smolka, Pavel; Bradac, Vladimir

    2017-11-01

    The aim of the theme is to introduce a possibility of fuzzy queries implemented in relational databases. The issue is described on a model which identifies the appropriate part of the problem domain for fuzzy approach. The model is demonstrated on a database of wines focused on searching in it. The construction of the database complies with the Law of the Czech Republic.

  7. Improving biomedical information retrieval by linear combinations of different query expansion techniques.

    PubMed

    Abdulla, Ahmed AbdoAziz Ahmed; Lin, Hongfei; Xu, Bo; Banbhrani, Santosh Kumar

    2016-07-25

    Biomedical literature retrieval is becoming increasingly complex, and there is a fundamental need for advanced information retrieval systems. Information Retrieval (IR) programs scour unstructured materials such as text documents in large reserves of data that are usually stored on computers. IR is related to the representation, storage, and organization of information items, as well as to access. In IR one of the main problems is to determine which documents are relevant and which are not to the user's needs. Under the current regime, users cannot construct queries precisely enough to retrieve particular pieces of data from large reserves of data. Basic information retrieval systems are producing low-quality search results. In this paper we present a new technique to refine Information Retrieval searches to better represent the user's information need, enhancing retrieval performance by using different query expansion techniques and applying linear combinations between them, where each combination is formed linearly between two expansion results at a time. Query expansions expand the search query, for example, by finding synonyms and reweighting original terms. They provide significantly more focused, particularized search results than do basic search queries. The retrieval performance is measured by some variants of MAP (Mean Average Precision) and, according to our experimental results, the combination of the best query expansion results enhances the retrieved documents and outperforms our baseline by 21.06%; it even outperforms a previous study by 7.12%. We propose several query expansion techniques and their (linear) combinations to make user queries more cognizable to search engines and to produce higher-quality search results.
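
    The pairwise linear combination described in this abstract can be illustrated with a minimal sketch. It assumes each expansion technique yields a per-document relevance score; the function name `combine_linear`, the mixing weight `alpha`, and the score dictionaries are hypothetical and not taken from the paper.

```python
def combine_linear(scores_a, scores_b, alpha=0.5):
    """Rank documents by alpha * score_a + (1 - alpha) * score_b.

    scores_a and scores_b map document ids to the scores produced by two
    different query expansion runs (illustrative inputs, not the paper's data).
    A document missing from one run contributes 0.0 for that run.
    """
    docs = set(scores_a) | set(scores_b)
    combined = {
        doc: alpha * scores_a.get(doc, 0.0) + (1.0 - alpha) * scores_b.get(doc, 0.0)
        for doc in docs
    }
    # Highest combined score first; ties are broken by document id.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))
```

    With `alpha=0.5` both runs count equally; the abstract does not say how its weights were chosen, so any concrete value here is only a placeholder.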

  8. Incremental Query Rewriting with Resolution

    NASA Astrophysics Data System (ADS)

    Riazanov, Alexandre; Aragão, Marcelo A. T.

    We address the problem of semantic querying of relational databases (RDB) modulo knowledge bases using very expressive knowledge representation formalisms, such as full first-order logic or its various fragments. We propose to use a resolution-based first-order logic (FOL) reasoner for computing schematic answers to deductive queries, with the subsequent translation of these schematic answers to SQL queries which are evaluated using a conventional relational DBMS. We call our method incremental query rewriting, because an original semantic query is rewritten into a (potentially infinite) series of SQL queries. In this chapter, we outline the main idea of our technique - using abstractions of databases and constrained clauses for deriving schematic answers, and provide completeness and soundness proofs to justify the applicability of this technique to the case of resolution for FOL without equality. The proposed method can be directly used with regular RDBs, including legacy databases. Moreover, we propose it as a potential basis for an efficient Web-scale semantic search technology.

  9. Parallel Index and Query for Large Scale Data Analysis

    SciTech Connect

    Chou, Jerry; Wu, Kesheng; Ruebel, Oliver

    2011-07-18

    Modern scientific datasets present numerous data management and analysis challenges. State-of-the-art index and query technologies are critical for facilitating interactive exploration of large datasets, but numerous challenges remain in terms of designing a system for processing general scientific datasets. The system needs to be able to run on distributed multi-core platforms, efficiently utilize underlying I/O infrastructure, and scale to massive datasets. We present FastQuery, a novel software framework that addresses these challenges. FastQuery utilizes a state-of-the-art index and query technology (FastBit) and is designed to process massive datasets on modern supercomputing platforms. We apply FastQuery to processing of a massive 50TB dataset generated by a large scale accelerator modeling code. We demonstrate the scalability of the tool to 11,520 cores. Motivated by the scientific need to search for interesting particles in this dataset, we use our framework to reduce search time from hours to tens of seconds.

  10. GenoQuery: a new querying module for functional annotation in a genomic warehouse

    PubMed Central

    Lemoine, Frédéric; Labedan, Bernard; Froidevaux, Christine

    2008-01-01

    Motivation: We have to cope with both a deluge of new genome sequences and a huge amount of data produced by high-throughput approaches used to exploit these genomic features. Crossing and comparing such heterogeneous and disparate data will help improving functional annotation of genomes. This requires designing elaborate integration systems such as warehouses for storing and querying these data. Results: We have designed a relational genomic warehouse with an original multi-layer architecture made of a databases layer and an entities layer. We describe a new querying module, GenoQuery, which is based on this architecture. We use the entities layer to define mixed queries. These mixed queries allow searching for instances of biological entities and their properties in the different databases, without specifying in which database they should be found. Accordingly, we further introduce the central notion of alternative queries. Such queries have the same meaning as the original mixed queries, while exploiting complementarities yielded by the various integrated databases of the warehouse. We explain how GenoQuery computes all the alternative queries of a given mixed query. We illustrate how useful this querying module is by means of a thorough example. Availability: http://www.lri.fr/~lemoine/GenoQuery/ Contact: chris@lri.fr, lemoine@lri.fr PMID:18586731

  11. FTree query construction for virtual screening: a statistical analysis.

    PubMed

    Gerlach, Christof; Broughton, Howard; Zaliani, Andrea

    2008-02-01

    FTrees (FT) is a known chemoinformatic tool able to condense molecular descriptions into a graph object and to search for actives in large databases using graph similarity. The query graph is classically derived from a known active molecule, or a set of actives, for which a similar compound has to be found. Recently, FT similarity has been extended to fragment space, widening its capabilities. If a user were able to build a knowledge-based FT query from information other than a known active structure, the similarity search could be combined with other, normally separate, fields like de-novo design or pharmacophore searches. With this aim in mind, we performed a comprehensive analysis of several databases in terms of FT description and provide a basic statistical analysis of the FT spaces so far at hand. Vendors' catalogue collections and MDDR as a source of potential or known "actives", respectively, have been used. With the results reported herein, a set of ranges, mean values and standard deviations for several query parameters are presented in order to set a reference guide for the users. Applications on how to use this information in FT query building are also provided, using a newly built 3D-pharmacophore from 57 5HT-1F agonists and a published one which was used for virtual screening for tRNA-guanine transglycosylase (TGT) inhibitors.

  12. FTree query construction for virtual screening: a statistical analysis

    NASA Astrophysics Data System (ADS)

    Gerlach, Christof; Broughton, Howard; Zaliani, Andrea

    2008-02-01

    FTrees (FT) is a known chemoinformatic tool able to condense molecular descriptions into a graph object and to search for actives in large databases using graph similarity. The query graph is classically derived from a known active molecule, or a set of actives, for which a similar compound has to be found. Recently, FT similarity has been extended to fragment space, widening its capabilities. If a user were able to build a knowledge-based FT query from information other than a known active structure, the similarity search could be combined with other, normally separate, fields like de-novo design or pharmacophore searches. With this aim in mind, we performed a comprehensive analysis of several databases in terms of FT description and provide a basic statistical analysis of the FT spaces so far at hand. Vendors' catalogue collections and MDDR as a source of potential or known "actives", respectively, have been used. With the results reported herein, a set of ranges, mean values and standard deviations for several query parameters are presented in order to set a reference guide for the users. Applications on how to use this information in FT query building are also provided, using a newly built 3D-pharmacophore from 57 5HT-1F agonists and a published one which was used for virtual screening for tRNA-guanine transglycosylase (TGT) inhibitors.

  13. Comment on "An Evaluation of Query Expansion by the Addition of Clustered Terms for a Document Retrieval System"

    ERIC Educational Resources Information Center

    Salton, G.

    1972-01-01

    The author emphasized that one cannot conclude from the experiments reported upon that term clusters (or equivalently, keyword classifications or thesauruses) are not useful in retrieval. (2 references) (Author)

  14. A Study of Term Proximity and Document Weighting Normalization in Pseudo Relevance Feedback - UIUC at TREC 2009 Million Query Track

    DTIC Science & Technology

    2009-11-01

    is estimated using the Gaussian kernel function: $c'(w, i) = \sum_{j=1}^{N} c(w, j) \exp\left[ \frac{-(i - j)^2}{2\sigma^2} \right]$ (2), where i and j are absolute positions of the...corresponding terms in the document, and N is the length of the document; c(w, j) is the actual count of term w at position j. The PLM P(·|D, i) needs to...probability of relevance well. The distribution of relevance can be approximated as follows: $p(i \mid \theta_{rel}) = \frac{\sum_j \delta(Q_j, i)}{\sum_i \sum_j \delta(Q_j, i)}$ (10
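
    As a rough illustration of the Gaussian kernel in equation (2) of this snippet, the sketch below computes the propagated count of a term at position i. It assumes the term's occurrences are given as a list of 1-based positions, each with an actual count of 1; the function name `propagated_count` and its parameters are hypothetical.

```python
import math


def propagated_count(positions, i, doc_len, sigma):
    """Propagated count c'(w, i) of a term at position i.

    positions: 1-based positions j at which the term occurs, so that
               c(w, j) = 1 there and 0 elsewhere (an assumption of this sketch).
    doc_len:   N, the length of the document.
    sigma:     width of the Gaussian kernel.
    """
    return sum(
        math.exp(-((i - j) ** 2) / (2.0 * sigma ** 2))
        for j in positions
        if 1 <= j <= doc_len
    )
```

    An occurrence exactly at position i contributes 1, and the contribution decays with the squared distance between i and j.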

  15. Spatial Query for Planetary Data

    NASA Technical Reports Server (NTRS)

    Shams, Khawaja S.; Crockett, Thomas M.; Powell, Mark W.; Joswig, Joseph C.; Fox, Jason M.

    2011-01-01

    Science investigators need to quickly and effectively assess past observations of specific locations on a planetary surface. This innovation involves a location-based search technology that was adapted and applied to planetary science data to support a spatial query capability for mission operations software. High-performance location-based searching requires the use of spatial data structures for database organization. Spatial data structures are designed to organize datasets based on their coordinates in a way that is optimized for location-based retrieval. The particular spatial data structure that was adapted for planetary data search is the R+ tree.

  16. Conceptual mapping of user's queries to medical subject headings.

    PubMed Central

    Zieman, Y. L.; Bleich, H. L.

    1997-01-01

    This paper describes a way to map users' queries to relevant Medical Subject Headings (MeSH terms) used by the National Library of Medicine to index the biomedical literature. The method, called SENSE (SEarch with New SEmantics), transforms words and phrases in the users' queries into primary conceptual components and compares these components with those of the MeSH vocabulary. Similar to the way in which most numbers can be split into numerical factors and expressed as their product--for example, 42 can be expressed as 2*21, 6*7, 3*14, 2*3*7,--so most medical concepts can be split into "semantic factors" and expressed as their juxtaposition. Note that if we split 42 into its primary factors, the breakdown is unique: 2*3*7. Similarly, when we split medical concepts into their "primary semantic factors" the breakdown is also unique. For example, the MeSH term 'renovascular hypertension' can be split morphologically into reno, vascular, hyper, and tension--morphemes that can then be translated into their primary semantic factors--kidney, blood vessel, high, and pressure. By "factoring" each MeSH term in this way, and by similarly factoring the user's query, we can match query to MeSH term by searching for combinations of common factors. Unlike UMLS and other methods that match at the level of words or phrases, SENSE matches at the level of concepts; in this way, a wide variety of words and phrases that have the same meaning produce the same match. Now used in PaperChase, the method is surprisingly powerful in matching users' queries to Medical Subject Headings. PMID:9357680
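
    The factoring idea in this abstract can be sketched as follows. The morpheme table is a hypothetical stand-in for the SENSE lexicon, which the abstract does not give, and the sketch assumes queries and MeSH terms have already been split into morphemes or words. Matching by exact equality of factor sets is a simplification chosen for this illustration.

```python
# Hypothetical morpheme-to-factor table built from the abstract's own example;
# the real SENSE vocabulary is not described in the record.
MORPHEME_FACTORS = {
    "reno": "kidney", "kidney": "kidney",
    "vascular": "blood vessel",
    "hyper": "high", "high": "high",
    "tension": "pressure", "pressure": "pressure",
}


def factorize(morphemes):
    """Translate morphemes into an order-independent set of semantic factors."""
    return frozenset(MORPHEME_FACTORS[m] for m in morphemes if m in MORPHEME_FACTORS)


def match(query_morphemes, mesh_terms):
    """Return the MeSH terms whose factor set equals the query's factor set.

    mesh_terms maps a term name to its list of morphemes.
    """
    query_factors = factorize(query_morphemes)
    return [
        name
        for name, morphemes in mesh_terms.items()
        if factorize(morphemes) == query_factors
    ]
```

    Under this table, a query split into "kidney", "vascular", "high", "pressure" yields the same factors as "renovascular hypertension", so differently worded inputs with the same meaning produce the same match.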

  17. Knowledge Query Language (KQL)

    DTIC Science & Technology

    2016-02-12

    Lexington Massachusetts ... EXECUTIVE SUMMARY Currently, queries for data ...retrieval from non-Structured Query Language (NoSQL) data stores are tightly coupled to the specific implementation of the data store implementation...independent of the storage content and format for querying NoSQL or relational data stores. This approach uses address expressions (or A-Expressions

  18. LAILAPS-QSM: A RESTful API and JAVA library for semantic query suggestions.

    PubMed

    Chen, Jinbo; Scholz, Uwe; Zhou, Ruonan; Lange, Matthias

    2018-03-01

    In order to access and filter content of life-science databases, full text search is a widely applied query interface. But its high flexibility and intuitiveness are paid for with potentially imprecise and incomplete query results. To reduce this drawback, query assistance systems suggest those combinations of keywords with the highest potential to match most of the relevant data records. Widespread approaches are syntactic query corrections that avoid misspelling and support expansion of words by suffixes and prefixes. Synonym expansion approaches apply thesauri, ontologies, and query logs. All need laborious curation and maintenance. Furthermore, access to query logs is in general restricted. Approaches that infer related queries by their query profile like research field, geographic location, co-authorship, affiliation etc. require user's registration and its public accessibility that contradict privacy concerns. To overcome these drawbacks, we implemented LAILAPS-QSM, a machine learning approach that reconstructs possible linguistic contexts of a given keyword query. The context is referred from the text records that are stored in the databases that are going to be queried or extracted for a general purpose query suggestion from PubMed abstracts and UniProt data. The supplied tool suite enables the pre-processing of these text records and the further computation of customized distributed word vectors. The latter are used to suggest alternative keyword queries. An evaluation of the query suggestion quality was done for plant science use cases. Locally present experts enable a cost-efficient quality assessment in the categories trait, biological entity, taxonomy, affiliation, and metabolic function which has been performed using ontology term similarities. LAILAPS-QSM mean information content similarity for 15 representative queries is 0.70, whereas 34% have a score above 0.80. In comparison, the information content similarity for human expert made query suggestions…

  19. An Examination of Natural Language as a Query Formation Tool for Retrieving Information on E-Health from Pub Med.

    ERIC Educational Resources Information Center

    Peterson, Gabriel M.; Su, Kuichun; Ries, James E.; Sievert, Mary Ellen C.

    2002-01-01

    Discussion of Internet use for information searches on health-related topics focuses on a study that examined complexity and variability of natural language in using search terms that express the concept of electronic health (e-health). Highlights include precision of retrieved information; shift in terminology; and queries using the Pub Med…

  20. Information Retrieval Using UMLS-based Structured Queries

    PubMed Central

    Fagan, Lawrence M.; Berrios, Daniel C.; Chan, Albert; Cucina, Russell; Datta, Anupam; Shah, Maulik; Surendran, Sujith

    2001-01-01

    During the last three years, we have developed and described components of ELBook, a semantically based information-retrieval system [1-4]. Using these components, domain experts can specify a query model, indexers can use the query model to index documents, and end-users can search these documents for instances of indexed queries.

  1. Optimizing a Query by Transformation and Expansion.

    PubMed

    Glocker, Katrin; Knurr, Alexander; Dieter, Julia; Dominick, Friederike; Forche, Melanie; Koch, Christian; Pascoe Pérez, Analie; Roth, Benjamin; Ückert, Frank

    2017-01-01

    In the biomedical sector not only the amount of information produced and uploaded into the web is enormous, but also the number of sources where these data can be found. Clinicians and researchers spend huge amounts of time on trying to access this information and to filter the most important answers to a given question. As the formulation of these queries is crucial, automated query expansion is an effective tool to optimize a query and receive the best possible results. In this paper we introduce the concept of a workflow for an optimization of queries in the medical and biological sector by using a series of tools for expansion and transformation of the query. After the definition of attributes by the user, the query string is compared to previous queries in order to add semantic co-occurring terms to the query. Additionally, the query is enlarged by an inclusion of synonyms. The translation into database specific ontologies ensures the optimal query formulation for the chosen database(s). As this process can be performed in various databases at once, the results are ranked and normalized in order to achieve a comparable list of answers for a question.

  2. Term Relevance Feedback and Mediated Database Searching: Implications for Information Retrieval Practice and Systems Design.

    ERIC Educational Resources Information Center

    Spink, Amanda

    1995-01-01

    This study uses the human approach to examine the sources and effectiveness of search terms selected during 40 mediated interactive database searches and focuses on determining the retrieval effectiveness of search terms identified by users and intermediaries from retrieved items during term relevance feedback. (Author/JKP)

  3. An Ensemble Approach for Expanding Queries

    DTIC Science & Technology

    2012-11-01

    0.39 pain^0.39 Hospital 15094 0.82 hospital^0.82 Miscarriage 45 3.35 miscarriage^3.35 Radiotherapy 53 3.28 radiotherapy^3.28 Hypoaldosteronism 3...negated query is the expansion of the original query with negation terms preceding each word. For example, the negated version of “miscarriage^3.35...includes “no miscarriage”^3.35 and “not miscarriage”^3.35. If a document is the result of both original query and negated query, its score is
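
    The negated-query construction mentioned in this snippet can be sketched as below. Only the two negation words shown in the snippet ("no" and "not") are used; the function name `negate_query` and the input format of (term, weight) pairs are assumptions of this sketch. How the report combines the scores of the original and negated queries is cut off in the snippet and is not modelled here.

```python
# The two negation words that appear in the snippet; the report may use more.
NEGATION_TERMS = ("no", "not")


def negate_query(weighted_terms):
    """Build negated phrases that keep each term's original boost.

    weighted_terms: iterable of (term, weight) pairs,
                    e.g. [("miscarriage", 3.35)].
    Returns quoted phrases with a caret boost, in the style of the snippet.
    """
    return [
        f'"{negation} {term}"^{weight}'
        for term, weight in weighted_terms
        for negation in NEGATION_TERMS
    ]
```

    For the snippet's own example, the pair ("miscarriage", 3.35) expands to the phrases "no miscarriage" and "not miscarriage", each boosted by 3.35.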

  4. Knowledge Query Language (KQL)

    DTIC Science & Technology

    2016-02-01

    unlimited. ... EXECUTIVE SUMMARY Currently, queries for data ...retrieval from non-Structured Query Language (NoSQL) data stores are tightly coupled to the specific implementation of the data store implementation, making...of the storage content and format for querying NoSQL or relational data stores. This approach uses address expressions (or A-Expressions) embedded in

  5. An index-based algorithm for fast on-line query processing of latent semantic analysis

    PubMed Central

    Li, Pohan; Wang, Wei

    2017-01-01

    Latent Semantic Analysis (LSA) is widely used for finding the documents whose semantics are similar to a keyword query. Although LSA yields promising similarity results, the existing LSA algorithms involve many unnecessary operations in similarity computation and candidate checking during on-line query processing, which is expensive in terms of time cost and cannot efficiently respond to query requests, especially when the dataset becomes large. In this paper, we study the efficiency problem of on-line query processing for LSA, with the goal of efficiently searching for the documents similar to a given query. We rewrite the similarity equation of LSA in terms of an intermediate value called partial similarity that is stored in a purpose-built index called the partial index. To reduce the search space, we give an approximate form of the similarity equation, and then develop an efficient algorithm for building the partial index, which skips the partial similarities lower than a given threshold θ. Based on the partial index, we develop an efficient algorithm called ILSA for supporting fast on-line query processing. The given query is transformed into a pseudo document vector, and the similarities between the query and candidate documents are computed by accumulating the partial similarities obtained from the index nodes corresponding to non-zero entries in the pseudo document vector. Compared to the LSA algorithm, ILSA reduces the time cost of on-line query processing by pruning the candidate documents that are not promising and skipping the operations that make little contribution to similarity scores. Extensive experiments through comparison with LSA have been done, which demonstrate the efficiency and effectiveness of our proposed algorithm. PMID:28520747
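
    The accumulate-and-prune idea in this abstract can be illustrated with a simplified sketch. This is not the paper's ILSA algorithm: it assumes the partial similarities are already available as a nested mapping from an index key to per-document values, and the names `build_partial_index` and `query_index` are hypothetical.

```python
from collections import defaultdict


def build_partial_index(partials, theta):
    """Keep only the partial similarities that reach the threshold theta.

    partials: {key: {doc_id: partial_similarity}}, assumed precomputed
              (how the paper derives these values is not shown here).
    """
    index = {}
    for key, per_doc in partials.items():
        kept = {doc: value for doc, value in per_doc.items() if value >= theta}
        if kept:
            index[key] = kept
    return index


def query_index(index, pseudo_doc, top_k=10):
    """Accumulate partial similarities over the non-zero query entries.

    pseudo_doc: {key: weight} for the query's pseudo document vector.
    """
    scores = defaultdict(float)
    for key, weight in pseudo_doc.items():
        if weight == 0.0:
            continue  # zero entries touch no index node
        for doc, partial in index.get(key, {}).items():
            scores[doc] += weight * partial
    # Highest accumulated score first; ties are broken by document id.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
```

    Documents whose every partial similarity falls below the threshold never enter the index, so they are never scored at query time; that is the pruning effect the abstract describes, at the cost of an approximate score.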

  6. An index-based algorithm for fast on-line query processing of latent semantic analysis.

    PubMed

    Zhang, Mingxi; Li, Pohan; Wang, Wei

    2017-01-01

    Latent Semantic Analysis (LSA) is widely used for finding documents that are semantically similar to a keyword query. Although LSA yields promising results, existing LSA algorithms perform many unnecessary operations in similarity computation and candidate checking during on-line query processing; this is expensive in time and cannot respond to query requests efficiently, especially when the dataset becomes large. In this paper, we study the efficiency of on-line query processing for LSA, with the aim of efficiently finding the documents similar to a given query. We rewrite the similarity equation of LSA in terms of an intermediate value called partial similarity, which is stored in a purpose-built index called the partial index. To reduce the search space, we give an approximate form of the similarity equation and then develop an efficient algorithm for building the partial index, which skips partial similarities lower than a given threshold θ. Based on the partial index, we develop an efficient algorithm called ILSA for fast on-line query processing. The given query is transformed into a pseudo document vector, and the similarities between the query and candidate documents are computed by accumulating the partial similarities obtained from the index nodes that correspond to non-zero entries in the pseudo document vector. Compared to the LSA algorithm, ILSA reduces the time cost of on-line query processing by pruning unpromising candidate documents and skipping operations that contribute little to the similarity scores. Extensive experiments comparing ILSA with LSA demonstrate the efficiency and effectiveness of the proposed algorithm.
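
    The indexing idea in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' ILSA implementation: it assumes documents are already projected into the latent space, and the names build_partial_index and search, the length normalisation, and the sample vectors are all hypothetical choices made for the example.

```python
from collections import defaultdict


def build_partial_index(doc_vectors, theta):
    """Map each latent dimension to (doc_id, partial weight) pairs.

    doc_vectors: dict of doc_id -> list of floats in the latent space
    (assumed to be precomputed). Weights whose magnitude is below
    theta are skipped, which is what shrinks the index.
    """
    index = defaultdict(list)
    for doc_id, vec in doc_vectors.items():
        norm = sum(x * x for x in vec) ** 0.5
        if norm == 0.0:
            continue  # an all-zero document can never match
        for dim, x in enumerate(vec):
            weight = x / norm  # length-normalised component
            if abs(weight) >= theta:
                index[dim].append((doc_id, weight))
    return index


def search(index, pseudo_doc, top_k=3):
    """Accumulate partial similarities over the query's non-zero dimensions."""
    scores = defaultdict(float)
    for dim, q in enumerate(pseudo_doc):
        if q == 0.0:
            continue  # zero entries contribute nothing, so skip their nodes
        for doc_id, weight in index.get(dim, ()):
            scores[doc_id] += q * weight
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]


# Hypothetical two-dimensional latent vectors for three documents.
docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.6, 0.8]}
index = build_partial_index(docs, theta=0.1)
results = search(index, [1.0, 0.0])
```

    Because only index nodes for non-zero query dimensions are visited, document "b" is never touched for this query; a larger theta trades accuracy for a smaller index.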

  7. Spatial information semantic query based on SPARQL

    NASA Astrophysics Data System (ADS)

    Xiao, Zhifeng; Huang, Lei; Zhai, Xiaofang

    2009-10-01

    How can the efficiency of spatial information inquiries be enhanced in today's fast-growing information age? We are rich in geospatial data but poor in up-to-date geospatial information and knowledge that are ready to be accessed by public users. This paper adopts an approach for querying spatial semantics by building an ontology in Web Ontology Language (OWL) format and introducing the SPARQL Protocol and RDF Query Language (SPARQL) to search spatial semantic relations. It is important to establish spatial semantics that support effective spatial reasoning for performing semantic queries. In contrast to earlier keyword-based and information retrieval techniques that rely on syntax, we use semantic approaches in our spatial query system. Semantic approaches need to be built on an ontology, so we use OWL to describe spatial information extracted from a large-scale map of Wuhan. Spatial information expressed by an ontology with formal semantics is available to machines for processing and to people for understanding. The approach is illustrated by a case study that uses SPARQL to query geo-spatial ontology instances of Wuhan. The paper shows that using SPARQL to search OWL ontology instances can ensure the accuracy and applicability of the results. The results also indicate that constructing a geo-spatial semantic query system has positive effects on forming spatial queries and on retrieval.

  8. Secure Skyline Queries on Cloud Platform.

    PubMed

    Liu, Jinfei; Yang, Juncheng; Xiong, Li; Pei, Jian

    2017-04-01

    Outsourcing data and computation to cloud server provides a cost-effective way to support large scale data storage and query processing. However, due to security and privacy concerns, sensitive data (e.g., medical records) need to be protected from the cloud server and other unauthorized users. One approach is to outsource encrypted data to the cloud server and have the cloud server perform query processing on the encrypted data only. It remains a challenging task to support various queries over encrypted data in a secure and efficient way such that the cloud server does not gain any knowledge about the data, query, and query result. In this paper, we study the problem of secure skyline queries over encrypted data. The skyline query is particularly important for multi-criteria decision making but also presents significant challenges due to its complex computations. We propose a fully secure skyline query protocol on data encrypted using semantically-secure encryption. As a key subroutine, we present a new secure dominance protocol, which can be also used as a building block for other queries. Finally, we provide both serial and parallelized implementations and empirically study the protocols in terms of efficiency and scalability under different parameter settings, verifying the feasibility of our proposed solutions.
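
    For readers unfamiliar with the skyline operator, the plaintext computation that the secure protocol above has to reproduce can be sketched as below. This is only the unencrypted baseline, not the authors' secure dominance protocol; the function names, the sample points, and the convention that smaller values are better in every dimension are assumptions made for the example.

```python
def dominates(p, q):
    """True if p is at least as good as q everywhere and strictly better somewhere.

    Assumes smaller values are preferred in every dimension.
    """
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better


def skyline(points):
    """Return the points that no other point dominates (naive O(n^2) scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical records with two criteria, e.g. (cost, risk).
records = [(1, 9), (3, 3), (4, 4), (9, 1), (5, 5)]
result = skyline(records)
```

    Here (4, 4) and (5, 5) are both dominated by (3, 3), so they drop out. In the secure setting every comparison inside dominates would have to be carried out on encrypted values, which is why the dominance check is the key subroutine.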

  9. Secure Skyline Queries on Cloud Platform

    PubMed Central

    Liu, Jinfei; Yang, Juncheng; Xiong, Li; Pei, Jian

    2017-01-01

    Outsourcing data and computation to cloud server provides a cost-effective way to support large scale data storage and query processing. However, due to security and privacy concerns, sensitive data (e.g., medical records) need to be protected from the cloud server and other unauthorized users. One approach is to outsource encrypted data to the cloud server and have the cloud server perform query processing on the encrypted data only. It remains a challenging task to support various queries over encrypted data in a secure and efficient way such that the cloud server does not gain any knowledge about the data, query, and query result. In this paper, we study the problem of secure skyline queries over encrypted data. The skyline query is particularly important for multi-criteria decision making but also presents significant challenges due to its complex computations. We propose a fully secure skyline query protocol on data encrypted using semantically-secure encryption. As a key subroutine, we present a new secure dominance protocol, which can be also used as a building block for other queries. Finally, we provide both serial and parallelized implementations and empirically study the protocols in terms of efficiency and scalability under different parameter settings, verifying the feasibility of our proposed solutions. PMID:28883710

  10. Sexual information seeking on web search engines.

    PubMed

    Spink, Amanda; Koricich, Andrew; Jansen, B J; Cole, Charles

    2004-02-01

    Sexual information seeking is an important element within human information behavior. Seeking sexually related information on the Internet takes many forms and channels, including chat room discussions, accessing Websites or searching Web search engines for sexual materials. The study of sexual Web queries provides insight into sexually-related information-seeking behavior, of value to Web users and providers alike. We qualitatively analyzed queries from logs of 1,025,910 Alta Vista and AlltheWeb.com Web user queries from 2001. We compared the differences in sexually-related Web searching between Alta Vista and AlltheWeb.com users. Differences were found in session duration, query outcomes, and search term choices. Implications of the findings for sexual information seeking are discussed.

  11. Retrieval of diagnostic and treatment studies for clinical use through PubMed and PubMed's Clinical Queries filters.

    PubMed

    Lokker, Cynthia; Haynes, R Brian; Wilczynski, Nancy L; McKibbon, K Ann; Walter, Stephen D

    2011-01-01

    Clinical Queries filters were developed to improve the retrieval of high-quality studies in searches on clinical matters. The study objective was to determine the yield of relevant citations and physician satisfaction while searching for diagnostic and treatment studies using the Clinical Queries page of PubMed compared with searching PubMed without these filters. Forty practicing physicians, presented with standardized treatment and diagnosis questions and one question of their choosing, entered search terms which were processed in a random, blinded fashion through PubMed alone and PubMed Clinical Queries. Participants rated search retrievals for applicability to the question at hand and satisfaction. For treatment, the primary outcome of retrieval of relevant articles was not significantly different between the groups, but a higher proportion of articles from the Clinical Queries searches met methodologic criteria (p=0.049), and more articles were published in core internal medicine journals (p=0.056). For diagnosis, the filtered results returned more relevant articles (p=0.031) and fewer irrelevant articles (overall retrieval less, p=0.023); participants needed to screen fewer articles before arriving at the first relevant citation (p<0.05). Relevance was also influenced by content terms used by participants in searching. Participants varied greatly in their search performance. Clinical Queries filtered searches returned more high-quality studies, though the retrieval of relevant articles was only statistically different between the groups for diagnosis questions. Retrieving clinically important research studies from Medline is a challenging task for physicians. Methodological search filters can improve search retrieval.

  12. Term Relevance Weights in On-Line Information Retrieval

    ERIC Educational Resources Information Center

    Salton, G.; Waldstein, R. K.

    1978-01-01

    Term relevance weighting systems in interactive information retrieval are reviewed. An experiment in which information retrieval users ranked query terms in decreasing order of presumed importance prior to actual search and retrieval is described. (Author/KP)

  13. Generating Personalized Web Search Using Semantic Context

    PubMed Central

    Xu, Zheng; Chen, Hai-Yan; Yu, Jie

    2015-01-01

    The “one size fits the all” criticism of search engines is that when queries are submitted, the same results are returned to different users. In order to solve this problem, personalized search is proposed, since it can provide different search results based upon the preferences of users. However, existing methods concentrate more on the long-term and independent user profile, and thus reduce the effectiveness of personalized search. In this paper, the method captures the user context to provide accurate preferences of users for effectively personalized search. First, the short-term query context is generated to identify related concepts of the query. Second, the user context is generated based on the click through data of users. Finally, a forgetting factor is introduced to merge the independent user context in a user session, which maintains the evolution of user preferences. Experimental results fully confirm that our approach can successfully represent user context according to individual user information needs. PMID:26000335
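
    One plausible reading of the forgetting-factor step is an exponentially decayed sum of per-query concept weights. The sketch below works under that assumption only; the abstract does not give the actual formula, and the function name, the dictionary representation of a context, and the value 0.5 are hypothetical.

```python
def merge_session_contexts(contexts, forgetting=0.5):
    """Merge per-query contexts (oldest first) into one session context.

    contexts: list of dicts mapping concept -> weight.
    forgetting: assumed decay in (0, 1]; older evidence is scaled by this
    factor each time a newer context arrives.
    """
    merged = {}
    for context in contexts:
        # Decay everything seen so far before adding the newer evidence.
        for concept in merged:
            merged[concept] *= forgetting
        for concept, weight in context.items():
            merged[concept] = merged.get(concept, 0.0) + weight
    return merged


# Hypothetical session: two consecutive queries and their concept weights.
session = [{"python": 1.0}, {"python": 1.0, "snake": 2.0}]
profile = merge_session_contexts(session, forgetting=0.5)
```

    With a factor of 0.5 the first query contributes half as much as the second, so recent interests dominate while earlier ones fade rather than disappear.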

  14. SAM Biotoxin Methods Query

    EPA Pesticide Factsheets

    Laboratories measuring target biotoxin analytes in environmental samples can use this online query tool to identify analytical methods included in EPA's Selected Analytical Methods for Environmental Remediation and Recovery for select biotoxins.

  15. SAM Radiochemical Methods Query

    EPA Pesticide Factsheets

    Laboratories measuring target radiochemical analytes in environmental samples can use this online query tool to identify analytical methods in EPA's Selected Analytical Methods for Environmental Remediation and Recovery for select radiochemical analytes.

  16. SAM Methods Query

    EPA Pesticide Factsheets

    Laboratories measuring target chemical, radiochemical, pathogens, and biotoxin analytes in environmental samples can use this online query tool to identify analytical methods included in EPA's Selected Analytical Methods for Environmental Remediation

  17. SAM Pathogen Methods Query

    EPA Pesticide Factsheets

    Laboratories measuring target pathogen analytes in environmental samples can use this online query tool to identify analytical methods in EPA's Selected Analytical Methods for Environmental Remediation and Recovery for select pathogens.

  18. SAM Chemical Methods Query

    EPA Pesticide Factsheets

    Laboratories measuring target chemical, radiochemical, pathogens, and biotoxin analytes in environmental samples can use this online query tool to identify analytical methods in EPA's Selected Analytical Methods for Environmental Remediation and Recovery

  19. Using Bitmap Indexing Technology for Combined Numerical and Text Queries

    SciTech Connect

    Stockinger, Kurt; Cieslewicz, John; Wu, Kesheng

    2006-10-16

    In this paper, we describe a strategy of using compressed bitmap indices to speed up queries on both numerical data and text documents. By using an efficient compression algorithm, these compressed bitmap indices are compact even for indices with millions of distinct terms. Moreover, bitmap indices can be used very efficiently to answer Boolean queries over text documents involving multiple query terms. Existing inverted indices for text searches are usually inefficient for corpora with a very large number of terms as well as for queries involving a large number of hits. We demonstrate that our compressed bitmap index technology overcomes both of those shortcomings. In a performance comparison against a commonly used database system, our indices answer queries 30 times faster on average. To provide full SQL support, we integrated our indexing software, called FastBit, with MonetDB. The integrated system MonetDB/FastBit provides not only efficient searches on a single table as FastBit does, but also answers join queries efficiently. Furthermore, MonetDB/FastBit also provides a very efficient retrieval mechanism of result records.
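
    The way a bitmap index answers Boolean text queries can be illustrated with a small sketch. It uses plain Python integers as uncompressed bitmaps, so it omits the compression that the abstract identifies as the key to compactness, and it does not use FastBit's API; the function names and sample documents are hypothetical.

```python
def build_bitmap_index(docs):
    """Return term -> bitmap, where bit i is set if document i contains the term."""
    index = {}
    for doc_id, text in enumerate(docs):
        for term in set(text.lower().split()):
            index[term] = index.get(term, 0) | (1 << doc_id)
    return index


def lookup(index, term):
    """Bitmap for a term; an unknown term matches no documents."""
    return index.get(term, 0)


def doc_ids(bitmap):
    """Decode a bitmap back into a sorted list of document ids."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical three-document corpus.
corpus = ["bitmap index query", "inverted index", "bitmap compression"]
idx = build_bitmap_index(corpus)

both = doc_ids(lookup(idx, "bitmap") & lookup(idx, "index"))       # AND
either = doc_ids(lookup(idx, "bitmap") | lookup(idx, "inverted"))  # OR
only = doc_ids(lookup(idx, "index") & ~lookup(idx, "bitmap"))      # AND NOT
```

    Each Boolean operator in the query maps to one bitwise operation over whole bitmaps, which is why multi-term queries stay cheap regardless of how many hits each term has.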

  20. Folksonomical P2P File Sharing Networks Using Vectorized KANSEI Information as Search Tags

    NASA Astrophysics Data System (ADS)

    Ohnishi, Kei; Yoshida, Kaori; Oie, Yuji

    We present the concept of folksonomical peer-to-peer (P2P) file sharing networks that allow participants (peers) to freely assign structured search tags to files. These networks are similar to folksonomies in the present Web from the point of view that users assign search tags to information distributed over a network. As a concrete example, we consider an unstructured P2P network using vectorized Kansei (human sensitivity) information as structured search tags for file search. Vectorized Kansei information as search tags indicates what participants feel about their files and is assigned by the participant to each of their files. A search query also has the same form of search tags and indicates what participants want to feel about files that they will eventually obtain. A method that enables file search using vectorized Kansei information is the Kansei query-forwarding method, which probabilistically propagates a search query to peers that are likely to hold more files having search tags that are similar to the query. The similarity between the search query and the search tags is measured in terms of their dot product. The simulation experiments examine if the Kansei query-forwarding method can provide equal search performance for all peers in a network in which only the Kansei information and the tendency with respect to file collection are different among all of the peers. The simulation results show that the Kansei query forwarding method and a random-walk-based query forwarding method, for comparison, work effectively in different situations and are complementary. Furthermore, the Kansei query forwarding method is shown, through simulations, to be superior to or equal to the random-walk based one in terms of search speed.
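
    The forwarding rule described above, in which a query is passed on with higher probability to peers whose tags resemble it, might look like the following sketch. It assumes each neighbour is summarised by a single profile vector and that negative dot products are clipped to zero; both assumptions, and the names dot and choose_next_peer, are illustrative rather than taken from the paper.

```python
import random


def dot(u, v):
    """Dot product, the similarity measure named in the abstract."""
    return sum(a * b for a, b in zip(u, v))


def choose_next_peer(query, neighbour_profiles, rng):
    """Pick a neighbour with probability proportional to its similarity.

    neighbour_profiles: dict of peer name -> profile vector (assumed to
    summarise the Kansei tags of the files that peer holds).
    """
    names = list(neighbour_profiles)
    weights = [max(dot(query, neighbour_profiles[n]), 0.0) for n in names]
    if sum(weights) == 0.0:
        return rng.choice(names)  # no informative neighbour: uniform choice
    return rng.choices(names, weights=weights, k=1)[0]


# Hypothetical two-dimensional Kansei vectors.
neighbours = {"A": [1.0, 0.0], "B": [0.0, 1.0]}
rng = random.Random(0)
picks = [choose_next_peer([1.0, 0.0], neighbours, rng) for _ in range(20)]
```

    The uniform fallback behaves like a single random-walk step, which echoes the abstract's finding that the two forwarding methods are complementary.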

  1. Querying Safety Cases

    NASA Technical Reports Server (NTRS)

    Denney, Ewen W.; Naylor, Dwight; Pai, Ganesh

    2014-01-01

    Querying a safety case to show how the various stakeholders' concerns about system safety are addressed has been put forth as one of the benefits of argument-based assurance (in a recent study by the Health Foundation, UK, which reviewed the use of safety cases in safety-critical industries). However, neither the literature nor current practice offers much guidance on querying mechanisms appropriate for, or available within, a safety case paradigm. This paper presents a preliminary approach that uses a formal basis for querying safety cases, specifically Goal Structuring Notation (GSN) argument structures. Our approach semantically enriches GSN arguments with domain-specific metadata that the query language leverages, along with its inherent structure, to produce views. We have implemented the approach in our toolset AdvoCATE, and illustrate it by application to a fragment of the safety argument for an Unmanned Aircraft System (UAS) being developed at NASA Ames. We also discuss the potential practical utility of our query mechanism within the context of the existing framework for UAS safety assurance.

  2. In Search of Decay in Verbal Short-Term Memory

    ERIC Educational Resources Information Center

    Berman, Marc G.; Jonides, John; Lewis, Richard L.

    2009-01-01

    Is forgetting in the short term due to decay with the mere passage of time, interference from other memoranda, or both? Past research on short-term memory has revealed some evidence for decay and a plethora of evidence showing that short-term memory is worsened by interference. However, none of these studies has directly contrasted decay and…

  3. Meta Search Engines.

    ERIC Educational Resources Information Center

    Garman, Nancy

    1999-01-01

    Describes common options and features to consider in evaluating which meta search engine will best meet a searcher's needs. Discusses number and names of engines searched; other sources and specialty engines; search queries; other search options; and results options. (AEF)

  4. Code query by example

    NASA Astrophysics Data System (ADS)

    Vaucouleur, Sebastien

    2011-02-01

    We introduce code query by example for customisation of evolvable software products in general and of enterprise resource planning systems (ERPs) in particular. The concept is based on an initial empirical study on practices around ERP systems. We motivate our design choices based on those empirical results, and we show how the proposed solution helps with respect to the infamous upgrade problem: the conflict between the need for customisation and the need for upgrade of ERP systems. We further show how code query by example can be used as a form of lightweight static analysis, to detect automatically potential defects in large software products. Code query by example as a form of lightweight static analysis is particularly interesting in the context of ERP systems: it is often the case that programmers working in this field are not computer science specialists but more of domain experts. Hence, they require a simple language to express custom rules.

  5. Multi-field query expansion is effective for biomedical dataset retrieval.

    PubMed

    Bouadjenek, Mohamed Reda; Verspoor, Karin

    2017-01-01

    In the context of the bioCADDIE challenge addressing information retrieval of biomedical datasets, we propose a method for retrieval of biomedical data sets with heterogeneous schemas through query reformulation. In particular, the method proposed transforms the initial query into a multi-field query that is then enriched with terms that are likely to occur in the relevant datasets. We compare and evaluate two query expansion strategies, one based on the Rocchio method and another based on a biomedical lexicon. We then perform a comprehensive comparative evaluation of our method on the bioCADDIE dataset collection for biomedical retrieval. We demonstrate the effectiveness of our multi-field query method compared to two baselines, with MAP improved from 0.2171 and 0.2669 to 0.2996. We also show the benefits of query expansion, where the Rocchio expansion method improves the MAP for our two baselines from 0.2171 and 0.2669 to 0.335. We show that the Rocchio query expansion method slightly outperforms the one based on the biomedical lexicon as a source of terms, with an improvement of roughly 3% for MAP. However, the query expansion method based on the biomedical lexicon is much less resource intensive since it does not require computation of any relevance feedback set or any initial execution of the query. Hence, in terms of the trade-off between efficiency, execution time and retrieval accuracy, we argue that the query expansion method based on the biomedical lexicon offers the best performance for a prototype biomedical data search engine intended to be used at a large scale. In the official bioCADDIE challenge results, although our approach is ranked seventh in terms of the infNDCG evaluation metric, it ranks second in terms of P@10 and NDCG. Hence, the method proposed here provides overall good retrieval performance in relation to the approaches of other competitors. Consequently, the observations made in this paper should benefit the development of a Data Discovery
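
    The Rocchio strategy mentioned above follows the classic relevance-feedback update, in which the query vector is moved toward the centroid of relevant documents and away from the centroid of non-relevant ones. The sketch below shows that textbook form; the weights alpha, beta and gamma, the clipping of negative weights, and the sample vectors are illustrative assumptions and are not the settings used in the paper.

```python
def centroid(vectors):
    """Average a list of sparse term -> weight dicts; empty list gives {}."""
    total = {}
    for vec in vectors:
        for term, weight in vec.items():
            total[term] = total.get(term, 0.0) + weight
    count = len(vectors)
    return {term: weight / count for term, weight in total.items()} if count else {}


def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Expanded query = alpha*q + beta*centroid(rel) - gamma*centroid(non_rel)."""
    rel = centroid(relevant)
    non = centroid(non_relevant)
    expanded = {}
    for term in set(query) | set(rel) | set(non):
        weight = (alpha * query.get(term, 0.0)
                  + beta * rel.get(term, 0.0)
                  - gamma * non.get(term, 0.0))
        if weight > 0.0:  # negative weights are dropped, a common convention
            expanded[term] = weight
    return expanded


# Hypothetical feedback sets for a one-term query.
expanded = rocchio(
    {"gene": 1.0},
    relevant=[{"gene": 1.0, "expression": 2.0}],
    non_relevant=[{"protein": 1.0}],
    alpha=1.0, beta=0.5, gamma=0.5,
)
```

    The feedback sets have to come from an initial run of the query, which is the extra cost the abstract contrasts with the lexicon-based expansion.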

  6. Multi-field query expansion is effective for biomedical dataset retrieval

    PubMed Central

    2017-01-01

    In the context of the bioCADDIE challenge addressing information retrieval of biomedical datasets, we propose a method for retrieval of biomedical data sets with heterogeneous schemas through query reformulation. In particular, the method proposed transforms the initial query into a multi-field query that is then enriched with terms that are likely to occur in the relevant datasets. We compare and evaluate two query expansion strategies, one based on the Rocchio method and another based on a biomedical lexicon. We then perform a comprehensive comparative evaluation of our method on the bioCADDIE dataset collection for biomedical retrieval. We demonstrate the effectiveness of our multi-field query method compared to two baselines, with MAP improved from 0.2171 and 0.2669 to 0.2996. We also show the benefits of query expansion, where the Rocchio expansion method improves the MAP for our two baselines from 0.2171 and 0.2669 to 0.335. We show that the Rocchio query expansion method slightly outperforms the one based on the biomedical lexicon as a source of terms, with an improvement of roughly 3% for MAP. However, the query expansion method based on the biomedical lexicon is much less resource intensive since it does not require computation of any relevance feedback set or any initial execution of the query. Hence, in terms of the trade-off between efficiency, execution time and retrieval accuracy, we argue that the query expansion method based on the biomedical lexicon offers the best performance for a prototype biomedical data search engine intended to be used at a large scale. In the official bioCADDIE challenge results, although our approach is ranked seventh in terms of the infNDCG evaluation metric, it ranks second in terms of P@10 and NDCG. Hence, the method proposed here provides overall good retrieval performance in relation to the approaches of other competitors. Consequently, the observations made in this paper should benefit the development of a Data

  7. Keeping Dublin Core Simple: Cross-Domain Discovery or Resource Description?; First Steps in an Information Commerce Economy: Digital Rights Management in the Emerging E-Book Environment; Interoperability: Digital Rights Management and the Emerging EBook Environment; Searching the Deep Web: Direct Query Engine Applications at the Department of Energy.

    ERIC Educational Resources Information Center

    Lagoze, Carl; Neylon, Eamonn; Mooney, Stephen; Warnick, Walter L.; Scott, R. L.; Spence, Karen J.; Johnson, Lorrie A.; Allen, Valerie S.; Lederman, Abe

    2001-01-01

    Includes four articles that discuss Dublin Core metadata, digital rights management and electronic books, including interoperability; and directed query engines, a type of search engine designed to access resources on the deep Web that is being used at the Department of Energy. (LRW)

  8. Manchester visual query language

    NASA Astrophysics Data System (ADS)

    Oakley, John P.; Davis, Darryl N.; Shann, Richard T.

    1993-04-01

    We report a database language for visual retrieval which allows queries on image feature information which has been computed and stored along with images. The language is novel in that it provides facilities for dealing with feature data which has actually been obtained from image analysis. Each line in the Manchester Visual Query Language (MVQL) takes a set of objects as input and produces another, usually smaller, set as output. The MVQL constructs are mainly based on proven operators from the field of digital image analysis. An example is the Hough-group operator which takes as input a specification for the objects to be grouped, a specification for the relevant Hough space, and a definition of the voting rule. The output is a ranked list of high scoring bins. The query could be directed towards one particular image or an entire image database, in the latter case the bins in the output list would in general be associated with different images. We have implemented MVQL in two layers. The command interpreter is a Lisp program which maps each MVQL line to a sequence of commands which are used to control a specialized database engine. The latter is a hybrid graph/relational system which provides low-level support for inheritance and schema evolution. In the paper we outline the language and provide examples of useful queries. We also describe our solution to the engineering problems associated with the implementation of MVQL.

  9. FRS EZ Query

    EPA Pesticide Factsheets

    This page is the starting point for EZ Query. This page describes how to select key data elements from EPA's Facility Information Database and Geospatial Reference Database to build a tabular report or a Comma Separated Value (CSV) files for downloading.

  10. Environmental Dataset Gateway (EDG) Search Widget

    EPA Pesticide Factsheets

    Use the Environmental Dataset Gateway (EDG) to find and access EPA's environmental resources. Many options are available for easily reusing EDG content in other applications. This allows individuals to provide direct access to EPA's metadata outside the EDG interface. The EDG Search Widget makes it possible to search the EDG from another web page or application. The search widget can be included on your website by simply inserting one or two lines of code. Users can type a search term or Lucene search query in the search field and retrieve a pop-up list of records that match that search.

  11. Text mining for search term development in systematic reviewing: A discussion of some methods and challenges.

    PubMed

    Stansfield, Claire; O'Mara-Eves, Alison; Thomas, James

    2017-09-01

    Using text mining to aid the development of database search strings for topics described by diverse terminology has potential benefits for systematic reviews; however, methods and tools for accomplishing this are poorly covered in the research methods literature. We briefly review the literature on applications of text mining for search term development for systematic reviewing. We found that the tools can be used in 5 overarching ways: improving the precision of searches; identifying search terms to improve search sensitivity; aiding the translation of search strategies across databases; searching and screening within an integrated system; and developing objectively derived search strategies. Using a case study and selected examples, we then reflect on the utility of certain technologies (term frequency-inverse document frequency and Termine, term frequency, and clustering) in improving the precision and sensitivity of searches. Challenges in using these tools are discussed. The utility of these tools is influenced by the different capabilities of the tools, the way the tools are used, and the text that is analysed. Increased awareness of how the tools perform facilitates the further development of methods for their use in systematic reviews. Copyright © 2017 John Wiley & Sons, Ltd.

  12. In Search of Decay in Verbal Short-Term Memory

    PubMed Central

    Berman, Marc G.; Jonides, John; Lewis, Richard L.

    2014-01-01

    Is forgetting in the short term due to decay with the mere passage of time, interference from other memoranda, or both? Past research on short-term memory has revealed some evidence for decay and a plethora of evidence showing that short-term memory is worsened by interference. However, none of these studies has directly contrasted decay and interference in short-term memory in a task that rules out the use of rehearsal processes. In this article the authors present a series of studies using a novel paradigm to address this problem directly, by interrogating the operation of decay and interference in short-term memory without rehearsal confounds. The results of these studies indicate that short-term memories are subject to very small decay effects with the mere passage of time but that interference plays a much larger role in their degradation. The authors discuss the implications of these results for existing models of memory decay and interference. PMID:19271849

  13. In search of decay in verbal short-term memory.

    PubMed

    Berman, Marc G; Jonides, John; Lewis, Richard L

    2009-03-01

    Is forgetting in the short term due to decay with the mere passage of time, interference from other memoranda, or both? Past research on short-term memory has revealed some evidence for decay and a plethora of evidence showing that short-term memory is worsened by interference. However, none of these studies has directly contrasted decay and interference in short-term memory in a task that rules out the use of rehearsal processes. In this article the authors present a series of studies using a novel paradigm to address this problem directly, by interrogating the operation of decay and interference in short-term memory without rehearsal confounds. The results of these studies indicate that short-term memories are subject to very small decay effects with the mere passage of time but that interference plays a much larger role in their degradation. The authors discuss the implications of these results for existing models of memory decay and interference. (c) 2009 APA, all rights reserved

  14. A novel methodology for querying web images

    NASA Astrophysics Data System (ADS)

    Prabhakara, Rashmi; Lee, Ching Cheng

    2005-01-01

    Ever since the advent of the Internet, there has been an immense growth in the amount of image data that is available on the World Wide Web. With such a magnitude of image availability, an efficient and effective image retrieval system is required to make use of this information. This research presents an effective image matching and indexing technique that improves on existing integrated image retrieval methods. The proposed technique follows a two-phase approach, integrating query by topic and query by example specification methods. The first phase consists of topic-based image retrieval using an improved text information retrieval (IR) technique that makes use of the structured format of HTML documents. It consists of a focused crawler that lets the user enter not only the keyword for the topic-based search but also the scope in which the user wants to find the images. The second phase uses the query by example specification to perform a low-level content-based image match that retrieves a smaller set of results closer to the example image. Information related to the image feature is automatically extracted from the query image by the image processing system. A color-feature-based technique that is not computationally intensive is used to perform content-based matching of images. The main goal is to develop a functional image search and indexing system and to demonstrate that better retrieval results can be achieved with this proposed hybrid search technique.

  15. A novel methodology for querying web images

    NASA Astrophysics Data System (ADS)

    Prabhakara, Rashmi; Lee, Ching Cheng

    2004-12-01

    Ever since the advent of the Internet, there has been an immense growth in the amount of image data that is available on the World Wide Web. With such a magnitude of image availability, an efficient and effective image retrieval system is required to make use of this information. This research presents an effective image matching and indexing technique that improves on existing integrated image retrieval methods. The proposed technique follows a two-phase approach, integrating query by topic and query by example specification methods. The first phase consists of topic-based image retrieval using an improved text information retrieval (IR) technique that makes use of the structured format of HTML documents. It consists of a focused crawler that lets the user enter not only the keyword for the topic-based search but also the scope in which the user wants to find the images. The second phase uses the query by example specification to perform a low-level content-based image match that retrieves a smaller set of results more closely matching the example image. Information related to the image feature is automatically extracted from the query image by the image processing system. A color-feature-based technique that is not computationally intensive is used to perform content-based matching of images. The main goal is to develop a functional image search and indexing system and to demonstrate that better retrieval results can be achieved with this proposed hybrid search technique.

  16. Assisting Consumer Health Information Retrieval with Query Recommendations

    PubMed Central

    Zeng, Qing T.; Crowell, Jonathan; Plovnick, Robert M.; Kim, Eunjung; Ngo, Long; Dibble, Emily

    2006-01-01

    Objective: Health information retrieval (HIR) on the Internet has become an important practice for millions of people, many of whom have problems forming effective queries. We have developed and evaluated a tool to assist people in health-related query formation. Design: We developed the Health Information Query Assistant (HIQuA) system. The system suggests alternative/additional query terms related to the user's initial query that can be used as building blocks to construct a better, more specific query. The recommended terms are selected according to their semantic distance from the original query, which is calculated on the basis of concept co-occurrences in medical literature and log data as well as semantic relations in medical vocabularies. Measurements: An evaluation of the HIQuA system was conducted and a total of 213 subjects participated in the study. The subjects were randomized into 2 groups. One group was given query recommendations and the other was not. Each subject performed HIR for both a predefined and a self-defined task. Results: The study showed that providing HIQuA recommendations resulted in statistically significantly higher rates of successful queries (odds ratio = 1.66, 95% confidence interval = 1.16–2.38), although no statistically significant impact on user satisfaction or the users' ability to accomplish the predefined retrieval task was found. Conclusion: Providing semantic-distance-based query recommendations can help consumers with query formation during HIR. PMID:16221944

  17. Query Expansion Using SNOMED-CT and Weighing Schemes

    DTIC Science & Technology

    2014-11-01

    For this research, we have used SNOMED-CT along with the UMLS Metathesaurus as our ontology in the medical domain to expand the queries. General Terms... University of the Basque Country discuss their findings on query expansion using external sources headlined by the Unified Medical Language System (UMLS

  18. Long-Term Priming of Visual Search Prevails against the Passage of Time and Counteracting Instructions

    ERIC Educational Resources Information Center

    Kruijne, Wouter; Meeter, Martijn

    2016-01-01

    Studies on "intertrial priming" have shown that in visual search experiments, the preceding trial automatically affects search performance: facilitating it when the target features repeat and giving rise to switch costs when they change--so-called (short-term) intertrial priming. These effects also occur at longer time scales: When 1 of…

  19. Query-Based Outlier Detection in Heterogeneous Information Networks.

    PubMed

    Kuck, Jonathan; Zhuang, Honglei; Yan, Xifeng; Cam, Hasan; Han, Jiawei

    2015-03-01

    Outlier or anomaly detection in large data sets is a fundamental task in data science, with broad applications. However, in real data sets with high-dimensional space, most outliers are hidden in certain dimensional combinations and are relative to a user's search space and interest. It is often more effective to give power to users and allow them to specify outlier queries flexibly, and the system will then process such mining queries efficiently. In this study, we introduce the concept of query-based outlier in heterogeneous information networks, design a query language to facilitate users to specify such queries flexibly, define a good outlier measure in heterogeneous networks, and study how to process outlier queries efficiently in large data sets. Our experiments on real data sets show that following such a methodology, interesting outliers can be defined and uncovered flexibly and effectively in large heterogeneous networks.

  20. Query-Based Outlier Detection in Heterogeneous Information Networks

    PubMed Central

    Kuck, Jonathan; Zhuang, Honglei; Yan, Xifeng; Cam, Hasan; Han, Jiawei

    2015-01-01

    Outlier or anomaly detection in large data sets is a fundamental task in data science, with broad applications. However, in real data sets with high-dimensional space, most outliers are hidden in certain dimensional combinations and are relative to a user’s search space and interest. It is often more effective to give power to users and allow them to specify outlier queries flexibly, and the system will then process such mining queries efficiently. In this study, we introduce the concept of query-based outlier in heterogeneous information networks, design a query language to facilitate users to specify such queries flexibly, define a good outlier measure in heterogeneous networks, and study how to process outlier queries efficiently in large data sets. Our experiments on real data sets show that following such a methodology, interesting outliers can be defined and uncovered flexibly and effectively in large heterogeneous networks. PMID:27064397

  1. Multidimensional indexing structure for use with linear optimization queries

    NASA Technical Reports Server (NTRS)

    Bergman, Lawrence David (Inventor); Castelli, Vittorio (Inventor); Chang, Yuan-Chi (Inventor); Li, Chung-Sheng (Inventor); Smith, John Richard (Inventor)

    2002-01-01

    Linear optimization queries, which usually arise in various decision support and resource planning applications, are queries that retrieve the top N data records (where N is an integer greater than zero) which satisfy a specific optimization criterion. The optimization criterion is to either maximize or minimize a linear equation. The coefficients of the linear equation are given at query time. Methods and apparatus are disclosed for constructing, maintaining and utilizing a multidimensional indexing structure of database records to improve the execution speed of linear optimization queries. Database records with numerical attributes are organized into a number of layers, and each layer represents a geometric structure called a convex hull. Such linear optimization queries are processed by searching from the outermost layer of this multi-layer indexing structure inwards. At least one record per layer will satisfy the query criterion, and the number of layers that need to be searched depends on the spatial distribution of records, the query-issued linear coefficients, and N, the number of records to be returned. When N is small compared to the total size of the database, answering the query typically requires searching only a small fraction of all relevant records, resulting in a tremendous speedup as compared to linearly scanning the entire dataset.
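
    The layered convex hull idea in this abstract can be illustrated with a minimal two-dimensional sketch. It is not the patented method: the function names (`convex_hull`, `build_layers`, `top_n`) and the sample records are assumptions made for illustration, and the query simply scans the first N layers rather than pruning within them. It relies on the property that the k-th best record for a linear criterion lies within the outermost k layers.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def _cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices only."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def build_layers(points: List[Point]) -> List[List[Point]]:
    """Peel convex hulls repeatedly: layer 0 is the outermost hull."""
    remaining = set(points)
    layers: List[List[Point]] = []
    while remaining:
        hull = convex_hull(list(remaining))
        layers.append(hull)
        remaining -= set(hull)
    return layers


def top_n(layers: List[List[Point]], coeffs: Point, n: int) -> List[Point]:
    """Top-n records maximizing coeffs . record, reading only n layers."""
    def score(p: Point) -> float:
        return coeffs[0] * p[0] + coeffs[1] * p[1]

    candidates = [p for layer in layers[:n] for p in layer]
    return sorted(candidates, key=score, reverse=True)[:n]


# Hypothetical records with two numerical attributes.
records = [(0, 0), (4, 0), (4, 4), (0, 4),
           (1, 1), (3, 1), (3, 3), (1, 3), (2, 2)]
layers = build_layers(records)
best_two = top_n(layers, (2, 1), 2)
```

    To minimize instead of maximize, the same sketch can be called with negated coefficients. Points are assumed to be distinct, since duplicates are collapsed when the layers are built.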

  2. A Query Integrator and Manager for the Query Web

    PubMed Central

    Brinkley, James F.; Detwiler, Landon T.

    2012-01-01

    We introduce two concepts: the Query Web as a layer of interconnected queries over the document web and the semantic web, and a Query Web Integrator and Manager (QI) that enables the Query Web to evolve. QI permits users to write, save and reuse queries over any web accessible source, including other queries saved in other installations of QI. The saved queries may be in any language (e.g. SPARQL, XQuery); the only condition for interconnection is that the queries return their results in some form of XML. This condition allows queries to chain off each other, and to be written in whatever language is appropriate for the task. We illustrate the potential use of QI for several biomedical use cases, including ontology view generation using a combination of graph-based and logical approaches, value set generation for clinical data management, image annotation using terminology obtained from an ontology web service, ontology-driven brain imaging data integration, small-scale clinical data integration, and wider-scale clinical data integration. Such use cases illustrate the current range of applications of QI and lead us to speculate about the potential evolution from smaller groups of interconnected queries into a larger query network that layers over the document and semantic web. The resulting Query Web could greatly aid researchers and others who now have to manually navigate through multiple information sources in order to answer specific questions. PMID:22531831

  3. Generating PubMed Chemical Queries for Consumer Health Literature

    PubMed Central

    Loo, Jeffery; Chang, Hua Florence; Hochstein, Colette; Sun, Ying

    2005-01-01

    Two popular NLM resources that provide information for consumers about chemicals and their safety are the Household Products Database and Haz-Map. Search queries to PubMed via web links were generated from these databases. The query retrieves consumer health-oriented literature about adverse effects of chemicals. The retrieval was limited to a manageable set of 20 to 60 citations, achieved by successively applying increasing limits to the search until the desired number of references was reached. PMID:16779322
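
    The narrowing strategy described above (adding limits one at a time until the result set is small enough) might be sketched as follows. This is only an illustration under stated assumptions: `count_hits` is a hypothetical callable standing in for whatever returns the number of citations for a query string, the limit strings are placeholders rather than the limits the authors used, and combining with `AND` is an assumed query syntax.

```python
from typing import Callable, List, Tuple


def narrow_query(
    base_query: str,
    limits: List[str],
    count_hits: Callable[[str], int],
    max_hits: int = 60,
) -> Tuple[str, int]:
    """Apply limits in order until the hit count is at most max_hits."""
    query = base_query
    hits = count_hits(query)
    for limit in limits:
        if hits <= max_hits:
            break  # already a manageable set; stop adding limits
        query = f"({query}) AND {limit}"
        hits = count_hits(query)
    return query, hits


# Hypothetical hit counts used in place of a real search service.
fake_counts = {
    "q": 500,
    "(q) AND a": 120,
    "((q) AND a) AND b": 45,
}
final_query, final_hits = narrow_query("q", ["a", "b", "c"], fake_counts.__getitem__)
```

    The sketch stops as soon as the count falls to the upper bound; it does not back off if a limit overshoots below the lower end of the 20 to 60 range, which a fuller version would need to handle.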

  4. Effective Structured Query Formulation for Session Search

    DTIC Science & Technology

    2012-11-01

    with top frequency are “type of paralysi”, “ quadriplegia paraplegia”, “paraplegia”, “spinal cord injury”, and “quadriplegic tetraplegic”, so the final... quadriplegia paraplegia) 0.004819 paraplegia 0.004819 #combine(spinal cord injury) 0.00241 #combine(quadriplegic tetraplegic) )”, where the

  5. Independence of long-term contextual memory and short-term perceptual hypotheses: Evidence from contextual cueing of interrupted search.

    PubMed

    Schlagbauer, Bernhard; Mink, Maurice; Müller, Hermann J; Geyer, Thomas

    2017-02-01

    Observers are able to resume an interrupted search trial faster relative to responding to a new, unseen display. This finding of rapid resumption is attributed to short-term perceptual hypotheses generated on the current look and confirmed upon subsequent looks at the same display. It has been suggested that the contents of perceptual hypotheses are similar to those of other forms of memory acquired long-term through repeated exposure to the same search displays over the course of several trials, that is, the memory supporting "contextual cueing." In three experiments, we investigated the relationship between short-term perceptual hypotheses and long-term contextual memory. The results indicated that long-term, contextual memory of repeated displays neither affected the generation nor the confirmation of short-term perceptual hypotheses for these displays. Furthermore, the analysis of eye movements suggests that long-term memory provides an initial benefit in guiding attention to the target, whereas in subsequent looks guidance is entirely based on short-term perceptual hypotheses. Overall, the results reveal a picture of both long- and short-term memory contributing to reliable performance gains in interrupted search, while exerting their effects in an independent manner.

  6. Noesis: Ontology based Scoped Search Engine and Resource Aggregator for Atmospheric Science

    NASA Astrophysics Data System (ADS)

    Ramachandran, R.; Movva, S.; Li, X.; Cherukuri, P.; Graves, S.

    2006-12-01

    The goal for search engines is to return results that are both accurate and complete. The search engines should find only what you really want and find everything you really want. Search engines (even meta search engines) lack semantics. Search is based simply on string matching between the user's query term and the resource database, and the semantics associated with the search string are not captured. For example, if an atmospheric scientist is searching for "pressure" related web resources, most search engines return inaccurate results such as web resources related to blood pressure. This presentation describes Noesis, a meta-search engine and resource aggregator that uses domain ontologies to provide scoped search capabilities. Noesis uses domain ontologies to help the user scope the search query to ensure that the search results are both accurate and complete. The domain ontologies guide users in refining their search query and thereby reduce the burden of experimenting with different search strings. Semantics are captured by refining the query terms to cover synonyms, specializations, generalizations and related concepts. Noesis also serves as a resource aggregator. It categorizes the search results from different online resources that might be of interest to the user, such as education materials, publications, datasets, and web search engines.
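
    As a rough illustration of refining a query term through synonyms, specializations and generalizations, the sketch below expands a term from a tiny hand-written ontology. Everything in it is an assumption for illustration: the `TOY_ONTOLOGY` entries, the relation names, and the `expand_query` and `to_boolean_query` helpers are not taken from Noesis.

```python
from typing import Dict, List, Sequence

# Hypothetical miniature ontology; a real domain ontology would be far larger.
TOY_ONTOLOGY: Dict[str, Dict[str, List[str]]] = {
    "pressure": {
        "synonyms": ["atmospheric pressure", "barometric pressure"],
        "specializations": ["sea level pressure", "surface pressure"],
        "generalizations": ["atmospheric state variable"],
    }
}


def expand_query(
    term: str,
    ontology: Dict[str, Dict[str, List[str]]],
    relations: Sequence[str] = ("synonyms", "specializations"),
) -> List[str]:
    """Return the term plus related terms for the chosen relations."""
    entry = ontology.get(term.lower(), {})
    expanded = [term]
    for relation in relations:
        for related in entry.get(relation, []):
            if related not in expanded:
                expanded.append(related)
    return expanded


def to_boolean_query(terms: Sequence[str]) -> str:
    """Join terms as quoted phrases combined with OR."""
    return " OR ".join(f'"{t}"' for t in terms)


scoped = to_boolean_query(expand_query("pressure", TOY_ONTOLOGY, ("synonyms",)))
```

    Scoping the term to domain phrases such as "atmospheric pressure" is what steers a string-matching engine away from unrelated senses such as blood pressure.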

  7. Advanced SPARQL querying in small molecule databases.

    PubMed

    Galgonek, Jakub; Hurt, Tomáš; Michlíková, Vendula; Onderka, Petr; Schwarz, Jan; Vondrášek, Jiří

    2016-01-01

    In recent years, the Resource Description Framework (RDF) and the SPARQL query language have become more widely used in the area of cheminformatics and bioinformatics databases. These technologies allow better interoperability of various data sources and powerful searching facilities. However, we identified several deficiencies that make usage of such RDF databases restrictive or challenging for common users. We extended a SPARQL engine to be able to use special procedures inside SPARQL queries. This allows the user to work with data that cannot be simply precomputed and thus cannot be directly stored in the database. We designed an algorithm that checks a query against data ontology to identify possible user errors. This greatly improves query debugging. We also introduced an approach to visualize retrieved data in a user-friendly way, based on templates describing visualizations of resource classes. To integrate all of our approaches, we developed a simple web application. Our system was implemented successfully, and we demonstrated its usability on the ChEBI database transformed into RDF form. To demonstrate procedure call functions, we employed compound similarity searching based on OrChem. The application is publicly available at https://bioinfo.uochb.cas.cz/projects/chemRDF.

  8. Understanding vaccination resistance: vaccine search term selection bias and the valence of retrieved information.

    PubMed

    Ruiz, Jeanette B; Bell, Robert A

    2014-10-07

    Dubious vaccination-related information on the Internet leads some parents to opt out of vaccinating their children. To determine if negative, neutral and positive search terms retrieve vaccination information that differs in valence and confirms searchers' assumptions about vaccination. A content analysis of first-page Google search results was conducted using three negative, three neutral, and three positive search terms for the concepts "vaccine," "vaccination," and "MMR"; 84 of the 90 websites retrieved met inclusion requirements. Two coders independently and reliably coded for the presence or absence of each of 15 myths about vaccination (e.g., "vaccines cause autism"), statements that countered these myths, and recommendations for or against vaccination. Data were analyzed using descriptive statistics. Across all websites, at least one myth was perpetuated on 16.7% of websites and at least one myth was countered on 64.3% of websites. The mean number of myths perpetuated on websites retrieved with negative, neutral, and positive search terms, respectively, was 1.93, 0.53, and 0.40. The mean number of myths countered on websites retrieved with negative, neutral, and positive search terms, respectively, was 3.0, 3.27, and 2.87. Explicit recommendations regarding vaccination were offered on 22.6% of websites. A recommendation against vaccination was more often made on websites retrieved with negative search terms (37.5% of recommendations) than on websites retrieved with neutral (12.5%) or positive (0%) search terms. The concerned parent who seeks information about the risks of childhood immunizations will find more websites that perpetuate vaccine myths and recommend against vaccination than the parent who seeks information about the benefits of vaccination. This suggests that search term valence can lead to online information that supports concerned parents' misconceptions about vaccines. Copyright © 2014 Elsevier Ltd. All rights reserved.

  9. A Semantic Graph Query Language

    SciTech Connect

    Kaplan, I L

    2006-10-16

    Semantic graphs can be used to organize large amounts of information from a number of sources into one unified structure. A semantic query language provides a foundation for extracting information from the semantic graph. The graph query language described here provides a simple, powerful method for querying semantic graphs.

  10. Implicit short- and long-term memory direct our gaze in visual search.

    PubMed

    Kruijne, Wouter; Meeter, Martijn

    2016-04-01

    Visual attention is strongly affected by the past: both by recent experience and by long-term regularities in the environment that are encoded in and retrieved from memory. In visual search, intertrial repetition of targets causes speeded response times (short-term priming). Similarly, targets that are presented more often than others may facilitate search, even long after it is no longer present (long-term priming). In this study, we investigate whether such short-term priming and long-term priming depend on dissociable mechanisms. By recording eye movements while participants searched for one of two conjunction targets, we explored at what stages of visual search different forms of priming manifest. We found both long- and short-term priming effects. Long-term priming persisted long after the bias was present, and was again found even in participants who were unaware of a color bias. Short- and long-term priming affected the same stage of the task; both biased eye movements towards targets with the primed color, already starting with the first eye movement. Neither form of priming affected the response phase of a trial, but response repetition did. The results strongly suggest that both long- and short-term memory can implicitly modulate feedforward visual processing.

  11. SPARQL Assist language-neutral query composer

    PubMed Central

    2012-01-01

    Background: SPARQL query composition is difficult for the lay-person, and even the experienced bioinformatician in cases where the data model is unfamiliar. Moreover, established best-practices and internationalization concerns dictate that the identifiers for ontological terms should be opaque rather than human-readable, which further complicates the task of synthesizing queries manually. Results: We present SPARQL Assist: a Web application that addresses these issues by providing context-sensitive type-ahead completion during SPARQL query construction. Ontological terms are suggested using their multi-lingual labels and descriptions, leveraging existing support for internationalization and language-neutrality. Moreover, the system utilizes the semantics embedded in ontologies, and within the query itself, to help prioritize the most likely suggestions. Conclusions: To ensure success, the Semantic Web must be easily available to all users, regardless of locale, training, or preferred language. By enhancing support for internationalization, and moreover by simplifying the manual construction of SPARQL queries through the use of controlled-natural-language interfaces, we believe we have made some early steps towards simplifying access to Semantic Web resources. PMID:22373327

  12. SPARQL assist language-neutral query composer.

    PubMed

    McCarthy, Luke; Vandervalk, Ben; Wilkinson, Mark

    2012-01-25

    SPARQL query composition is difficult for the lay-person, and even the experienced bioinformatician in cases where the data model is unfamiliar. Moreover, established best-practices and internationalization concerns dictate that the identifiers for ontological terms should be opaque rather than human-readable, which further complicates the task of synthesizing queries manually. We present SPARQL Assist: a Web application that addresses these issues by providing context-sensitive type-ahead completion during SPARQL query construction. Ontological terms are suggested using their multi-lingual labels and descriptions, leveraging existing support for internationalization and language-neutrality. Moreover, the system utilizes the semantics embedded in ontologies, and within the query itself, to help prioritize the most likely suggestions. To ensure success, the Semantic Web must be easily available to all users, regardless of locale, training, or preferred language. By enhancing support for internationalization, and moreover by simplifying the manual construction of SPARQL queries through the use of controlled-natural-language interfaces, we believe we have made some early steps towards simplifying access to Semantic Web resources.

  13. A study of the influence of task familiarity on user behaviors and performance with a MeSH term suggestion interface for PubMed bibliographic search.

    PubMed

    Tang, Muh-Chyun; Liu, Ying-Hsang; Wu, Wan-Ching

    2013-09-01

    Previous research has shown that information seekers in the biomedical domain need more support in formulating their queries. A user study was conducted to evaluate the effectiveness of a metadata-based query suggestion interface for PubMed bibliographic search. The study also investigated the impact of search task familiarity on search behaviors and the effectiveness of the interface. A real user, real search request and real system approach was used for the study. Unlike traditional IR evaluation, where assigned tasks are used, the participants were asked to search requests of their own. Forty-four researchers in Health Sciences participated in the evaluation - each conducted two search requests of their own, alternately with the proposed interface and the PubMed baseline. Several performance criteria were measured to assess the potential benefits of the experimental interface, including users' assessment of their original and eventual queries, the perceived usefulness of the interfaces, satisfaction with the search results, and the average relevance score of the saved records. The results show that, when searching for an unfamiliar topic, users were more likely to change their queries, indicating the effect of familiarity on search behaviors. The results also show that the interface scored higher on several of the performance criteria, such as the "goodness" of the queries, perceived usefulness, and user satisfaction. Furthermore, in line with our hypothesis, the proposed interface was relatively more effective when less familiar search requests were attempted. Results indicate that there is a selective compatibility between search familiarity and search interface. One implication of the research for system evaluation is the importance of taking into consideration task familiarity when assessing the effectiveness of interactive IR systems. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  14. Virtual Solar Observatory Distributed Query Construction

    NASA Technical Reports Server (NTRS)

    Gurman, J. B.; Dimitoglou, G.; Bogart, R.; Davey, A.; Hill, F.; Martens, P.

    2003-01-01

    Through a prototype implementation (Tian et al., this meeting) the VSO has already demonstrated the capability of unifying geographically distributed data sources following the Web Services paradigm and utilizing mechanisms such as the Simple Object Access Protocol (SOAP). So far, four participating sites (Stanford, Montana State University, National Solar Observatory and the Solar Data Analysis Center) permit Web-accessible, time-based searches that allow browse access to a number of diverse data sets. Our latest work includes the extension of the simple, time-based queries to include numerous other searchable observation parameters. For VSO users, this extended functionality enables more refined searches. For the VSO, it is a proof of concept that more complex, distributed queries can be effectively constructed and that results from heterogeneous, remote sources can be synthesized and presented to users as a single, virtual data product.

  15. Strategic search from long-term memory: an examination of semantic and autobiographical recall.

    PubMed

    Unsworth, Nash; Brewer, Gene A; Spillers, Gregory J

    2014-01-01

    Searching long-term memory is theoretically driven by both directed (search strategies) and random components. In the current study we conducted four experiments evaluating strategic search in semantic and autobiographical memory. Participants were required to generate either exemplars from the category of animals or the names of their friends for several minutes. Self-reported strategies suggested that participants typically relied on visualization strategies for both tasks and were less likely to rely on ordered strategies (e.g., alphabetic search). When participants were instructed to use particular strategies, the visualization strategy resulted in the highest levels of performance and the most efficient search, whereas ordered strategies resulted in the lowest levels of performance and fairly inefficient search. These results are consistent with the notion that retrieval from long-term memory is driven, in part, by search strategies employed by the individual, and that one particularly efficient strategy is to visualize various situational contexts that one has experienced in the past in order to constrain the search and generate the desired information.

  16. Queries for Bias Testing

    NASA Technical Reports Server (NTRS)

    Gordon, Diana F.

    1992-01-01

    Selecting a good bias prior to concept learning can be difficult. Therefore, dynamic bias adjustment is becoming increasingly popular. Current dynamic bias adjustment systems, however, are limited in their ability to identify erroneous assumptions about the relationship between the bias and the target concept. Without proper diagnosis, it is difficult to identify and then remedy faulty assumptions. We have developed an approach that makes these assumptions explicit, actively tests them with queries to an oracle, and adjusts the bias based on the test results.

  17. Supporting ontology-based keyword search over medical databases.

    PubMed

    Kementsietsidis, Anastasios; Lim, Lipyeow; Wang, Min

    2008-11-06

    The proliferation of medical terms poses a number of challenges in the sharing of medical information among different stakeholders. Ontologies are commonly used to establish relationships between different terms, yet their role in querying has not been investigated in detail. In this paper, we study the problem of supporting ontology-based keyword search queries on a database of electronic medical records. We present several approaches to support this type of query, study the advantages and limitations of each approach, and summarize the lessons learned as best practices.

  18. CUFID-query: accurate network querying through random walk based network flow estimation.

    PubMed

    Jeong, Hyundoo; Qian, Xiaoning; Yoon, Byung-Jun

    2017-12-28

    performance evaluation based on biological networks with known functional modules, we show that CUFID-query outperforms the existing state-of-the-art algorithms in terms of prediction accuracy and biological significance of the predictions.

  19. SkyQuery - A Prototype Distributed Query and Cross-Matching Web Service for the Virtual Observatory

    NASA Astrophysics Data System (ADS)

    Thakar, A. R.; Budavari, T.; Malik, T.; Szalay, A. S.; Fekete, G.; Nieto-Santisteban, M.; Haridas, V.; Gray, J.

    2002-12-01

    We have developed a prototype distributed query and cross-matching service for the VO community, called SkyQuery, which is implemented with hierarchical Web Services. SkyQuery enables astronomers to run combined queries on existing distributed heterogeneous astronomy archives. SkyQuery provides a simple, user-friendly interface to run distributed queries over the federation of registered astronomical archives in the VO. The SkyQuery client connects to the portal Web Service, which farms the query out to the individual archives, which are also Web Services called SkyNodes. The cross-matching algorithm is run recursively on each SkyNode. Each archive is a relational DBMS with an HTM index for fast spatial lookups. The results of the distributed query are returned as an XML DataSet that is automatically rendered by the client. SkyQuery also returns the image cutout corresponding to the query result. SkyQuery finds not only matches between the various catalogs, but also dropouts - objects that exist in some of the catalogs but not in others. This is often as important as finding matches. We demonstrate the utility of SkyQuery with a brown-dwarf search between SDSS and 2MASS, and a search for radio-quiet quasars in SDSS, 2MASS and FIRST. The importance of a service like SkyQuery for the worldwide astronomical community cannot be overstated: data on the same objects in various archives is mapped in different wavelength ranges and looks very different due to different errors, instrument sensitivities and other peculiarities of each archive. Our cross-matching algorithm performs a fuzzy spatial join across multiple catalogs. This type of cross-matching is currently often done by eye, one object at a time. A static cross-identification table for a set of archives would become obsolete by the time it was built - the exponential growth of astronomical data means that a dynamic cross-identification mechanism like SkyQuery is the only viable option. SkyQuery was funded by a
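
    To make the notions of a fuzzy spatial join and of dropouts concrete, here is a minimal two-catalog sketch. It is not the SkyQuery algorithm: it uses a brute-force nearest-neighbour search with a fixed angular tolerance instead of an HTM index or a recursive multi-archive match, and the catalog contents, identifiers and the `cross_match` name are assumptions for illustration.

```python
import math
from typing import Dict, List, Tuple

Catalog = Dict[str, Tuple[float, float]]  # object id -> (RA, Dec) in degrees


def angular_sep_deg(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def cross_match(
    cat_a: Catalog, cat_b: Catalog, tol_arcsec: float
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Pair each object in cat_a with its nearest cat_b object within tolerance.

    Returns (matches, dropouts), where dropouts are cat_a objects with no
    counterpart in cat_b inside the tolerance.
    """
    tol_deg = tol_arcsec / 3600.0
    matches: List[Tuple[str, str]] = []
    dropouts: List[str] = []
    for id_a, (ra_a, dec_a) in cat_a.items():
        best_id, best_sep = None, None
        for id_b, (ra_b, dec_b) in cat_b.items():
            sep = angular_sep_deg(ra_a, dec_a, ra_b, dec_b)
            if sep <= tol_deg and (best_sep is None or sep < best_sep):
                best_id, best_sep = id_b, sep
        if best_id is None:
            dropouts.append(id_a)
        else:
            matches.append((id_a, best_id))
    return matches, dropouts


# Hypothetical positions; not taken from any real survey.
catalog_a = {"a1": (10.0, 20.0), "a2": (50.0, -5.0)}
catalog_b = {"b1": (10.0002, 20.0001)}
matches, dropouts = cross_match(catalog_a, catalog_b, tol_arcsec=2.0)
```

    The nested loop costs time proportional to the product of the catalog sizes, which is why a spatial index matters at archive scale.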

  20. Space Object Query Tool

    NASA Technical Reports Server (NTRS)

    Phillips, Veronica J.

    2017-01-01

    STI is for a fact sheet on the Space Object Query Tool being created by the MDC. When planning launches, NASA must first factor in the tens of thousands of objects already in orbit around the Earth. The number of human-made objects, including nonfunctional spacecraft, abandoned launch vehicle stages, mission-related debris and fragmentation debris orbiting Earth has grown steadily since Sputnik 1 was launched in 1957. Currently, the U.S. Department of Defense's Joint Space Operations Center, or JSpOC, tracks over 15,000 distinct objects and provides data for more than 40,000 objects via its Space-Track program, found at space-track.org.

  1. Design Recommendations for Query Languages

    DTIC Science & Technology

    1980-09-01

    DESIGN RECOMMENDATIONS FOR QUERY LANGUAGES S.L. Ehrenreich Submitted by: Stanley M. Halpin, Acting Chief HUMAN FACTORS TECHNICAL AREA Approved by: Edgar ...respond to queries that it recognizes as faulty. Codd (1974) states that in designing a natural query language, attention must be given to dealing...impaired. Codd (1974) also regarded the user’s perception of the data base to be of critical importance in properly designing a query language system

  2. The CMS DBS query language

    NASA Astrophysics Data System (ADS)

    Kuznetsov, Valentin; Riley, Daniel; Afaq, Anzar; Sekhri, Vijay; Guo, Yuyi; Lueking, Lee

    2010-04-01

    The CMS experiment has implemented a flexible and powerful system enabling users to find data within the CMS physics data catalog. The Dataset Bookkeeping Service (DBS) comprises a database and the services used to store and access metadata related to CMS physics data. To this, we have added a generalized query system in addition to the existing web and programmatic interfaces to the DBS. This query system is based on a query language that hides the complexity of the underlying database structure by discovering the join conditions between database tables. This provides a way of querying the system that is simple and straightforward for CMS data managers and physicists to use without requiring knowledge of the database tables or keys. The DBS Query Language uses the ANTLR tool to build the input query parser and tokenizer, followed by a query builder that uses a graph representation of the DBS schema to construct the SQL query sent to the underlying database. We will describe the design of the query system, provide details of the language components and an overview of how this component fits into the overall data discovery system architecture.
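
    Discovering join conditions from a graph representation of a schema can be illustrated with a shortest-path search. The sketch below is an assumption-laden toy: the table names, keys, and the `build_sql` helper are hypothetical and are not taken from the DBS schema or its query builder.

```python
from collections import deque

# Hypothetical schema graph: table -> {neighbouring table: join condition}
SCHEMA = {
    "dataset": {"block": "dataset.id = block.dataset_id"},
    "block": {"dataset": "dataset.id = block.dataset_id",
              "file": "block.id = file.block_id"},
    "file": {"block": "block.id = file.block_id",
             "run": "file.run_id = run.id"},
    "run": {"file": "file.run_id = run.id"},
}

def join_path(schema, start, goal):
    """Breadth-first search for the shortest chain of joins from start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        table, joins = queue.popleft()
        if table == goal:
            return joins
        for nxt, cond in schema[table].items():
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, joins + [(nxt, cond)]))
    return None  # the two tables are not connected

def build_sql(schema, select_col, where):
    """Build a SELECT for 'table.column' filtered on a column of another table."""
    src = select_col.split(".")[0]
    dst = where.split(".")[0]
    path = join_path(schema, src, dst)
    if path is None:
        raise ValueError(f"no join path from {src} to {dst}")
    joins = "".join(f" JOIN {table} ON {cond}" for table, cond in path)
    return f"SELECT {select_col} FROM {src}{joins} WHERE {where}"

sql = build_sql(SCHEMA, "dataset.name", "run.number = 1234")
```

    The user names only the column wanted and the filter; the three intermediate joins are filled in from the graph.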

  3. VISAGE: Interactive Visual Graph Querying.

    PubMed

    Pienta, Robert; Navathe, Shamkant; Tamersoy, Acar; Tong, Hanghang; Endert, Alex; Chau, Duen Horng

    2016-06-01

    Extracting useful patterns from large network datasets has become a fundamental challenge in many domains. We present VISAGE, an interactive visual graph querying approach that empowers users to construct expressive queries, without writing complex code (e.g., finding money laundering rings of bankers and business owners). Our contributions are as follows: (1) we introduce graph autocomplete, an interactive approach that guides users to construct and refine queries, preventing over-specification; (2) VISAGE guides the construction of graph queries using a data-driven approach, enabling users to specify queries with varying levels of specificity, from concrete and detailed (e.g., query by example), to abstract (e.g., with "wildcard" nodes of any types), to purely structural matching; (3) a twelve-participant, within-subject user study demonstrates VISAGE's ease of use and the ability to construct graph queries significantly faster than using a conventional query language; (4) VISAGE works on real graphs with over 468K edges, achieving sub-second response times for common queries.

  4. VISAGE: Interactive Visual Graph Querying

    PubMed Central

    Pienta, Robert; Navathe, Shamkant; Tamersoy, Acar; Tong, Hanghang; Endert, Alex; Chau, Duen Horng

    2017-01-01

    Extracting useful patterns from large network datasets has become a fundamental challenge in many domains. We present VISAGE, an interactive visual graph querying approach that empowers users to construct expressive queries, without writing complex code (e.g., finding money laundering rings of bankers and business owners). Our contributions are as follows: (1) we introduce graph autocomplete, an interactive approach that guides users to construct and refine queries, preventing over-specification; (2) VISAGE guides the construction of graph queries using a data-driven approach, enabling users to specify queries with varying levels of specificity, from concrete and detailed (e.g., query by example), to abstract (e.g., with “wildcard” nodes of any types), to purely structural matching; (3) a twelve-participant, within-subject user study demonstrates VISAGE’s ease of use and the ability to construct graph queries significantly faster than using a conventional query language; (4) VISAGE works on real graphs with over 468K edges, achieving sub-second response times for common queries. PMID:28553670

  5. Web page sorting algorithm based on query keyword distance relation

    NASA Astrophysics Data System (ADS)

    Yang, Han; Cui, Hong Gang; Tang, Hao

    2017-08-01

    To improve page ranking, this paper proposes a query-keyword clustering idea based on the relationships among the search keywords as they appear in a web page. The idea is expressed as the degree of aggregation of the search keywords within the page. Based on the PageRank algorithm, a clustering-degree factor for the query keywords is added so that it takes part in the quantitative calculation. This paper thus proposes an improved PageRank algorithm based on the distance relation between search keywords. The experimental results show the feasibility and effectiveness of the method.

  6. Query Expansion and Query Translation as Logical Inference.

    ERIC Educational Resources Information Center

    Nie, Jian-Yun

    2003-01-01

    Examines query expansion during query translation in cross language information retrieval and develops a general framework for inferential information retrieval in two particular contexts: using fuzzy logic and probability theory. Obtains evaluation formulas that are shown to strongly correspond to those used in other information retrieval models.…

  7. The role of economics in the QUERI program: QUERI Series.

    PubMed

    Smith, Mark W; Barnett, Paul G

    2008-04-22

    The United States (U.S.) Department of Veterans Affairs (VA) Quality Enhancement Research Initiative (QUERI) has implemented economic analyses in single-site and multi-site clinical trials. To date, no one has reviewed whether the QUERI Centers are taking an optimal approach to doing so. Consistent with the continuous learning culture of the QUERI Program, this paper provides such a reflection. We present a case study of QUERI as an example of how economic considerations can and should be integrated into implementation research within both single and multi-site studies. We review theoretical and applied cost research in implementation studies outside and within VA. We also present a critique of the use of economic research within the QUERI program. Economic evaluation is a key element of implementation research. QUERI has contributed many developments in the field of implementation but has only recently begun multi-site implementation trials across multiple regions within the national VA healthcare system. These trials are unusual in their emphasis on developing detailed costs of implementation, as well as in the use of business case analyses (budget impact analyses). Economics appears to play an important role in QUERI implementation studies, only after implementation has reached the stage of multi-site trials. Economic analysis could better inform the choice of which clinical best practices to implement and the choice of implementation interventions to employ. QUERI economics also would benefit from research on costing methods and development of widely accepted international standards for implementation economics.

  8. An adaptive random search for short term generation scheduling with network constraints.

    PubMed

    Marmolejo, J A; Velasco, Jonás; Selley, Héctor J

    2017-01-01

    This paper presents an adaptive random search approach to address a short term generation scheduling with network constraints, which determines the startup and shutdown schedules of thermal units over a given planning horizon. In this model, we consider the transmission network through capacity limits and line losses. The mathematical model is stated in the form of a Mixed Integer Non Linear Problem with binary variables. The proposed heuristic is a population-based method that generates a set of new potential solutions via a random search strategy. The random search is based on the Markov Chain Monte Carlo method. The main key of the proposed method is that the noise level of the random search is adaptively controlled in order to explore and exploit the entire search space. In order to improve the solutions, we consider coupling a local search into the random search process. Several test systems are presented to evaluate the performance of the proposed heuristic. We use a commercial optimizer to compare the quality of the solutions provided by the proposed method. The solution of the proposed algorithm showed a significant reduction in computational effort with respect to the full-scale outer approximation commercial solver. Numerical results show the potential and robustness of our approach.
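
    A random search whose noise level adapts to its own success can be sketched on a toy continuous problem. This is not the authors' heuristic (which is population-based and targets a mixed-integer scheduling model); the acceptance rule, the 1.1 and 0.95 adaptation factors, and the test function are all assumptions chosen for illustration.

```python
import math
import random

def adaptive_random_search(f, x0, iters=2000, sigma=1.0, seed=0):
    """Minimise f with Gaussian perturbations and an adaptive noise level."""
    rng = random.Random(seed)
    x, fx = list(x0), f(x0)
    best_x, best_f = list(x), fx
    for _ in range(iters):
        cand = [xi + rng.gauss(0.0, sigma) for xi in x]
        fc = f(cand)
        # Metropolis-style rule: always accept improvements, occasionally
        # accept worse points (more readily while the noise level is high).
        if fc <= fx or rng.random() < math.exp(-(fc - fx) / sigma):
            x, fx = cand, fc
            sigma *= 1.1    # accepted: widen the search (explore)
        else:
            sigma *= 0.95   # rejected: narrow the search (exploit)
        sigma = min(max(sigma, 1e-6), 10.0)  # keep the noise level bounded
        if fx < best_f:
            best_x, best_f = list(x), fx
    return best_x, best_f

def sphere(v):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(c * c for c in v)

start = [3.0, -2.0]
best_x, best_f = adaptive_random_search(sphere, start)
```

    The best point found is tracked separately, so the returned value is never worse than the starting point even when the chain accepts uphill moves.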

  9. An ontology-based search engine for protein-protein interactions.

    PubMed

    Park, Byungkyu; Han, Kyungsook

    2010-01-18

    Keyword matching or ID matching is the most common searching method in a large database of protein-protein interactions. They are purely syntactic methods, and retrieve the records in the database that contain a keyword or ID specified in a query. Such syntactic search methods often retrieve too few search results or no results despite many potential matches present in the database. We have developed a new method for representing protein-protein interactions and the Gene Ontology (GO) using modified Gödel numbers. This representation is hidden from users but enables a search engine using the representation to efficiently search protein-protein interactions in a biologically meaningful way. Given a query protein with optional search conditions expressed in one or more GO terms, the search engine finds all the interaction partners of the query protein by unique prime factorization of the modified Gödel numbers representing the query protein and the search conditions. Representing the biological relations of proteins and their GO annotations by modified Gödel numbers makes a search engine efficiently find all protein-protein interactions by prime factorization of the numbers. Keyword matching or ID matching search methods often miss the interactions involving a protein that has no explicit annotations matching the search condition, but our search engine retrieves such interactions as well if they satisfy the search condition with a more specific term in the ontology.
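
    The divisibility idea behind a prime-based encoding can be shown with a toy ontology. The abstract does not spell out how the modified Gödel numbers are built, so the scheme below (one prime per term, a term's code being the least common multiple of its own prime and its ancestors' codes) is an assumed simplification, and the four-term ontology is hypothetical.

```python
from math import gcd

def primes():
    """Yield 2, 3, 5, ... by trial division (adequate for a toy ontology)."""
    n = 2
    while True:
        if all(n % p for p in range(2, int(n ** 0.5) + 1)):
            yield n
        n += 1

def lcm(a, b):
    return a * b // gcd(a, b)

# Hypothetical mini-ontology: term -> list of parent terms
PARENTS = {
    "binding": [],
    "protein binding": ["binding"],
    "kinase binding": ["protein binding"],
    "catalytic activity": [],
}

_gen = primes()
PRIME = {term: next(_gen) for term in PARENTS}  # one distinct prime per term

def term_code(term):
    """Code of a term: its own prime combined with the codes of all ancestors."""
    code = PRIME[term]
    for parent in PARENTS[term]:
        code = lcm(code, term_code(parent))
    return code

def protein_code(annotations):
    """Code of a protein: combination of the codes of its annotated terms."""
    code = 1
    for term in annotations:
        code = lcm(code, term_code(term))
    return code

def satisfies(code, condition_term):
    """True if the protein carries the condition term or any descendant of it."""
    return code % PRIME[condition_term] == 0

p = protein_code(["kinase binding"])
```

    A protein annotated only with "kinase binding" still satisfies the broader condition "binding", because the prime for "binding" divides its code; plain keyword matching would miss it.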

  10. An ontology-based search engine for protein-protein interactions

    PubMed Central

    2010-01-01

    Background Keyword matching or ID matching is the most common searching method in a large database of protein-protein interactions. They are purely syntactic methods, and retrieve the records in the database that contain a keyword or ID specified in a query. Such syntactic search methods often retrieve too few search results or no results despite many potential matches present in the database. Results We have developed a new method for representing protein-protein interactions and the Gene Ontology (GO) using modified Gödel numbers. This representation is hidden from users but enables a search engine using the representation to efficiently search protein-protein interactions in a biologically meaningful way. Given a query protein with optional search conditions expressed in one or more GO terms, the search engine finds all the interaction partners of the query protein by unique prime factorization of the modified Gödel numbers representing the query protein and the search conditions. Conclusion Representing the biological relations of proteins and their GO annotations by modified Gödel numbers makes a search engine efficiently find all protein-protein interactions by prime factorization of the numbers. Keyword matching or ID matching search methods often miss the interactions involving a protein that has no explicit annotations matching the search condition, but our search engine retrieves such interactions as well if they satisfy the search condition with a more specific term in the ontology. PMID:20122195

  11. Internet search term affects the quality and accuracy of online information about developmental hip dysplasia.

    PubMed

    Fabricant, Peter D; Dy, Christopher J; Patel, Ronak M; Blanco, John S; Doyle, Shevaun M

    2013-06-01

    The recent emphasis on shared decision-making has increased the role of the Internet as a readily accessible medical reference source for patients and families. However, the lack of professional review creates concern over the quality, accuracy, and readability of medical information available to patients on the Internet. Three Internet search engines (Google, Yahoo, and Bing) were evaluated prospectively using 3 different search terms of varying sophistication ("congenital hip dislocation," "developmental dysplasia of the hip," and "hip dysplasia in children"). Sixty-three unique Web sites were evaluated by each of 3 surgeons (2 fellowship-trained pediatric orthopaedic attendings and 1 orthopaedic chief resident) for quality and accuracy using a set of scoring criteria based on the AAOS/POSNA patient education Web site. The readability (literacy grade level) of each Web site was assessed using the Flesch-Kincaid score. There were significant differences noted in quality, accuracy, and readability of information depending on the search term used. The search term "developmental dysplasia of the hip" provided higher quality and accuracy compared with the search term "congenital hip dislocation." Of the 63 total Web sites, 1 (1.6%) was below the sixth grade reading level recommended by the NIH for health education materials and 8 (12.7%) Web sites were below the average American reading level (eighth grade). The quality and accuracy of information available on the Internet regarding developmental hip dysplasia significantly varied with the search term used. Patients seeking information about DDH on the Internet may not understand the materials found because nearly all of the Web sites are written at a level above that recommended for publicly distributed health information. Physicians should advise their patients to search for information using the term "developmental dysplasia of the hip" or, better yet, should refer patients to Web sites that they have

  12. Query-by-example surgical activity detection.

    PubMed

    Gao, Yixin; Vedula, S Swaroop; Lee, Gyusung I; Lee, Mija R; Khudanpur, Sanjeev; Hager, Gregory D

    2016-06-01

    Easy acquisition of surgical data opens many opportunities to automate skill evaluation and teaching. Current technology to search tool motion data for surgical activity segments of interest is limited by the need for manual pre-processing, which can be prohibitive at scale. We developed a content-based information retrieval method, query-by-example (QBE), to automatically detect activity segments within surgical data recordings of long duration that match a query. The example segment of interest (query) and the surgical data recording (target trial) are time series of kinematics. Our approach includes an unsupervised feature learning module using a stacked denoising autoencoder (SDAE), two scoring modules based on asymmetric subsequence dynamic time warping (AS-DTW) and template matching, respectively, and a detection module. A distance matrix of the query against the trial is computed using the SDAE features, followed by AS-DTW combined with template scoring, to generate a ranked list of candidate subsequences (substrings). To evaluate the quality of the ranked list against the ground-truth, thresholding conventional DTW distances and bipartite matching are applied. We computed the recall, precision, F1-score, and a Jaccard index-based score on three experimental setups. We evaluated our QBE method using a suture throw maneuver as the query, on two tool motion datasets (JIGSAWS and MISTIC-SL) captured in a training laboratory. We observed a recall of 93, 90 and 87 % and a precision of 93, 91, and 88 % with same surgeon same trial (SSST), same surgeon different trial (SSDT) and different surgeon (DS) experiment setups on JIGSAWS, and a recall of 87, 81 and 75 % and a precision of 72, 61, and 53 % with SSST, SSDT and DS experiment setups on MISTIC-SL, respectively. We developed a novel, content-based information retrieval method to automatically detect multiple instances of an activity within long surgical recordings. Our method demonstrated adequate recall
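
    Subsequence dynamic time warping, the family of techniques the scoring modules build on, locates the stretch of a long recording that best aligns with a short query. The sketch below is plain subsequence DTW on one-dimensional toy series; it omits the SDAE features, the asymmetric variant, and the template scoring described above, and the data are invented.

```python
def subsequence_dtw(query, target):
    """Find the target segment that best matches the query.

    Returns (start, end, cost) with inclusive 0-based target indices.
    """
    n, m = len(query), len(target)
    inf = float("inf")
    # D[i][j]: best cost of aligning query[:i] with a segment ending at target[j-1]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0] = [0.0] * (m + 1)  # the match may start anywhere in the target
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(query[i - 1] - target[j - 1])
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    # The match may also end anywhere: take the cheapest cell in the last row.
    end = min(range(1, m + 1), key=lambda j: D[n][j])
    # Backtrack to the first query sample to recover the start of the segment.
    i, j = n, end
    while i > 1:
        _, i, j = min((D[i - 1][j - 1], i - 1, j - 1),
                      (D[i - 1][j], i - 1, j),
                      (D[i][j - 1], i, j - 1))
    return j - 1, end - 1, D[n][end]

query = [1.0, 2.0, 3.0]
trial = [0.0, 0.0, 1.0, 2.0, 3.0, 0.0, 0.0]
start, end, cost = subsequence_dtw(query, trial)
```

    In this toy trial the query is found at indices 2 through 4 with zero cost. Ranking several low-cost end points, rather than taking only the minimum, would give a candidate list of the kind the abstract describes.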

  13. A three-term conjugate gradient method under the strong-Wolfe line search

    NASA Astrophysics Data System (ADS)

    Khadijah, Wan; Rivaie, Mohd; Mamat, Mustafa

    2017-08-01

    Recently, numerous studies have been concerned with conjugate gradient methods for solving large-scale unconstrained optimization problems. In this paper, a three-term conjugate gradient method, named Three-Term Rivaie-Mustafa-Ismail-Leong (TTRMIL), is proposed for unconstrained optimization; it always generates a sufficient descent direction. Under standard conditions, the TTRMIL method is proved to be globally convergent under the strong-Wolfe line search. Finally, numerical results are provided for the purpose of comparison.

  14. Motivation and short-term memory in visual search: Attention's accelerator revisited.

    PubMed

    Schneider, Daniel; Bonmassar, Claudia; Hickey, Clayton

    2018-05-01

    A cue indicating the possibility of cash reward will cause participants to perform memory-based visual search more efficiently. A recent study has suggested that this performance benefit might reflect the use of multiple memory systems: when needed, participants may maintain the to-be-remembered object in both long-term and short-term visual memory, with this redundancy benefitting target identification during search (Reinhart, McClenahan & Woodman, 2016). Here we test this compelling hypothesis. We had participants complete a memory-based visual search task involving a reward cue that either preceded presentation of the to-be-remembered target (pre-cue) or followed it (retro-cue). Following earlier work, we tracked memory representation using two components of the event-related potential (ERP): the contralateral delay activity (CDA), reflecting short-term visual memory, and the anterior P170, reflecting long-term storage. We additionally tracked attentional preparation and deployment in the contingent negative variation (CNV) and N2pc, respectively. Results show that only the reward pre-cue impacted our ERP indices of memory. However, both types of cue elicited a robust CNV, reflecting an influence on task preparation, both had equivalent impact on deployment of attention to the target, as indexed in the N2pc, and both had equivalent impact on visual search behavior. Reward prospect thus has an influence on memory-guided visual search, but this does not appear to be necessarily mediated by a change in the visual memory representations indexed by CDA. Our results demonstrate that the impact of motivation on search is not a simple product of improved memory for target templates. Copyright © 2017 Elsevier Ltd. All rights reserved.

  15. Querying Proofs (Work in Progress)

    NASA Technical Reports Server (NTRS)

    Aspinall, David; Denney, Ewen; Lueth, Christoph

    2011-01-01

    We motivate and introduce the basis for a query language designed for inspecting electronic representations of proofs. We argue that there is much to learn from large proofs beyond their validity, and that a dedicated query language can provide a principled way of implementing a family of useful operations.

  16. In-context query reformulation for failing SPARQL queries

    NASA Astrophysics Data System (ADS)

    Viswanathan, Amar; Michaelis, James R.; Cassidy, Taylor; de Mel, Geeth; Hendler, James

    2017-05-01

    Knowledge bases for decision support systems are growing increasingly complex, through continued advances in data ingest and management approaches. However, humans do not possess the cognitive capabilities to retain a bird's-eye view of such knowledge bases, and may end up issuing unsatisfiable queries to such systems. This work focuses on the implementation of a query reformulation approach for graph-based knowledge bases, specifically designed to support the Resource Description Framework (RDF). The reformulation approach presented is instance- and schema-aware. Thus, in contrast to relaxation techniques found in the state-of-the-art, the presented approach produces in-context query reformulation.

  17. An Exemplar-Familiarity Model Predicts Short-Term and Long-Term Probe Recognition across Diverse Forms of Memory Search

    ERIC Educational Resources Information Center

    Nosofsky, Robert M.; Cox, Gregory E.; Cao, Rui; Shiffrin, Richard M.

    2014-01-01

    Experiments were conducted to test a modern exemplar-familiarity model on its ability to account for both short-term and long-term probe recognition within the same memory-search paradigm. Also, making connections to the literature on attention and visual search, the model was used to interpret differences in probe-recognition performance across…

  18. The role of economics in the QUERI program: QUERI Series

    PubMed Central

    Smith, Mark W; Barnett, Paul G

    2008-01-01

    Background The United States (U.S.) Department of Veterans Affairs (VA) Quality Enhancement Research Initiative (QUERI) has implemented economic analyses in single-site and multi-site clinical trials. To date, no one has reviewed whether the QUERI Centers are taking an optimal approach to doing so. Consistent with the continuous learning culture of the QUERI Program, this paper provides such a reflection. Methods We present a case study of QUERI as an example of how economic considerations can and should be integrated into implementation research within both single and multi-site studies. We review theoretical and applied cost research in implementation studies outside and within VA. We also present a critique of the use of economic research within the QUERI program. Results Economic evaluation is a key element of implementation research. QUERI has contributed many developments in the field of implementation but has only recently begun multi-site implementation trials across multiple regions within the national VA healthcare system. These trials are unusual in their emphasis on developing detailed costs of implementation, as well as in the use of business case analyses (budget impact analyses). Conclusion Economics appears to play an important role in QUERI implementation studies, only after implementation has reached the stage of multi-site trials. Economic analysis could better inform the choice of which clinical best practices to implement and the choice of implementation interventions to employ. QUERI economics also would benefit from research on costing methods and development of widely accepted international standards for implementation economics. PMID:18430199

  19. Using internet searches for influenza surveillance.

    PubMed

    Polgreen, Philip M; Chen, Yiling; Pennock, David M; Nelson, Forrest D

    2008-12-01

    The Internet is an important source of health information. Thus, the frequency of Internet searches may provide information regarding infectious disease activity. As an example, we examined the relationship between searches for influenza and actual influenza occurrence. Using search queries from the Yahoo! search engine ( http://search.yahoo.com ) from March 2004 through May 2008, we counted daily unique queries originating in the United States that contained influenza-related search terms. Counts were divided by the total number of searches, and the resulting daily fraction of searches was averaged over the week. We estimated linear models, using searches with 1-10-week lead times as explanatory variables to predict the percentage of cultures positive for influenza and deaths attributable to pneumonia and influenza in the United States. With use of the frequency of searches, our models predicted an increase in cultures positive for influenza 1-3 weeks in advance of when they occurred (P < .001), and similar models predicted an increase in mortality attributable to pneumonia and influenza up to 5 weeks in advance (P < .001). Search-term surveillance may provide an additional tool for disease surveillance.
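
    The pipeline described here (daily fraction of searches, weekly average, linear model with a lead time) can be sketched as follows. The counts and outcome series are synthetic, and a single-predictor least-squares fit stands in for the models the authors estimated.

```python
def weekly_fraction(flu_counts, total_counts):
    """Daily fraction of flu-related searches, averaged over whole 7-day weeks."""
    daily = [f / t for f, t in zip(flu_counts, total_counts)]
    return [sum(daily[i:i + 7]) / 7 for i in range(0, len(daily) - 6, 7)]

def fit_lagged(x, y, lead):
    """Least-squares fit of y[t] = a + b * x[t - lead]; returns (a, b)."""
    xs = x[:len(x) - lead]   # predictor, observed `lead` weeks earlier
    ys = y[lead:]            # outcome
    n = min(len(xs), len(ys))
    xs, ys = xs[:n], ys[:n]
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((u - mx) ** 2 for u in xs)
    sxy = sum((u - mx) * (v - my) for u, v in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

# Synthetic example: two weeks of daily counts
flu = [10] * 7 + [20] * 7
total = [1000] * 14
weekly = weekly_fraction(flu, total)

# Synthetic weekly series in which the outcome follows searches by one week
searches = [1.0, 2.0, 3.0, 4.0, 5.0]
positives = [0.0, 3.0, 5.0, 7.0, 9.0]
intercept, slope = fit_lagged(searches, positives, lead=1)
```

    With these made-up series the fit recovers an intercept of 1 and a slope of 2, since each outcome value was generated as twice the previous week's search level plus one.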

  20. Fragger: a protein fragment picker for structural queries.

    PubMed

    Berenger, Francois; Simoncini, David; Voet, Arnout; Shrestha, Rojan; Zhang, Kam Y J

    2017-01-01

    Protein modeling and design activities often require querying the Protein Data Bank (PDB) with a structural fragment, possibly containing gaps. For some applications, it is preferable to work on a specific subset of the PDB or with unpublished structures. These requirements, along with specific user needs, motivated the creation of a new software to manage and query 3D protein fragments. Fragger is a protein fragment picker that allows protein fragment databases to be created and queried. All fragment lengths are supported and any set of PDB files can be used to create a database. Fragger can efficiently search a fragment database with a query fragment and a distance threshold. Matching fragments are ranked by distance to the query. The query fragment can have structural gaps and the allowed amino acid sequences matching a query can be constrained via a regular expression of one-letter amino acid codes. Fragger also incorporates a tool to compute the backbone RMSD of one versus many fragments in high throughput. Fragger should be useful for protein design, loop grafting and related structural bioinformatics tasks.
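
    A query that combines a distance threshold with a regular expression over one-letter amino acid codes can be sketched as below. This does not use Fragger's API or file formats: the in-memory database, the fragment names, and the `search` function are hypothetical, and the RMSD is computed on coordinates assumed to be already superposed.

```python
import math
import re

def rmsd(a, b):
    """Root-mean-square deviation between two equal-length coordinate lists.

    Assumes the fragments are already superposed; no optimal alignment is done.
    """
    if len(a) != len(b):
        raise ValueError("fragments must have the same number of atoms")
    sq = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(sq / len(a))

def search(db, query_coords, max_rmsd, seq_pattern=".*"):
    """Return fragment names within max_rmsd of the query, closest first.

    seq_pattern is a regular expression over one-letter amino acid codes that
    the whole fragment sequence must match.
    """
    pattern = re.compile(seq_pattern)
    hits = []
    for name, seq, coords in db:
        if len(coords) != len(query_coords) or not pattern.fullmatch(seq):
            continue
        d = rmsd(query_coords, coords)
        if d <= max_rmsd:
            hits.append((d, name))
    return [name for _, name in sorted(hits)]

# Hypothetical database: (name, sequence, one coordinate triple per residue)
DB = [
    ("frag_a", "GAV", [(0, 0, 0), (1, 0, 0), (2, 0, 0)]),
    ("frag_b", "GPV", [(0, 0, 0), (1, 0, 0), (2, 0, 1)]),
    ("frag_c", "GAV", [(0, 0, 0), (1, 3, 0), (2, 0, 0)]),
]
query = [(0, 0, 0), (1, 0, 0), (2, 0, 0)]
hits = search(DB, query, max_rmsd=1.0, seq_pattern="G[AP]V")
```

    `frag_c` matches the sequence pattern but is rejected by the distance threshold; tightening the pattern to `"GAV"` would leave only `frag_a`.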

  1. Querying XML Data with SPARQL

    NASA Astrophysics Data System (ADS)

    Bikakis, Nikos; Gioldasis, Nektarios; Tsinaraki, Chrisa; Christodoulakis, Stavros

    SPARQL is today the standard access language for Semantic Web data. In recent years XML databases have also acquired industrial importance due to the widespread applicability of XML in the Web. In this paper we present a framework that bridges the heterogeneity gap and creates an interoperable environment where SPARQL queries are used to access XML databases. Our approach assumes that fairly generic mappings between ontology constructs and XML Schema constructs have been automatically derived or manually specified. The mappings are used to automatically translate SPARQL queries to semantically equivalent XQuery queries which are used to access the XML databases. We present the algorithms and the implementation of the SPARQL2XQuery framework, which is used for answering SPARQL queries over XML databases.

  2. Sensitivity and Predictive Value of 15 PubMed Search Strategies to Answer Clinical Questions Rated Against Full Systematic Reviews

    PubMed Central

    Merglen, Arnaud; Courvoisier, Delphine S; Combescure, Christophe; Garin, Nicolas; Perrier, Arnaud; Perneger, Thomas V

    2012-01-01

    Background Clinicians perform searches in PubMed daily, but retrieving relevant studies is challenging due to the rapid expansion of medical knowledge. Little is known about the performance of search strategies when they are applied to answer specific clinical questions. Objective To compare the performance of 15 PubMed search strategies in retrieving relevant clinical trials on therapeutic interventions. Methods We used Cochrane systematic reviews to identify relevant trials for 30 clinical questions. Search terms were extracted from the abstract using a predefined procedure based on the population, interventions, comparison, outcomes (PICO) framework and combined into queries. We tested 15 search strategies that varied in their query (PIC or PICO), use of PubMed’s Clinical Queries therapeutic filters (broad or narrow), search limits, and PubMed links to related articles. We assessed sensitivity (recall) and positive predictive value (precision) of each strategy on the first 2 PubMed pages (40 articles) and on the complete search output. Results The performance of the search strategies varied widely according to the clinical question. Unfiltered searches and those using the broad filter of Clinical Queries produced large outputs and retrieved few relevant articles within the first 2 pages, resulting in a median sensitivity of only 10%–25%. In contrast, all searches using the narrow filter performed significantly better, with a median sensitivity of about 50% (all P < .001 compared with unfiltered queries) and positive predictive values of 20%–30% (P < .001 compared with unfiltered queries). This benefit was consistent for most clinical questions. Searches based on related articles retrieved about a third of the relevant studies. Conclusions The Clinical Queries narrow filter, along with well-formulated queries based on the PICO framework, provided the greatest aid in retrieving relevant clinical trials within the 2 first PubMed pages. These results can help

  3. Sensitivity and predictive value of 15 PubMed search strategies to answer clinical questions rated against full systematic reviews.

    PubMed

    Agoritsas, Thomas; Merglen, Arnaud; Courvoisier, Delphine S; Combescure, Christophe; Garin, Nicolas; Perrier, Arnaud; Perneger, Thomas V

    2012-06-12

    Clinicians perform searches in PubMed daily, but retrieving relevant studies is challenging due to the rapid expansion of medical knowledge. Little is known about the performance of search strategies when they are applied to answer specific clinical questions. To compare the performance of 15 PubMed search strategies in retrieving relevant clinical trials on therapeutic interventions. We used Cochrane systematic reviews to identify relevant trials for 30 clinical questions. Search terms were extracted from the abstract using a predefined procedure based on the population, interventions, comparison, outcomes (PICO) framework and combined into queries. We tested 15 search strategies that varied in their query (PIC or PICO), use of PubMed's Clinical Queries therapeutic filters (broad or narrow), search limits, and PubMed links to related articles. We assessed sensitivity (recall) and positive predictive value (precision) of each strategy on the first 2 PubMed pages (40 articles) and on the complete search output. The performance of the search strategies varied widely according to the clinical question. Unfiltered searches and those using the broad filter of Clinical Queries produced large outputs and retrieved few relevant articles within the first 2 pages, resulting in a median sensitivity of only 10%-25%. In contrast, all searches using the narrow filter performed significantly better, with a median sensitivity of about 50% (all P < .001 compared with unfiltered queries) and positive predictive values of 20%-30% (P < .001 compared with unfiltered queries). This benefit was consistent for most clinical questions. Searches based on related articles retrieved about a third of the relevant studies. The Clinical Queries narrow filter, along with well-formulated queries based on the PICO framework, provided the greatest aid in retrieving relevant clinical trials within the 2 first PubMed pages. These results can help clinicians apply effective strategies to answer their
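
    Sensitivity (recall) and positive predictive value (precision) on the first 40 results versus the complete output follow directly from their definitions. The identifiers and result list below are invented for illustration; only the 40-article cutoff comes from the abstract.

```python
def sensitivity_ppv(retrieved, relevant, top_k=None):
    """Return (sensitivity, positive predictive value) for a ranked result list.

    sensitivity = relevant items retrieved / all relevant items
    PPV         = relevant items retrieved / all items retrieved
    top_k limits the evaluation to the first top_k results.
    """
    if top_k is not None:
        retrieved = retrieved[:top_k]
    hits = len(set(retrieved) & set(relevant))
    sensitivity = hits / len(relevant) if relevant else 0.0
    ppv = hits / len(retrieved) if retrieved else 0.0
    return sensitivity, ppv

# Hypothetical gold standard: four trials taken from a systematic review
relevant = ["t1", "t2", "t3", "t4"]

# Hypothetical search output of 50 articles; t3 appears only at position 45
retrieved = (["t1", "t2"] + [f"x{i}" for i in range(42)]
             + ["t3"] + [f"y{i}" for i in range(5)])

first_pages = sensitivity_ppv(retrieved, relevant, top_k=40)
complete = sensitivity_ppv(retrieved, relevant)
```

    On the first two pages this made-up search finds 2 of 4 trials (sensitivity 0.50, PPV 2/40 = 0.05); over the complete output it finds 3 of 4 (sensitivity 0.75, PPV 3/50 = 0.06).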

  4. The Development of Automaticity in Short-Term Memory Search: Item-Response Learning and Category Learning

    ERIC Educational Resources Information Center

    Cao, Rui; Nosofsky, Robert M.; Shiffrin, Richard M.

    2017-01-01

    In short-term-memory (STM)-search tasks, observers judge whether a test probe was present in a short list of study items. Here we investigated the long-term learning mechanisms that lead to the highly efficient STM-search performance observed under conditions of consistent-mapping (CM) training, in which targets and foils never switch roles across…

  5. Essie: A Concept-based Search Engine for Structured Biomedical Text

    PubMed Central

    Ide, Nicholas C.; Loane, Russell F.; Demner-Fushman, Dina

    2007-01-01

    This article describes the algorithms implemented in the Essie search engine that is currently serving several Web sites at the National Library of Medicine. Essie is a phrase-based search engine with term and concept query expansion and probabilistic relevancy ranking. Essie’s design is motivated by an observation that query terms are often conceptually related to terms in a document, without actually occurring in the document text. Essie’s performance was evaluated using data and standard evaluation methods from the 2003 and 2006 Text REtrieval Conference (TREC) Genomics track. Essie was the best-performing search engine in the 2003 TREC Genomics track and achieved results comparable to those of the highest-ranking systems on the 2006 TREC Genomics track task. Essie shows that a judicious combination of exploiting document structure, phrase searching, and concept based query expansion is a useful approach for information retrieval in the biomedical domain. PMID:17329729

  6. Sundanese ancient manuscripts search engine using probability approach

    NASA Astrophysics Data System (ADS)

    Suryani, Mira; Hadi, Setiawan; Paulus, Erick; Nurma Yulita, Intan; Supriatna, Asep K.

    2017-10-01

    Today, Information and Communication Technology (ICT) has become a regular part of every aspect of life, including culture and heritage. Sundanese ancient manuscripts, as part of Sundanese heritage, are in damaged condition, and so is the information they contain. To preserve the information in Sundanese ancient manuscripts and make it easier to search, a search engine has been developed. The search engine must have good computing ability. To find the best-performing computation for the search engine, three probabilistic approaches were compared in this study: the Bayesian Networks Model, Divergence from Randomness with the PL2 distribution (DFR-PL2), and DFR-PL2F as a derivative of DFR-PL2. The three probabilistic approaches were supported by a document index and three different weighting methods: term occurrence, term frequency, and TF-IDF. The experiment involved 12 Sundanese ancient manuscripts, which contain 474 distinct terms. The search engine was tested with 50 random queries for three types of query. The results showed that for single and multiple queries, the best search performance was given by the combination of the PL2F approach and the TF-IDF weighting method. Performance was evaluated using average response time, with a value of about 0.08 seconds, and Mean Average Precision (MAP), about 0.33.
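
    The three weighting methods named in this record can be illustrated with a short sketch. The abstract does not define them, so the definitions here are assumptions: term occurrence is taken as binary presence, term frequency as the count divided by document length, and TF-IDF as term frequency times log(N / document frequency). The function name and sample documents are hypothetical.

```python
import math
from collections import Counter


def weight_documents(docs):
    """Compute three weights per term and document: occurrence, TF, and TF-IDF."""
    n_docs = len(docs)
    counts = [Counter(doc.split()) for doc in docs]
    # Document frequency: number of documents that contain each term.
    df = Counter(term for c in counts for term in c)
    weighted = []
    for c in counts:
        total = sum(c.values())
        weighted.append({
            term: {
                "occurrence": 1,              # assumed: binary presence
                "tf": n / total,              # assumed: relative frequency
                "tf_idf": (n / total) * math.log(n_docs / df[term]),
            }
            for term, n in c.items()
        })
    return weighted


# Hypothetical two-document collection.
weights = weight_documents(["a b a", "b c"])
print(weights[0]["a"])
```

    Under these definitions a term that appears in every document, such as "b" above, receives a TF-IDF weight of zero.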

  7. Exploring Contextual Models in Chemical Patent Search

    NASA Astrophysics Data System (ADS)

    Urbain, Jay; Frieder, Ophir

    We explore the development of probabilistic retrieval models for integrating term statistics with entity search using multiple levels of document context to improve the performance of chemical patent search. A distributed indexing model was developed to enable efficient named entity search and aggregation of term statistics at multiple levels of patent structure including individual words, sentences, claims, descriptions, abstracts, and titles. The system can be scaled to an arbitrary number of compute instances in a cloud computing environment to support concurrent indexing and query processing operations on large patent collections.

  8. Ontology-based vector space model and fuzzy query expansion to retrieve knowledge on medical computational problem solutions.

    PubMed

    Bratsas, Charalampos; Koutkias, Vassilis; Kaimakamis, Evangelos; Bamidis, Panagiotis; Maglaveras, Nicos

    2007-01-01

    Medical Computational Problem (MCP) solving is related to medical problems and their computerized algorithmic solutions. In this paper, an extension of an ontology-based model to fuzzy logic is presented, as a means to enhance the information retrieval (IR) procedure in semantic management of MCPs. We present herein the methodology followed for the fuzzy expansion of the ontology model, the fuzzy query expansion procedure, as well as an appropriate ontology-based Vector Space Model (VSM) that was constructed for efficient mapping of user-defined MCP search criteria and MCP acquired knowledge. The relevant fuzzy thesaurus is constructed by calculating the simultaneous occurrences of terms and the term-to-term similarities derived from the ontology that utilizes UMLS (Unified Medical Language System) concepts by using Concept Unique Identifiers (CUI), synonyms, semantic types, and broader-narrower relationships for fuzzy query expansion. The current approach constitutes a sophisticated advance for effective, semantics-based MCP-related IR.

  9. Issues in the Design of a Pilot Concept-Based Query Interface for the Neuroinformatics Information Framework

    PubMed Central

    Li, Yuli; Martone, Maryann E.; Sternberg, Paul W.; Shepherd, Gordon M.; Miller, Perry L.

    2009-01-01

    This paper describes a pilot query interface that has been constructed to help us explore a “concept-based” approach for searching the Neuroscience Information Framework (NIF). The query interface is concept-based in the sense that the search terms submitted through the interface are selected from a standardized vocabulary of terms (concepts) that are structured in the form of an ontology. The NIF contains three primary resources: the NIF Resource Registry, the NIF Document Archive, and the NIF Database Mediator. These NIF resources are very different in their nature and therefore pose challenges when designing a single interface from which searches can be automatically launched against all three resources simultaneously. The paper first discusses briefly several background issues involving the use of standardized biomedical vocabularies in biomedical information retrieval, and then presents a detailed example that illustrates how the pilot concept-based query interface operates. The paper concludes by discussing certain lessons learned in the development of the current version of the interface. PMID:18953674

  10. Issues in the design of a pilot concept-based query interface for the neuroinformatics information framework.

    PubMed

    Marenco, Luis; Li, Yuli; Martone, Maryann E; Sternberg, Paul W; Shepherd, Gordon M; Miller, Perry L

    2008-09-01

    This paper describes a pilot query interface that has been constructed to help us explore a "concept-based" approach for searching the Neuroscience Information Framework (NIF). The query interface is concept-based in the sense that the search terms submitted through the interface are selected from a standardized vocabulary of terms (concepts) that are structured in the form of an ontology. The NIF contains three primary resources: the NIF Resource Registry, the NIF Document Archive, and the NIF Database Mediator. These NIF resources are very different in their nature and therefore pose challenges when designing a single interface from which searches can be automatically launched against all three resources simultaneously. The paper first discusses briefly several background issues involving the use of standardized biomedical vocabularies in biomedical information retrieval, and then presents a detailed example that illustrates how the pilot concept-based query interface operates. The paper concludes by discussing certain lessons learned in the development of the current version of the interface.

  11. Mapping Self-Guided Learners' Searches for Video Tutorials on YouTube

    ERIC Educational Resources Information Center

    Garrett, Nathan

    2016-01-01

    While YouTube has a wealth of educational videos, how self-guided learners use these resources has not been fully described. An analysis of search engine queries for help with the use of Microsoft Excel shows that few users search for specific features or functions but instead use very general terms. Because the same videos are returned in…

  12. Evolving discriminators for querying video sequences

    NASA Astrophysics Data System (ADS)

    Iyengar, Giridharan; Lippman, Andrew B.

    1997-01-01

    In this paper we present a framework for content based query and retrieval of information from large video databases. This framework enables content based retrieval of video sequences by characterizing the sequences using motion, texture and colorimetry cues. This characterization is biologically inspired and results in a compact parameter space where every segment of video is represented by an 8 dimensional vector. Searching and retrieval is done in real-time with accuracy in this parameter space. Using this characterization, we then evolve a set of discriminators using Genetic Programming. Experiments indicate that these discriminators are capable of analyzing and characterizing video. The VideoBook is able to search and retrieve video sequences with 92% accuracy in real-time. Experiments thus demonstrate that the characterization is capable of extracting higher level structure from raw pixel values.

  13. A high performance, ad-hoc, fuzzy query processing system for relational databases

    NASA Technical Reports Server (NTRS)

    Mansfield, William H., Jr.; Fleischman, Robert M.

    1992-01-01

    Database queries involving imprecise or fuzzy predicates are currently an evolving area of academic and industrial research. Such queries place severe stress on the indexing and I/O subsystems of conventional database environments since they involve the search of large numbers of records. The Datacycle architecture and research prototype is a database environment that uses filtering technology to perform an efficient, exhaustive search of an entire database. It has recently been modified to include fuzzy predicates in its query processing. The approach obviates the need for complex index structures, provides unlimited query throughput, permits the use of ad-hoc fuzzy membership functions, and provides a deterministic response time largely independent of query complexity and load. This paper describes the Datacycle prototype implementation of fuzzy queries and some recent performance results.
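
    The idea of answering a fuzzy predicate by filtering every record, rather than consulting an index, can be sketched as follows. This is only an illustration of the general technique, not the Datacycle implementation; the membership function `tall`, its breakpoints, the threshold, and the sample records are all hypothetical.

```python
def tall(height_cm):
    """Hypothetical membership function: 0 at or below 160 cm, 1 at or above 190 cm."""
    return min(1.0, max(0.0, (height_cm - 160) / 30))


def fuzzy_scan(records, membership, field, threshold=0.5):
    """Scan every record and keep those whose membership degree meets the threshold."""
    matches = []
    for rec in records:
        degree = membership(rec[field])
        if degree >= threshold:
            matches.append((rec, degree))
    # Strongest matches first.
    return sorted(matches, key=lambda m: m[1], reverse=True)


people = [
    {"name": "A", "height": 150},
    {"name": "B", "height": 175},
    {"name": "C", "height": 195},
]
result = fuzzy_scan(people, tall, "height")
print([(rec["name"], degree) for rec, degree in result])
```

    Because the scan touches every record once, its cost depends on the size of the data rather than on which membership function is supplied, which is why ad-hoc functions can be swapped in freely.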

  14. Intelligent search in Big Data

    NASA Astrophysics Data System (ADS)

    Birialtsev, E.; Bukharaev, N.; Gusenkov, A.

    2017-10-01

    An approach to data integration, aimed at ontology-based intelligent search in Big Data, is considered for the case when information objects are represented in the form of relational databases (RDBs), structurally marked by their schemas. The source of information for constructing an ontology and, later on, for organizing the search is text in natural language, treated as semi-structured data. For the RDBs, these are comments on the names of tables and their attributes. A formal definition of the RDB integration model in terms of ontologies is given. Within the framework of the model, a universal RDB representation ontology, an oil production subject domain ontology, and a linguistic thesaurus of the subject domain language are built. A technique for automatic generation of SQL queries for subject domain specialists is proposed. On this basis, an information system for the RDBs of the TATNEFT oil-producing company was implemented. Operation of the system showed good relevance for the majority of queries.

  15. Information Network Model Query Processing

    NASA Astrophysics Data System (ADS)

    Song, Xiaopu

    Information Networking Model (INM) [31] is a novel database model for managing real-world objects and relationships. It naturally and directly supports various kinds of static and dynamic relationships between objects. In INM, objects are networked through various natural and complex relationships. INM Query Language (INM-QL) [30] is designed to explore such information networks, retrieve information about schemas, instances, their attributes, relationships, and context-dependent information, and process query results in the user-specified form. An INM database management system has been implemented using Berkeley DB, and it supports INM-QL. This thesis is mainly focused on the implementation of the subsystem that is able to effectively and efficiently process INM-QL. The subsystem provides a lexical and syntactical analyzer of INM-QL, and it is able to choose appropriate evaluation strategies and index mechanisms to process queries in INM-QL without the user's intervention. It also uses an intermediate result structure to hold intermediate query results, and other helper structures to reduce the complexity of query processing.

  16. Improving Concept-Based Web Image Retrieval by Mixing Semantically Similar Greek Queries

    ERIC Educational Resources Information Center

    Lazarinis, Fotis

    2008-01-01

    Purpose: Image searching is a common activity for web users. Search engines offer image retrieval services based on textual queries. Previous studies have shown that web searching is more demanding when the search is not in English and does not use a Latin-based language. The aim of this paper is to explore the behaviour of the major search…

  17. Which factors predict the time spent answering queries to a drug information centre?

    PubMed Central

    Reppe, Linda A.; Spigset, Olav

    2010-01-01

    Objective: To develop a model based upon factors able to predict the time spent answering drug-related queries to Norwegian drug information centres (DICs). Setting and method: Drug-related queries received at 5 DICs in Norway from March to May 2007 were randomly assigned to 20 employees until each of them had answered a minimum of five queries. The employees reported the number of drugs involved, the type of literature search performed, and whether the queries were considered judgmental or not, using a specifically developed scoring system. Main outcome measures: The scores of these three factors were added together to define a workload score for each query. Workload and its individual factors were subsequently related to the measured time spent answering the queries by simple or multiple linear regression analyses. Results: Ninety-six query/answer pairs were analyzed. Workload significantly predicted the time spent answering the queries (adjusted R² = 0.22, P < 0.001). Literature search was the individual factor best predicting the time spent answering the queries (adjusted R² = 0.17, P < 0.001), and this variable also contributed the most in the multiple regression analyses. Conclusion: The most important workload factor predicting the time spent handling the queries in this study was the type of literature search that had to be performed. The categorisation of queries as judgmental or not also affected the time spent answering the queries. The number of drugs involved did not significantly influence the time spent answering drug information queries. PMID:20922480

  18. Mining Longitudinal Web Queries: Trends and Patterns.

    ERIC Educational Resources Information Center

    Wang, Peiling; Berry, Michael W.; Yang, Yiheng

    2003-01-01

    Analyzed user queries submitted to an academic Web site during a four-year period, using a relational database, to examine users' query behavior, to identify problems they encounter, and to develop techniques for optimizing query analysis and mining. Linguistic analyses focus on query structures, lexicon, and word associations using statistical…

  19. Web queries as a source for syndromic surveillance.

    PubMed

    Hulth, Anette; Rydevik, Gustaf; Linde, Annika

    2009-01-01

    In the field of syndromic surveillance, various sources are exploited for outbreak detection, monitoring and prediction. This paper describes a study on queries submitted to a medical web site, with influenza as a case study. The hypothesis of the work was that queries on influenza and influenza-like illness would provide a basis for the estimation of the timing of the peak and the intensity of the yearly influenza outbreaks that would be as good as the existing laboratory and sentinel surveillance. We calculated the occurrence of various queries related to influenza from search logs submitted to a Swedish medical web site for two influenza seasons. These figures were subsequently used to generate two models, one to estimate the number of laboratory verified influenza cases and one to estimate the proportion of patients with influenza-like illness reported by selected General Practitioners in Sweden. We applied an approach designed for highly correlated data, partial least squares regression. In our work, we found that certain web queries on influenza follow the same pattern as that obtained by the two other surveillance systems for influenza epidemics, and that they have equal power for the estimation of the influenza burden in society. Web queries give a unique access to ill individuals who are not (yet) seeking care. This paper shows the potential of web queries as an accurate, cheap and labour extensive source for syndromic surveillance.

  20. Representation and alignment of sung queries for music information retrieval

    NASA Astrophysics Data System (ADS)

    Adams, Norman H.; Wakefield, Gregory H.

    2005-09-01

    The pursuit of robust and rapid query-by-humming systems, which search melodic databases using sung queries, is a common theme in music information retrieval. The retrieval aspect of this database problem has received considerable attention, whereas the front-end processing of sung queries and the data structure to represent melodies has been based on musical intuition and historical momentum. The present work explores three time series representations for sung queries: a sequence of notes, a "smooth" pitch contour, and a sequence of pitch histograms. The performance of the three representations is compared using a collection of naturally sung queries. It is found that the most robust performance is achieved by the representation with highest dimension, the smooth pitch contour, but that this representation presents a formidable computational burden. For all three representations, it is necessary to align the query and target in order to achieve robust performance. The computational cost of the alignment is quadratic, hence it is necessary to keep the dimension small for rapid retrieval. Accordingly, iterative deepening is employed to achieve both robust performance and rapid retrieval. Finally, the conventional iterative framework is expanded to adapt the alignment constraints based on previous iterations, further expediting retrieval without degrading performance.
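
    The quadratic cost of aligning a query with a target can be seen in a standard dynamic time warping (DTW) table. DTW is used here as a common stand-in for sequence alignment; the abstract does not state which alignment algorithm the authors used, and the pitch values below are hypothetical.

```python
def dtw_distance(query, target):
    """Dynamic time warping cost between two numeric sequences.

    Fills a (len(query) + 1) x (len(target) + 1) table, so time and memory
    grow with the product of the two sequence lengths.
    """
    inf = float("inf")
    n, m = len(query), len(target)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(query[i - 1] - target[j - 1])
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Hypothetical pitch sequences (e.g. MIDI note numbers).
print(dtw_distance([60, 62, 64], [60, 60, 62, 64]))
print(dtw_distance([1, 2, 3], [2, 3, 4]))
```

    A held note in the target (the repeated 60) costs nothing after alignment, which is the robustness the abstract refers to; the nested loops are the quadratic term that motivates keeping the representation short.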

  1. A Method for Search Engine Selection using Thesaurus for Selective Meta-Search Engine

    NASA Astrophysics Data System (ADS)

    Goto, Shoji; Ozono, Tadachika; Shintani, Toramatsu

    In this paper, we propose a new method for selecting search engines on the WWW for a selective meta-search engine. In a selective meta-search engine, a method is needed that enables selecting appropriate search engines for users' queries. Most existing methods use statistical data such as document frequency. These methods may select inappropriate search engines if a query contains polysemous words. In this paper, we describe a search engine selection method based on a thesaurus. In our method, a thesaurus is constructed from documents in a search engine and is used as a source description of the search engine. The form of a particular thesaurus depends on the documents used for its construction. Our method enables search engine selection by considering relationships between terms and overcomes the problems caused by polysemous words. Further, our method does not have a centralized broker maintaining data, such as document frequency for all search engines. As a result, it is easy to add a new search engine, and meta-search engines become more scalable with our method compared to other existing methods.

  2. Consistent Query Answering of Conjunctive Queries under Primary Key Constraints

    ERIC Educational Resources Information Center

    Pema, Enela

    2014-01-01

    An inconsistent database is a database that violates one or more of its integrity constraints. In reality, violations of integrity constraints arise frequently under several different circumstances. Inconsistent databases have long posed the challenge to develop suitable tools for meaningful query answering. A principled approach for querying…

  3. CrossQuery: a web tool for easy associative querying of transcriptome data.

    PubMed

    Wagner, Toni U; Fischer, Andreas; Thoma, Eva C; Schartl, Manfred

    2011-01-01

    Enormous amounts of data are being generated by modern methods such as transcriptome or exome sequencing and microarray profiling. Primary analyses such as quality control, normalization, statistics and mapping are highly complex and need to be performed by specialists. Thereafter, results are handed back to biomedical researchers, who are then confronted with complicated data lists. For rather simple tasks like data filtering, sorting and cross-association there is a need for new tools which can be used by non-specialists. Here, we describe CrossQuery, a web tool that enables straightforward, simple syntax queries to be executed on transcriptome sequencing and microarray datasets. We provide deep-sequencing data sets of stem cell lines derived from the model fish Medaka and microarray data of human endothelial cells. In the example datasets provided, mRNA expression levels, gene, transcript and sample identification numbers, GO-terms and gene descriptions can be freely correlated, filtered and sorted. Queries can be saved for later reuse and results can be exported to standard formats that allow copy-and-paste to all widespread data visualization tools such as Microsoft Excel. CrossQuery enables researchers to quickly and freely work with transcriptome and microarray data sets requiring only minimal computer skills. Furthermore, CrossQuery allows growing association of multiple datasets as long as at least one common point of correlated information, such as transcript identification numbers or GO-terms, is shared between samples. For advanced users, the object-oriented plug-in and event-driven code design of both server-side and client-side scripts allow easy addition of new features, data sources and data types.

  4. Web Searching: A Process-Oriented Experimental Study of Three Interactive Search Paradigms.

    ERIC Educational Resources Information Center

    Dennis, Simon; Bruza, Peter; McArthur, Robert

    2002-01-01

    Compares search effectiveness when using query-based Internet search via the Google search engine, directory-based search via Yahoo, and phrase-based query reformulation-assisted search via the Hyperindex browser by means of a controlled, user-based experimental study of undergraduates at the University of Queensland. Discusses cognitive load,…

  5. The effect of search term on the quality and accuracy of online information regarding distal radius fractures.

    PubMed

    Dy, Christopher J; Taylor, Samuel A; Patel, Ronak M; Kitay, Alison; Roberts, Timothy R; Daluiski, Aaron

    2012-09-01

    Recent emphasis on shared decision making and patient-centered research has increased the importance of patient education and health literacy. The internet is rapidly growing as a source of self-education for patients. However, concern exists over the quality, accuracy, and readability of the information. Our objective was to determine whether the quality, accuracy, and readability of information online about distal radius fractures vary with the search term. This was a prospective evaluation of 3 search engines using 3 different search terms of varying sophistication ("distal radius fracture," "wrist fracture," and "broken wrist"). We evaluated 70 unique Web sites for quality, accuracy, and readability. We used comparative statistics to determine whether the search term affected the quality, accuracy, and readability of the Web sites found. Three orthopedic surgeons independently gauged quality and accuracy of information using a set of predetermined scoring criteria. We evaluated the readability of the Web site using the Flesch-Kincaid score for reading grade level. There were significant differences in the quality, accuracy, and readability of information found, depending on the search term. We found higher quality and accuracy resulted from the search term "distal radius fracture," particularly compared with Web sites resulting from the term "broken wrist." The reading level was higher than recommended in 65 of the 70 Web sites and was significantly higher when searching with "distal radius fracture" than "wrist fracture" or "broken wrist." There was no correlation between Web site reading level and quality or accuracy. The readability of information about distal radius fractures in most Web sites was higher than the recommended reading level for the general public. The quality and accuracy of the information found significantly varied with the sophistication of the search term used. Physicians, professional societies, and search engines should consider
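
    For reference, the reading grade level used in this record is conventionally computed with the Flesch-Kincaid formula, 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The sketch below takes pre-computed counts as input; how the study's authors counted words, sentences, and syllables is not stated, and the example counts are hypothetical.

```python
def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid grade level from total word, sentence, and syllable counts."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical passage: 100 words, 5 sentences, 150 syllables.
grade = flesch_kincaid_grade(100, 5, 150)
print(round(grade, 2))
```

    Longer sentences and more syllables per word both raise the grade, which is consistent with the more technical search term returning harder-to-read pages.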

  6. Bat-Inspired Algorithm Based Query Expansion for Medical Web Information Retrieval.

    PubMed

    Khennak, Ilyes; Drias, Habiba

    2017-02-01

    With the increasing amount of medical data available on the Web, looking for health information has become one of the most widely searched topics on the Internet. Patients and people of several backgrounds are now using Web search engines to acquire medical information, including information about a specific disease, medical treatment or professional advice. Nonetheless, due to a lack of medical knowledge, many laypeople have difficulties in forming appropriate queries to articulate their inquiries, which renders their search queries imprecise due to the use of unclear keywords. The use of these ambiguous and vague queries to describe the patients' needs has resulted in a failure of Web search engines to retrieve accurate and relevant information. One of the most natural and promising methods to overcome this drawback is Query Expansion. In this paper, an original approach based on the Bat Algorithm is proposed to improve the retrieval effectiveness of query expansion in the medical field. In contrast to the existing literature, the proposed approach uses the Bat Algorithm to find the best expanded query among a set of expanded query candidates, while maintaining low computational complexity. Moreover, this new approach allows the determination of the length of the expanded query empirically. Numerical results on MEDLINE, the on-line medical information database, show that the proposed approach is more effective and efficient compared to the baseline.

  7. The Database Query Support Processor (QSP)

    NASA Technical Reports Server (NTRS)

    1993-01-01

    The number and diversity of databases available to users continue to increase dramatically. Currently, the trend is towards decentralized, client-server architectures that (on the surface) are less expensive to acquire, operate, and maintain than information architectures based on centralized, monolithic mainframes. The database query support processor (QSP) effort evaluates the performance of a network level, heterogeneous database access capability. Air Force Materiel Command's Rome Laboratory has developed an approach, based on ANSI standard X3.138 - 1988, 'The Information Resource Dictionary System (IRDS)', for seamless access to heterogeneous databases based on extensions to data dictionary technology. To successfully query a decentralized information system, users must know what data are available from which source, or have the knowledge and system privileges necessary to find out this information. Privacy and security considerations prohibit free and open access to every information system in every network. Even in completely open systems, time required to locate relevant data (in systems of any appreciable size) would be better spent analyzing the data, assuming the original question was not forgotten. Extensions to data dictionary technology have the potential to more fully automate the search and retrieval for relevant data in a decentralized environment. Substantial amounts of time and money could be saved by not having to teach users what data resides in which systems and how to access each of those systems. Information describing data and how to get it could be removed from the application and placed in a dedicated repository where it belongs. The result is simplified applications that are less brittle and less expensive to build and maintain. Software technology providing the required functionality is off the shelf. The key difficulty is in defining the metadata required to support the process. The database query support processor effort will provide

  8. Automatically Preparing Safe SQL Queries

    NASA Astrophysics Data System (ADS)

    Bisht, Prithvi; Sistla, A. Prasad; Venkatakrishnan, V. N.

    We present the first sound program source transformation approach for automatically transforming the code of a legacy web application to employ PREPARE statements in place of unsafe SQL queries. Our approach therefore opens the way for eradicating the SQL injection threat vector from legacy web applications.
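
    The difference between an unsafe SQL query and a parameterized one can be shown with a small self-contained example. This sketch uses Python's standard `sqlite3` module and its `?` placeholders as an analogue of PREPARE statements; it is not the authors' transformation tool, and the table, function names, and attack string are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "admin"), ("bob", "user")],
)


def find_user_unsafe(name):
    # Unsafe: user input is spliced directly into the SQL text.
    sql = f"SELECT name, role FROM users WHERE name = '{name}'"
    return conn.execute(sql).fetchall()


def find_user_safe(name):
    # Safe: the query text is fixed and the input is bound as a parameter.
    sql = "SELECT name, role FROM users WHERE name = ?"
    return conn.execute(sql, (name,)).fetchall()


attack = "x' OR '1'='1"
print(len(find_user_unsafe(attack)))  # the injected condition matches every row
print(find_user_safe(attack))         # the input is treated as a plain string
```

    In the safe variant the structure of the query cannot be altered by the input, which is the property the record's transformation aims to establish for legacy code.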

  9. Querying Large Biological Network Datasets

    ERIC Educational Resources Information Center

    Gulsoy, Gunhan

    2013-01-01

    New experimental methods have resulted in an increasing amount of genetic interaction data being generated every day. Biological networks are used to store the genetic interaction data gathered. The increasing amount of data available requires fast, large-scale analysis methods. Therefore, we address the problem of querying large biological network datasets.…

  10. Demystifying the Search Button

    PubMed Central

    McKeever, Liam; Nguyen, Van; Peterson, Sarah J.; Gomez-Perez, Sandra

    2015-01-01

    A thorough review of the literature is the basis of all research and evidence-based practice. A gold-standard efficient and exhaustive search strategy is needed to ensure all relevant citations have been captured and that the search performed is reproducible. The PubMed database comprises both the MEDLINE and non-MEDLINE databases. MEDLINE-based search strategies are robust but capture only 89% of the total available citations in PubMed. The remaining 11% include the most recent and possibly relevant citations but are only searchable through less efficient techniques. An effective search strategy must employ both the MEDLINE and the non-MEDLINE portion of PubMed to ensure all studies have been identified. The robust MEDLINE search strategies are used for the MEDLINE portion of the search. Usage of the less robust strategies is then efficiently confined to search only the remaining 11% of PubMed citations that have not been indexed for MEDLINE. The current article offers step-by-step instructions for building such a search exploring methods for the discovery of medical subject heading (MeSH) terms to search MEDLINE, text-based methods for exploring the non-MEDLINE database, information on the limitations of convenience algorithms such as the “related citations feature,” the strengths and pitfalls associated with commonly used filters, the proper usage of Boolean operators to organize a master search strategy, and instructions for automating that search through “MyNCBI” to receive search query updates by email as new citations become available. PMID:26129895
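
    The two-arm strategy described in this record, a MeSH-based search confined to MEDLINE-indexed citations combined with a text-based search confined to the remainder, can be sketched as a query-string builder. The field tags `[mh]` and `[tiab]` and the subset filter `medline[sb]` are PubMed search syntax as commonly documented and should be checked against current PubMed help; the function names and search terms are hypothetical, and the article's own step-by-step strategy may differ.

```python
def or_group(terms, tag):
    """Join terms with OR, each quoted and carrying a PubMed field tag."""
    return "(" + " OR ".join(f'"{t}"[{tag}]' for t in terms) + ")"


def master_search(mesh_terms, text_terms):
    """Combine a MeSH arm for indexed records with a free-text arm for the rest."""
    mesh_arm = f"({or_group(mesh_terms, 'mh')} AND medline[sb])"
    text_arm = f"({or_group(text_terms, 'tiab')} NOT medline[sb])"
    return f"{mesh_arm} OR {text_arm}"


# Hypothetical topic, for illustration only.
query = master_search(
    ["Enteral Nutrition"],
    ["enteral nutrition", "tube feeding"],
)
print(query)
```

    Keeping the arms in separate parenthesized groups makes the Boolean precedence explicit, so the less efficient text search is applied only to citations outside the MEDLINE subset.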

  11. Query-based learning for aerospace applications.

    PubMed

    Saad, E W; Choi, J J; Vian, J L; Wunsch, D C Ii

    2003-01-01

    Models of real-world applications often include a large number of parameters with a wide dynamic range, which contributes to the difficulties of neural network training. Creating the training data set for such applications becomes costly, if not impossible. In order to overcome the challenge, one can employ an active learning technique known as query-based learning (QBL) to add performance-critical data to the training set during the learning phase, thereby efficiently improving the overall learning/generalization. The performance-critical data can be obtained using an inverse mapping called network inversion (discrete network inversion and continuous network inversion) followed by oracle query. This paper investigates the use of both inversion techniques for QBL learning, and introduces an original heuristic to select the inversion target values for the continuous network inversion method. Efficiency and generalization were further enhanced by employing node decoupled extended Kalman filter (NDEKF) training and a causality index (CI) as a means to reduce the input search dimensionality. The benefits of the overall QBL approach are experimentally demonstrated in two aerospace applications: a classification problem with large input space and a control distribution problem.

  12. Querying temporal clinical databases on granular trends.

    PubMed

    Combi, Carlo; Pozzi, Giuseppe; Rossato, Rosalba

    2012-04-01

    This paper focuses on the identification of temporal trends involving different granularities in clinical databases, where data are temporal in nature: for example, while follow-up visit data are usually stored at the granularity of working days, queries on these data could require to consider trends either at the granularity of months ("find patients who had an increase of systolic blood pressure within a single month") or at the granularity of weeks ("find patients who had steady states of diastolic blood pressure for more than 3 weeks"). Representing and reasoning properly on temporal clinical data at different granularities are important both to guarantee the efficacy and the quality of care processes and to detect emergency situations. Temporal sequences of data acquired during a care process provide a significant source of information not only to search for a particular value or an event at a specific time, but also to detect some clinically-relevant patterns for temporal data. We propose a general framework for the description and management of temporal trends by considering specific temporal features with respect to the chosen time granularity. Temporal aspects of data are considered within temporal relational databases, first formally by using a temporal extension of the relational calculus, and then by showing how to map these relational expressions to plain SQL queries. Throughout the paper we consider the clinical domain of hemodialysis, where several parameters are periodically sampled during every session. Copyright © 2011 Elsevier Inc. All rights reserved.
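
    The month-granularity example query quoted above ("find patients who had an increase of systolic blood pressure within a single month") can be sketched outside SQL as follows. This is a minimal illustration and not the paper's framework: the record layout, the function name, and the definition of "increase" (any later reading in the same calendar month exceeding an earlier one) are all assumptions.

```python
from collections import defaultdict
from datetime import date


def patients_with_monthly_increase(readings):
    """readings: iterable of (patient_id, day, systolic) tuples.

    Returns the patients for whom some reading is higher than an
    earlier reading taken in the same calendar month (assumed definition).
    """
    # Regroup day-granularity data at month granularity.
    by_month = defaultdict(list)
    for patient, day, value in readings:
        by_month[(patient, day.year, day.month)].append((day, value))

    result = set()
    for (patient, _, _), rows in by_month.items():
        rows.sort()  # chronological order within the month
        lowest = rows[0][1]
        for _, value in rows[1:]:
            if value > lowest:
                result.add(patient)
                break
            lowest = min(lowest, value)
    return result


# Illustrative data only.
sample = [
    ("p1", date(2011, 3, 1), 120), ("p1", date(2011, 3, 15), 135),
    ("p2", date(2011, 3, 2), 140), ("p2", date(2011, 3, 20), 130),
    ("p3", date(2011, 3, 30), 118), ("p3", date(2011, 4, 2), 150),
]
print(patients_with_monthly_increase(sample))
```

    In this sample, patient p3's rise spans two calendar months, so it is not reported at month granularity.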

  13. Persistent Identifiers for Improved Accessibility for Linked Data Querying

    NASA Astrophysics Data System (ADS)

    Shepherd, A.; Chandler, C. L.; Arko, R. A.; Fils, D.; Jones, M. B.; Krisnadhi, A.; Mecum, B.

    2016-12-01

    The adoption of linked open data principles within the geosciences has increased the amount of accessible information available on the Web. However, this data is difficult to consume for those who are unfamiliar with Semantic Web technologies such as Web Ontology Language (OWL), Resource Description Framework (RDF) and SPARQL, the RDF query language. Consumers would need to understand the structure of the data and how to efficiently query it. Furthermore, understanding how to query doesn't solve problems of poor precision and recall in search results. For consumers unfamiliar with the data, full-text searches are most accessible, but not ideal as they arrest the advantages of data disambiguation and co-reference resolution efforts. Conversely, URI searches across linked data can deliver improved search results, but knowledge of these exact URIs may remain difficult to obtain. The increased adoption of Persistent Identifiers (PIDs) can lead to improved linked data querying by a wide variety of consumers. Because PIDs resolve to a single entity, they are an excellent data point for disambiguating content. At the same time, PIDs are more accessible and prominent than a single data provider's linked data URI. When present in linked open datasets, PIDs provide balance between the technical and social hurdles of linked data querying as evidenced by the NSF EarthCube GeoLink project. The GeoLink project, funded by NSF's EarthCube initiative, has brought together data repositories that include content from field expeditions, laboratory analyses, journal publications, conference presentations, theses/reports, and funding awards that span scientific studies from marine geology to marine ecosystems and biogeochemistry to paleoclimatology.

  14. Implementation of the common phrase index method on the phrase query for information retrieval

    NASA Astrophysics Data System (ADS)

    Fatmawati, Triyah; Zaman, Badrus; Werdiningsih, Indah

    2017-08-01

    With the development of technology, finding information in news text has become easy, because news text is distributed not only in print media, such as newspapers, but also in electronic media that can be accessed using a search engine. In the process of finding relevant documents with a search engine, a phrase is often used as a query. The number of words that make up the phrase query and their positions affect the relevance of the documents retrieved. As a result, the accuracy of the information obtained is affected. Based on the outlined problem, the purpose of this research was to analyze the implementation of the common phrase index method in information retrieval. The research was conducted on English news text and implemented in a prototype to determine the relevance level of the documents produced. The system is built with the stages of pre-processing, indexing, term weighting calculation, and cosine similarity calculation. The system then displays the document search results ranked by cosine similarity. System testing was conducted using 100 documents and 20 queries, and the results were used for the evaluation stage: first, the relevant documents were determined using the kappa statistic; second, the system success rate was determined using precision, recall, and F-measure. The kappa statistic was 0.71, so the relevant documents are eligible for the system evaluation. The evaluation yields a precision of 0.37, a recall of 0.50, and an F-measure of 0.43. From this result it can be said that the success rate of the system in producing relevant documents is low.
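
    The two calculations named in this abstract, cosine similarity for ranking and F-measure for evaluation, can be illustrated briefly. This is a generic sketch rather than the authors' prototype: the function names are hypothetical, the F-measure is assumed to be the balanced F1 (harmonic mean of precision and recall), and the term-weight vectors are made up.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length term-weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up weights for a query and a document over three terms.
print(cosine_similarity([1.0, 0.0, 2.0], [1.0, 1.0, 2.0]))

# Precision and recall as reported in the abstract.
print(round(f_measure(0.37, 0.50), 2))
```

    With the reported precision of 0.37 and recall of 0.50, the balanced F-measure rounds to 0.43, which agrees with the value stated in the abstract.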

  15. Comparative Analysis of Online Health Queries Originating From Personal Computers and Smart Devices on a Consumer Health Information Portal

    PubMed Central

    Jadhav, Ashutosh; Andrews, Donna; Fiksdal, Alexander; Kumbamu, Ashok; McCormick, Jennifer B; Misitano, Andrew; Nelsen, Laurie; Ryu, Euijung; Sheth, Amit; Wu, Stephen

    2014-01-01

    Background The number of people using the Internet and mobile/smart devices for health information seeking is increasing rapidly. Although the user experience for online health information seeking varies with the device used, for example, smart devices (SDs) like smartphones/tablets versus personal computers (PCs) like desktops/laptops, very few studies have investigated how online health information seeking behavior (OHISB) may differ by device. Objective The objective of this study is to examine differences in OHISB between PCs and SDs through a comparative analysis of large-scale health search queries submitted through Web search engines from both types of devices. Methods Using the Web analytics tool, IBM NetInsight OnDemand, and based on the type of devices used (PCs or SDs), we obtained the most frequent health search queries between June 2011 and May 2013 that were submitted on Web search engines and directed users to the Mayo Clinic’s consumer health information website. We performed analyses on “Queries with considering repetition counts (QwR)” and “Queries without considering repetition counts (QwoR)”. The dataset contains (1) 2.74 million and 3.94 million QwoR, respectively for PCs and SDs, and (2) more than 100 million QwR for both PCs and SDs. We analyzed structural properties of the queries (length of the search queries, usage of query operators and special characters in health queries), types of search queries (keyword-based, wh-questions, yes/no questions), categorization of the queries based on health categories and information mentioned in the queries (gender, age-groups, temporal references), misspellings in the health queries, and the linguistic structure of the health queries. Results Query strings used for health information searching via PCs and SDs differ by almost 50%. The most searched health categories are “Symptoms” (1 in 3 search queries), “Causes”, and “Treatments & Drugs”. The distribution of search queries for

  16. Comparative analysis of online health queries originating from personal computers and smart devices on a consumer health information portal.

    PubMed

    Jadhav, Ashutosh; Andrews, Donna; Fiksdal, Alexander; Kumbamu, Ashok; McCormick, Jennifer B; Misitano, Andrew; Nelsen, Laurie; Ryu, Euijung; Sheth, Amit; Wu, Stephen; Pathak, Jyotishman

    2014-07-04

    The number of people using the Internet and mobile/smart devices for health information seeking is increasing rapidly. Although the user experience for online health information seeking varies with the device used, for example, smart devices (SDs) like smartphones/tablets versus personal computers (PCs) like desktops/laptops, very few studies have investigated how online health information seeking behavior (OHISB) may differ by device. The objective of this study is to examine differences in OHISB between PCs and SDs through a comparative analysis of large-scale health search queries submitted through Web search engines from both types of devices. Using the Web analytics tool, IBM NetInsight OnDemand, and based on the type of devices used (PCs or SDs), we obtained the most frequent health search queries between June 2011 and May 2013 that were submitted on Web search engines and directed users to the Mayo Clinic's consumer health information website. We performed analyses on "Queries with considering repetition counts (QwR)" and "Queries without considering repetition counts (QwoR)". The dataset contains (1) 2.74 million and 3.94 million QwoR, respectively for PCs and SDs, and (2) more than 100 million QwR for both PCs and SDs. We analyzed structural properties of the queries (length of the search queries, usage of query operators and special characters in health queries), types of search queries (keyword-based, wh-questions, yes/no questions), categorization of the queries based on health categories and information mentioned in the queries (gender, age-groups, temporal references), misspellings in the health queries, and the linguistic structure of the health queries. Query strings used for health information searching via PCs and SDs differ by almost 50%. The most searched health categories are "Symptoms" (1 in 3 search queries), "Causes", and "Treatments & Drugs". The distribution of search queries for different health categories differs with the device used for

  17. Generating and Executing Complex Natural Language Queries across Linked Data.

    PubMed

    Hamon, Thierry; Mougin, Fleur; Grabar, Natalia

    2015-01-01

    With the recent and intensive research in the biomedical area, the knowledge accumulated is disseminated through various knowledge bases. Links between these knowledge bases are needed in order to use them jointly. Linked Data, SPARQL language, and interfaces in Natural Language question-answering provide interesting solutions for querying such knowledge bases. We propose a method for translating natural language questions into SPARQL queries. We use Natural Language Processing tools, semantic resources, and the RDF triples description. The method is designed on 50 questions over 3 biomedical knowledge bases, and evaluated on 27 questions. It achieves 0.78 F-measure on the test set. The method for translating natural language questions into SPARQL queries is implemented as a Perl module available at http://search.cpan.org/ thhamon/RDF-NLP-SPARQLQuery.

  18. A Systematic Search for Short-term Variability of EGRET Sources

    NASA Technical Reports Server (NTRS)

    Wallace, P. M.; Griffis, N. J.; Bertsch, D. L.; Hartman, R. C.; Thompson, D. J.; Kniffen, D. A.; Bloom, S. D.

    2000-01-01

    The 3rd EGRET Catalog of High-energy Gamma-ray Sources contains 170 unidentified sources, and there is great interest in the nature of these sources. One means of determining source class is the study of flux variability on time scales of days; pulsars are believed to be stable on these time scales while blazars are known to be highly variable. In addition, previous work has demonstrated that 3EG J0241-6103 and 3EG J1837-0606 are candidates for a new gamma-ray source class. These sources near the Galactic plane display transient behavior but cannot be associated with any known blazars. Although many instances of flaring AGN have been reported, the EGRET database has not been systematically searched for occurrences of short-timescale (approximately 1 day) variability. These considerations have led us to conduct a systematic search for short-term variability in EGRET data, covering all viewing periods through proposal cycle 4. Six 3EG catalog sources are reported here to display variability on short time scales; four of them are unidentified. In addition, three non-catalog variable sources are discussed.

  19. Query Language for Location-Based Services: A Model Checking Approach

    NASA Astrophysics Data System (ADS)

    Hoareau, Christian; Satoh, Ichiro

    We present a model checking approach to the rationale, implementation, and applications of a query language for location-based services. Such query mechanisms are necessary so that users, objects, and/or services can effectively benefit from the location-awareness of their surrounding environment. The underlying data model is founded on a symbolic model of space organized in a tree structure. Once extended to a semantic model for modal logic, we regard location query processing as a model checking problem, and thus define location queries as hybrid logic-based formulas. Our approach is unique among existing research because it explores the connection between location models and query processing in ubiquitous computing systems, relies on a sound theoretical basis, and provides modal logic-based query mechanisms for expressive searches over a decentralized data structure. A prototype implementation is also presented and discussed.

  20. Query Optimization by Semantic Reasoning.

    DTIC Science & Technology

    1981-05-01

    condition holds, then formulas X and Y are said to be merge-compatible. Let xi be the variable in X that corresponds to variable yj in Y (x is not...Davidson, Ramez El-Masri, Sheldon Finkelstein, Hector Garcia, Mohammed Olumi, Tom Rogers, Neil Rowe, David Shaw, and Kyu-Young Whang. Special credit...for the simple queries, along with cost formulas and applicability conditions for the methods. Most recently has come the development of optimizers for

  1. Comparing image search behaviour in the ARRS GoldMiner search engine and a clinical PACS/RIS.

    PubMed

    De-Arteaga, Maria; Eggel, Ivan; Do, Bao; Rubin, Daniel; Kahn, Charles E; Müller, Henning

    2015-08-01

    Information search has changed the way we manage knowledge and the ubiquity of information access has made search a frequent activity, whether via Internet search engines or increasingly via mobile devices. Medical information search is in this respect no different and much research has been devoted to analyzing the way in which physicians aim to access information. Medical image search is a much smaller domain but has gained much attention as it has different characteristics than search for text documents. While web search log files have been analysed many times to better understand user behaviour, the log files of hospital internal systems for search in a PACS/RIS (Picture Archival and Communication System, Radiology Information System) have rarely been analysed. Such a comparison between a hospital PACS/RIS search and a web system for searching images of the biomedical literature is the goal of this paper. Objectives are to identify similarities and differences in search behaviour of the two systems, which could then be used to optimize existing systems and build new search engines. Log files of the ARRS GoldMiner medical image search engine (freely accessible on the Internet) containing 222,005 queries, and log files of Stanford's internal PACS/RIS search called radTF containing 18,068 queries were analysed. Each query was preprocessed and all query terms were mapped to the RadLex (Radiology Lexicon) terminology, a comprehensive lexicon of radiology terms created and maintained by the Radiological Society of North America, so the semantic content in the queries and the links between terms could be analysed, and synonyms for the same concept could be detected. RadLex was mainly created for the use in radiology reports, to aid structured reporting and the preparation of educational material (Langlotz, 2006) [1]. In standard medical vocabularies such as MeSH (Medical Subject Headings) and UMLS (Unified Medical Language System) specific terms of radiology are often

  2. Ad-Hoc Queries over Document Collections - A Case Study

    NASA Astrophysics Data System (ADS)

    Löser, Alexander; Lutter, Steffen; Düssel, Patrick; Markl, Volker

    We discuss the novel problem of supporting analytical business intelligence queries over web-based textual content, e.g., BI-style reports based on 100.000's of documents from an ad-hoc web search result. Neither conventional search engines nor conventional Business Intelligence and ETL tools address this problem, which lies at the intersection of their capabilities. "Google Squared" or our system GOOLAP.info, are examples of these kinds of systems. They execute information extraction methods over one or several document collections at query time and integrate extracted records into a common view or tabular structure. Frequent extraction and object resolution failures cause incomplete records which could not be joined into a record answering the query. Our focus is the identification of join-reordering heuristics maximizing the size of complete records answering a structured query. With respect to given costs for document extraction we propose two novel join-operations: The multi-way CJ-operator joins records from multiple relationships extracted from a single document. The two-way join-operator DJ ensures data density by removing incomplete records from results. In a preliminary case study we observe that our join-reordering heuristics positively impact result size, record density and lower execution costs.

  3. Monitoring Moving Queries inside a Safe Region

    PubMed Central

    Al-Khalidi, Haidar; Taniar, David; Alamri, Sultan

    2014-01-01

    With mobile moving range queries, there is a need to recalculate the relevant surrounding objects of interest whenever the query moves. Therefore, monitoring the moving query is very costly. The safe region is one method that has been proposed to minimise the communication and computation cost of continuously monitoring a moving range query. Inside the safe region the set of objects of interest to the query do not change; thus there is no need to update the query while it is inside its safe region. However, when the query leaves its safe region the mobile device has to reevaluate the query, necessitating communication with the server. Knowing when and where the mobile device will leave a safe region is widely known as a difficult problem. To solve this problem, we propose a novel method to monitor the position of the query over time using a linear function based on the direction of the query obtained by periodic monitoring of its position. Periodic monitoring ensures that the query is aware of its location all the time. This method reduces the costs associated with communications in client-server architecture. Computational results show that our method is successful in handling moving query patterns. PMID:24696652
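
    The idea of predicting when a query will leave its safe region from periodically sampled positions can be sketched as below. This is not the authors' algorithm: it assumes, purely for illustration, a circular safe region and straight-line motion at the velocity estimated from the last two position samples, and the function names are hypothetical.

```python
import math


def estimate_velocity(p_prev, p_curr, dt):
    """Velocity from two periodic position samples taken dt time units apart."""
    return ((p_curr[0] - p_prev[0]) / dt, (p_curr[1] - p_prev[1]) / dt)


def time_to_exit(p, v, center, radius):
    """Time until a point moving linearly from p with velocity v crosses
    the boundary of a circular safe region (assumed shape)."""
    dx, dy = p[0] - center[0], p[1] - center[1]
    c = dx * dx + dy * dy - radius * radius
    if c > 0:
        return 0.0  # already outside the safe region
    a = v[0] ** 2 + v[1] ** 2
    if a == 0:
        return math.inf  # stationary query never leaves
    b = 2 * (dx * v[0] + dy * v[1])
    # Larger root of a*t^2 + b*t + c = 0; the discriminant is >= 0 when c <= 0.
    return (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)


v = estimate_velocity((-2.0, 0.0), (0.0, 0.0), dt=1.0)
print(time_to_exit((0.0, 0.0), v, center=(0.0, 0.0), radius=10.0))
```

    Under these assumptions, the client would only need to contact the server around the predicted exit time rather than at every position update.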

  4. Advanced Query Formulation in Deductive Databases.

    ERIC Educational Resources Information Center

    Niemi, Timo; Jarvelin, Kalervo

    1992-01-01

    Discusses deductive databases and database management systems (DBMS) and introduces a framework for advanced query formulation for end users. Recursive processing is described, a sample extensional database is presented, query types are explained, and criteria for advanced query formulation from the end user's viewpoint are examined. (31…

  5. Google search behavior for status epilepticus.

    PubMed

    Brigo, Francesco; Trinka, Eugen

    2015-08-01

    Millions of people surf the Internet every day as a source of health-care information looking for materials about symptoms, diagnosis, treatments and their possible adverse effects, or diagnostic procedures. Google is the most popular search engine and is used by patients and physicians to search for online health-related information. This study aimed to evaluate changes in Google search behavior occurring in English-speaking countries over time for the term "status epilepticus" (SE). Using Google Trends, data on global search queries for the term SE between the 1st of January 2004 and 31st of December 2014 were analyzed. Search volume numbers over time (downloaded as CSV datasets) were analyzed by applying the "health" category filter. The research trends for the term SE remained fairly constant over time. The greatest search volume for the term SE was reported in the United States, followed by India, Australia, the United Kingdom, Canada, the Netherlands, Thailand, and Germany. Most terms associated with the search queries were related to SE definition, symptoms, subtypes, and treatment. The volume of searches for some queries (nonconvulsive, focal, and refractory SE; SE definition; SE guidelines; SE symptoms; SE management; SE treatment) was enormously increased over time (search popularity has exceeded a 5000% growth since 2004). Most people use search engines to look for the term SE to obtain information on its definition, subtypes, and management. The greatest search volume occurred not only in developed countries but also in developing countries where raising awareness about SE still remains a challenging task and where there is reduced public knowledge of epilepsy. Health information seeking (the extent to which people search for health information online) reflects the health-related information needs of Internet users for a specific disease. Google Trends shows that Internet users have a great demand for information concerning some aspects of SE

  6. A Framework for WWW Query Processing

    NASA Technical Reports Server (NTRS)

    Wu, Binghui Helen; Wharton, Stephen (Technical Monitor)

    2000-01-01

    Query processing is the most common operation in a DBMS. Sophisticated query processing has mainly targeted single-enterprise environments that provide centralized control over data and metadata. Queries submitted by anonymous users on the web differ in that load balancing or DBMS access control becomes the key issue. This paper provides a solution by introducing a framework for WWW query processing. The success of this framework lies in the utilization of query optimization techniques and the ontological approach. This methodology has proved to be cost effective at the NASA Goddard Space Flight Center Distributed Active Archive Center (GDAAC).

  7. An assessment of the visibility of MeSH-indexed medical web catalogs through search engines.

    PubMed

    Zweigenbaum, P; Darmoni, S J; Grabar, N; Douyère, M; Benichou, J

    2002-01-01

    Manually indexed Internet health catalogs such as CliniWeb or CISMeF provide resources for retrieving high-quality health information. Users of these quality-controlled subject gateways are most often referred to them by general search engines such as Google, AltaVista, etc. This raises several questions, among which the following: what is the relative visibility of medical Internet catalogs through search engines? This study addresses this issue by measuring and comparing the visibility of six major, MeSH-indexed health catalogs through four different search engines (AltaVista, Google, Lycos, Northern Light) in two languages (English and French). Over half a million queries were sent to the search engines; for most of these search engines, according to our measures at the time the queries were sent, the most visible catalog for English MeSH terms was CliniWeb and the most visible one for French MeSH terms was CISMeF.

  8. Semantic Annotations and Querying of Web Data Sources

    NASA Astrophysics Data System (ADS)

    Hornung, Thomas; May, Wolfgang

    A large part of the Web, actually holding a significant portion of the useful information throughout the Web, consists of views on hidden databases, provided by numerous heterogeneous interfaces that are partly human-oriented via Web forms ("Deep Web"), and partly based on Web Services (only machine accessible). In this paper we present an approach for annotating these sources in a way that makes them citizens of the Semantic Web. We illustrate how queries can be stated in terms of the ontology, and how the annotations are used to select and access appropriate sources and to answer the queries.

  9. Analysis of PubMed User Sessions Using a Full-Day PubMed Query Log: A Comparison of Experienced and Nonexperienced PubMed Users

    PubMed Central

    2015-01-01

    Background PubMed is the largest biomedical bibliographic information source on the Internet. PubMed has been considered one of the most important and reliable sources of up-to-date health care evidence. Previous studies examined the effects of domain expertise/knowledge on search performance using PubMed. However, very little is known about PubMed users’ knowledge of information retrieval (IR) functions and their usage in query formulation. Objective The purpose of this study was to shed light on how experienced/nonexperienced PubMed users perform their search queries by analyzing a full-day query log. Our hypotheses were that (1) experienced PubMed users who use system functions quickly retrieve relevant documents and (2) nonexperienced PubMed users who do not use them have longer search sessions than experienced users. Methods To test these hypotheses, we analyzed PubMed query log data containing nearly 3 million queries. User sessions were divided into two categories: experienced and nonexperienced. We compared experienced and nonexperienced users per number of sessions, and experienced and nonexperienced user sessions per session length, with a focus on how fast they completed their sessions. Results To test our hypotheses, we measured how successful information retrieval was (at retrieving relevant documents), represented as the decrease rates of experienced and nonexperienced users from a session length of 1 to 2, 3, 4, and 5. The decrease rate (from a session length of 1 to 2) of the experienced users was significantly larger than that of the nonexperienced groups. Conclusions Experienced PubMed users retrieve relevant documents more quickly than nonexperienced PubMed users in terms of session length. PMID:26139516

  10. Evolutionary Multiobjective Query Workload Optimization of Cloud Data Warehouses

    PubMed Central

    Dokeroglu, Tansel; Sert, Seyyit Alper; Cinar, Muhammet Serkan

    2014-01-01

    With the advent of Cloud databases, query optimizers need to find Pareto-optimal solutions in terms of response time and monetary cost. Our novel approach minimizes both objectives by deploying alternative virtual resources and query plans making use of the virtual resource elasticity of the Cloud. We propose an exact multiobjective branch-and-bound and a robust multiobjective genetic algorithm for the optimization of distributed data warehouse query workloads on the Cloud. In order to investigate the effectiveness of our approach, we incorporate the devised algorithms into a prototype system. Finally, through several experiments that we have conducted with different workloads and virtual resource configurations, we conclude remarkable findings of alternative deployments as well as the advantages and disadvantages of the multiobjective algorithms we propose. PMID:24892048
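
    Pareto-optimality over response time and monetary cost can be illustrated with a brute-force filter. This sketch is not the branch-and-bound or genetic algorithm proposed in the paper; the function name and the candidate deployments are made up for illustration.

```python
def pareto_front(plans):
    """plans: list of (name, response_time, cost); lower is better for both.

    A plan is kept unless some other plan is at least as good on both
    objectives and strictly better on at least one.
    """
    front = []
    for name, t, c in plans:
        dominated = any(
            (t2 <= t and c2 <= c) and (t2 < t or c2 < c)
            for _, t2, c2 in plans
        )
        if not dominated:
            front.append((name, t, c))
    return front


# Hypothetical deployments: (name, response time in s, cost in $).
candidates = [("A", 10, 5), ("B", 8, 7), ("C", 12, 6), ("D", 8, 9)]
print(pareto_front(candidates))
```

    In this made-up example, plans A and B remain: C is slower and costlier than A, and D costs more than B for the same response time.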

  11. Crowd-sourced Ontology for Photoleukocoria: Identifying Common Internet Search Terms for a Potentially Important Pediatric Ophthalmic Sign.

    PubMed

    Staffieri, Sandra E; Kearns, Lisa S; Sanfilippo, Paul G; Craig, Jamie E; Mackey, David A; Hewitt, Alex W

    2018-02-01

    Leukocoria is the most common presenting sign for pediatric eye disease including retinoblastoma and cataract, with worse outcomes if diagnosis is delayed. We investigated whether individuals could identify leukocoria in photographs (photoleukocoria) and examined their subsequent Internet search behavior. Using a web-based questionnaire, in this cross-sectional study we invited adults aged over 18 years to view two photographs of a child with photoleukocoria, and then search the Internet to determine a possible diagnosis and action plan. The most commonly used search terms and websites accessed were recorded. The questionnaire was completed by 1639 individuals. Facebook advertisement was the most effective recruitment strategy. The mean age of all respondents was 38.95 ± 14.59 years (range, 18-83), 94% were female, and 59.3% had children. An abnormality in the images presented was identified by 1613 (98.4%) participants. The most commonly used search terms were: "white," "pupil," "photo," and "eye" reaching a variety of appropriate websites or links to print or social media articles. Different words or phrases were used to describe the same observation of photoleukocoria leading to a range of websites. Variations in the description of observed signs and search words influenced the sites reached, information obtained, and subsequent help-seeking intentions. Identifying the most commonly used search terms for photoleukocoria is an important step for search engine optimization. Being directed to the most appropriate websites informing of the significance of photoleukocoria and the appropriate actions to take could reduce delays in diagnosis of important pediatric eye disease such as retinoblastoma or cataract.

  12. Query Results Clustering by Extending SPARQL with CLUSTER BY

    NASA Astrophysics Data System (ADS)

    Ławrynowicz, Agnieszka

    The task of dynamic clustering of the search results proved to be useful in the Web context, where the user often does not know the granularity of the search results in advance. The goal of this paper is to provide a declarative way for invoking dynamic clustering of the results of queries submitted over Semantic Web data. To achieve this goal the paper proposes an approach that extends SPARQL by clustering abilities. The approach introduces a new statement, CLUSTER BY, into the SPARQL grammar and proposes semantics for such extension.

  13. Evaluation of Content-Matched Range Monitoring Queries over Moving Objects in Mobile Computing Environments

    PubMed Central

    Jung, HaRim; Song, MoonBae; Youn, Hee Yong; Kim, Ung Mo

    2015-01-01

    A content-matched (CM) range monitoring query over moving objects continually retrieves the moving objects (i) whose non-spatial attribute values are matched to given non-spatial query values; and (ii) that are currently located within a given spatial query range. In this paper, we propose a new query indexing structure, called the group-aware query region tree (GQR-tree) for efficient evaluation of CM range monitoring queries. The primary role of the GQR-tree is to help the server leverage the computational capabilities of moving objects in order to improve the system performance in terms of the wireless communication cost and server workload. Through a series of comprehensive simulations, we verify the superiority of the GQR-tree method over the existing methods. PMID:26393613
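
    The two conditions that define a CM range monitoring query can be shown with a naive, one-shot evaluation. This sketch deliberately omits the GQR-tree index and the continuous monitoring described in the paper; the data layout, the rectangular query range, and the function name are assumptions made for illustration.

```python
def cm_range_query(objects, query_attrs, x_range, y_range):
    """Return ids of objects that (i) match every non-spatial query value
    and (ii) currently lie inside the rectangular query range."""
    (x_lo, x_hi), (y_lo, y_hi) = x_range, y_range
    return [
        o["id"]
        for o in objects
        if all(o["attrs"].get(k) == v for k, v in query_attrs.items())
        and x_lo <= o["pos"][0] <= x_hi
        and y_lo <= o["pos"][1] <= y_hi
    ]


# Made-up moving objects with non-spatial attributes and current positions.
fleet = [
    {"id": "t1", "attrs": {"type": "taxi", "free": True}, "pos": (2.0, 3.0)},
    {"id": "t2", "attrs": {"type": "taxi", "free": False}, "pos": (2.5, 3.5)},
    {"id": "t3", "attrs": {"type": "taxi", "free": True}, "pos": (9.0, 9.0)},
    {"id": "b1", "attrs": {"type": "bus", "free": True}, "pos": (1.0, 1.0)},
]
print(cm_range_query(fleet, {"type": "taxi", "free": True}, (0, 5), (0, 5)))
```

    Re-running such a filter on the server for every position update is the cost that a query index shared with the moving objects is meant to avoid.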

  14. Evaluation of Content-Matched Range Monitoring Queries over Moving Objects in Mobile Computing Environments.

    PubMed

    Jung, HaRim; Song, MoonBae; Youn, Hee Yong; Kim, Ung Mo

    2015-09-18

    A content-matched (CM) range monitoring query over moving objects continually retrieves the moving objects (i) whose non-spatial attribute values are matched to given non-spatial query values; and (ii) that are currently located within a given spatial query range. In this paper, we propose a new query indexing structure, called the group-aware query region tree (GQR-tree) for efficient evaluation of CM range monitoring queries. The primary role of the GQR-tree is to help the server leverage the computational capabilities of moving objects in order to improve the system performance in terms of the wireless communication cost and server workload. Through a series of comprehensive simulations, we verify the superiority of the GQR-tree method over the existing methods.

  15. Clean Air Markets - Facility Attributes and Contacts Query Wizard

    EPA Pesticide Factsheets

    The Facility Attributes and Contacts Query Wizard is part of a suite of Clean Air Markets-related tools that are accessible at http://camddataandmaps.epa.gov/gdm/index.cfm. The Facility Attributes and Contact module gives the user access to current and historical facility, owner, and representative data using custom queries, via the Facility Attributes Query Wizard, or Quick Reports. In addition, data regarding EPA, State, and local agency staff are also available. The Query Wizard can be used to search for data about a facility or facilities by identifying characteristics such as associated programs, owners, representatives, locations, and unit characteristics, facility inventories, and classifications. EPA's Clean Air Markets Division (CAMD) includes several market-based regulatory programs designed to improve air quality and ecosystems. The most well-known of these programs are EPA's Acid Rain Program and the NOx Programs, which reduce emissions of sulfur dioxide (SO2) and nitrogen oxides (NOx), compounds that adversely affect air quality, the environment, and public health. CAMD also plays an integral role in the development and implementation of the Clean Air Interstate Rule (CAIR).

  16. Querying Semi-Structured Data

    NASA Technical Reports Server (NTRS)

    Abiteboul, Serge

    1997-01-01

    The amount of data of all kinds available electronically has increased dramatically in recent years. The data resides in different forms, ranging from unstructured data in the systems to highly structured in relational database systems. Data is accessible through a variety of interfaces including Web browsers, database query languages, application-specific interfaces, or data exchange formats. Some of this data is raw data, e.g., images or sound. Some of it has structure even if the structure is often implicit, and not as rigid or regular as that found in standard database systems. Sometimes the structure exists but has to be extracted from the data. Sometimes also it exists but we prefer to ignore it for certain purposes such as browsing. We call here semi-structured data this data that is (from a particular viewpoint) neither raw data nor strictly typed, i.e., not table-oriented as in a relational model or sorted-graph as in object databases. As will be seen later when the notion of semi-structured data is more precisely defined, the need for semi-structured data arises naturally in the context of data integration, even when the data sources are themselves well-structured. Although data integration is an old topic, the need to integrate a wider variety of data formats (e.g., SGML or ASN.1 data) and data found on the Web has brought the topic of semi-structured data to the forefront of research. The main purpose of the paper is to isolate the essential aspects of semi-structured data. We also survey some proposals of models and query languages for semi-structured data. In particular, we consider recent works at Stanford U. and U. Penn on semi-structured data. In both cases, the motivation is found in the integration of heterogeneous data.

  17. Classification of Automated Search Traffic

    NASA Astrophysics Data System (ADS)

    Buehrer, Greg; Stokes, Jack W.; Chellapilla, Kumar; Platt, John C.

    As web search providers seek to improve both relevance and response times, they are challenged by the ever-increasing tax of automated search query traffic. Third party systems interact with search engines for a variety of reasons, such as monitoring a web site’s rank, augmenting online games, or possibly to maliciously alter click-through rates. In this paper, we investigate automated traffic (sometimes referred to as bot traffic) in the query stream of a large search engine provider. We define automated traffic as any search query not generated by a human in real time. We first provide examples of different categories of query logs generated by automated means. We then develop many different features that distinguish between queries generated by people searching for information, and those generated by automated processes. We categorize these features into two classes, either an interpretation of the physical model of human interactions, or as behavioral patterns of automated interactions. Using these detection features, we next classify the query stream using multiple binary classifiers. In addition, a multiclass classifier is then developed to identify subclasses of both normal and automated traffic. An active learning algorithm is used to suggest which user sessions to label to improve the accuracy of the multiclass classifier, while also seeking to discover new classes of automated traffic. Performance analyses are then provided. Finally, the multiclass classifier is used to predict the subclass distribution for the search query stream.

  18. Multi-Bit Quantum Private Query

    NASA Astrophysics Data System (ADS)

    Shi, Wei-Xu; Liu, Xing-Tong; Wang, Jian; Tang, Chao-Jing

    2015-09-01

    Most of the existing Quantum Private Queries (QPQ) protocols provide only a single-bit query service and thus have to be repeated several times when more bits are to be retrieved. Wei et al.'s scheme for block queries requires a high-dimension quantum key distribution system to sustain, which is still restricted to the laboratory. Here, based on Markus Jakobi et al.'s single-bit QPQ protocol, we propose a multi-bit quantum private query protocol, in which the user can get access to several bits within one single query. We also extend the proposed protocol to block queries, using a binary matrix to guard database security. Analysis in this paper shows that our protocol has better communication complexity and implementability, and can achieve a considerable level of security.

  19. Short-term absence from industry: III The inference of `proneness' and a search for causes

    PubMed Central

    Froggatt, P.

    1970-01-01

    Froggatt, P. (1970). Brit. J. industr. Med., 27, 297-312. Short-term absence from industry. III. The inference of `proneness' and a search for causes. The abilities of five hypotheses (`chance', `proneness', and three of `true contagion' - as defined in the text) to explain the distributions of one-day and two-day absences among groups of male and female industrial personnel and clerks in government service are examined by curve-fitting and correlation methods. The five hypotheses generate (in order) the Poisson, negative binomial, Neyman type A, Short, and Hermite (two-parameter form) distributions which are fitted to the data using maximum-likelihood estimates. The conclusion is drawn that `proneness', i.e., a stable `liability', compounded from several though unquantifiable factors, and constant for each individual over the period of the study, is markedly successful in explaining the data. It is emphasized that some of the other hypotheses under test cannot be unequivocally rejected; and there is in theory an infinite number, still unformulated or untested, which may be acceptable or even fit the data better. Correlation coefficients for the numbers of one-day (and two-day) absences taken by the same individuals in two equal non-overlapping periods of time are of the order 0·5 to 0·7 (0·3 to 0·5 for two-day absences) and the corresponding regressions fulfil linear requirements. These correlations are higher than any between `personal characteristics' and their overt consequence in contingent fields of human enquiry. For one-day absences the predictive power for the future from the past record could in some circumstances justify executive action. When freely available, overtime was greatest among junior married men and least among junior married women. The validity of the inference of `proneness' and the implications of its acceptance are fully discussed. While interpretation is not unequivocal, one-day absences seemingly have many causes; two-day absences are

  20. Where to search top-K biomedical ontologies?

    PubMed

    Oliveira, Daniela; Butt, Anila Sahar; Haller, Armin; Rebholz-Schuhmann, Dietrich; Sahay, Ratnesh

    2018-03-20

    Searching for precise terms and terminological definitions in the biomedical data space is problematic, as researchers find overlapping, closely related and even equivalent concepts in a single or multiple ontologies. Search engines that retrieve ontological resources often suggest an extensive list of search results for a given input term, which leads to the tedious task of selecting the best-fit ontological resource (class or property) for the input term and reduces user confidence in the retrieval engines. A systematic evaluation of these search engines is necessary to understand their strengths and weaknesses in different search requirements. We have implemented seven comparable Information Retrieval ranking algorithms to search through ontologies and compared them against four search engines for ontologies. Free-text queries have been performed, the outcomes have been judged by experts and the ranking algorithms and search engines have been evaluated against the expert-based ground truth (GT). In addition, we propose a probabilistic GT that is developed automatically to provide deeper insights and confidence to the expert-based GT as well as evaluating a broader range of search queries. The main outcome of this work is the identification of key search factors for biomedical ontologies together with search requirements and a set of recommendations that will help biomedical experts and ontology engineers to select the best-suited retrieval mechanism in their search scenarios. We expect that this evaluation will allow researchers and practitioners to apply the current search techniques more reliably and that it will help them to select the right solution for their daily work. The source code (of seven ranking algorithms), ground truths and experimental results are available at https://github.com/danielapoliveira/bioont-search-benchmark.

  1. Variability of patient spine education by Internet search engine.

    PubMed

    Ghobrial, George M; Mehdi, Angud; Maltenfort, Mitchell; Sharan, Ashwini D; Harrop, James S

    2014-03-01

    Patients are increasingly reliant upon the Internet as a primary source of medical information. The educational experience varies by search engine, search term, and changes daily. There are no tools for critical evaluation of spinal surgery websites. To highlight the variability between common search engines for the same search terms. To detect bias, by prevalence of specific kinds of websites for certain spinal disorders. Demonstrate a simple scoring system of spinal disorder website for patient use, to maximize the quality of information exposed to the patient. Ten common search terms were used to query three of the most common search engines. The top fifty results of each query were tabulated. A negative binomial regression was performed to highlight the variation across each search engine. Google was more likely than Bing and Yahoo search engines to return hospital ads (P=0.002) and more likely to return scholarly sites of peer-reviewed lite (P=0.003). Educational web sites, surgical group sites, and online web communities had a significantly higher likelihood of returning on any search, regardless of search engine, or search string (P=0.007). Likewise, professional websites, including hospital run, industry sponsored, legal, and peer-reviewed web pages were less likely to be found on a search overall, regardless of engine and search string (P=0.078). The Internet is a rapidly growing body of medical information which can serve as a useful tool for patient education. High quality information is readily available, provided that the patient uses a consistent, focused metric for evaluating online spine surgery information, as there is a clear variability in the way search engines present information to the patient. Published by Elsevier B.V.

  2. Spatial aggregation query in dynamic geosensor networks

    NASA Astrophysics Data System (ADS)

    Yi, Baolin; Feng, Dayang; Xiao, Shisong; Zhao, Erdun

    2007-11-01

    Wireless sensor networks have been widely used for civilian and military applications, such as environmental monitoring and vehicle tracking. In many of these applications, research mainly aims at building sensor-network-based systems that deliver the sensed data to applications. However, existing work has seldom exploited spatial aggregation queries that account for the dynamic characteristics of sensor networks. In this paper, we investigate how to process spatial aggregation queries over dynamic geosensor networks where both the sink node and sensor nodes are mobile, and we propose several novel improvements on enabling techniques. The mobility of sensors makes existing routing protocols based on information about a fixed framework or the neighborhood infeasible. We present an improved location-based stateless implicit geographic forwarding (IGF) protocol for routing a query toward the area specified by the query window, and a diameter-based window aggregation query (DWAQ) algorithm for query propagation and data aggregation in the query window. Finally, considering the changing location of the sink node, we present two schemes to forward the result to the sink node. Simulation results show that the proposed algorithms can improve query latency and query accuracy.

  3. Secure count query on encrypted genomic data.

    PubMed

    Hasan, Mohammad Zahidul; Mahdi, Md Safiur Rahman; Sadat, Md Nazmus; Mohammed, Noman

    2018-05-01

    Human genomic information can yield more effective healthcare by guiding medical decisions. Therefore, genomics research is gaining popularity as it can identify potential correlations between a disease and a certain gene, which improves the safety and efficacy of drug treatment and can also develop more effective prevention strategies [1]. To reduce the sampling error and to increase the statistical accuracy of this type of research projects, data from different sources need to be brought together since a single organization does not necessarily possess required amount of data. In this case, data sharing among multiple organizations must satisfy strict policies (for instance, HIPAA and PIPEDA) that have been enforced to regulate privacy-sensitive data sharing. Storage and computation on the shared data can be outsourced to a third party cloud service provider, equipped with enormous storage and computation resources. However, outsourcing data to a third party is associated with a potential risk of privacy violation of the participants, whose genomic sequence or clinical profile is used in these studies. In this article, we propose a method for secure sharing and computation on genomic data in a semi-honest cloud server. In particular, there are two main contributions. Firstly, the proposed method can handle biomedical data containing both genotype and phenotype. Secondly, our proposed index tree scheme reduces the computational overhead significantly for executing secure count query operation. In our proposed method, the confidentiality of shared data is ensured through encryption, while making the entire computation process efficient and scalable for cutting-edge biomedical applications. We evaluated our proposed method in terms of efficiency on a database of Single-Nucleotide Polymorphism (SNP) sequences, and experimental results demonstrate that the execution time for a query of 50 SNPs in a database of 50,000 records is approximately 5 s, where each record

  4. Query-biased preview over outsourced and encrypted data.

    PubMed

    Peng, Ningduo; Luo, Guangchun; Qin, Ke; Chen, Aiguo

    2013-01-01

    For both convenience and security, more and more users encrypt their sensitive data before outsourcing it to a third party such as cloud storage service. However, searching for the desired documents becomes problematic since it is costly to download and decrypt each possibly needed document to check if it contains the desired content. An informative query-biased preview feature, as applied in modern search engine, could help the users to learn about the content without downloading the entire document. However, when the data are encrypted, securely extracting a keyword-in-context snippet from the data as a preview becomes a challenge. Based on private information retrieval protocol and the core concept of searchable encryption, we propose a single-server and two-round solution to securely obtain a query-biased snippet over the encrypted data from the server. We achieve this novel result by making a document (plaintext) previewable under any cryptosystem and constructing a secure index to support dynamic computation for a best matched snippet when queried by some keywords. For each document, the scheme has O(d) storage complexity and O(log(d/s) + s + d/s) communication complexity, where d is the document size and s is the snippet length.

  5. Query-Biased Preview over Outsourced and Encrypted Data

    PubMed Central

    Luo, Guangchun; Qin, Ke; Chen, Aiguo

    2013-01-01

    For both convenience and security, more and more users encrypt their sensitive data before outsourcing it to a third party such as cloud storage service. However, searching for the desired documents becomes problematic since it is costly to download and decrypt each possibly needed document to check if it contains the desired content. An informative query-biased preview feature, as applied in modern search engine, could help the users to learn about the content without downloading the entire document. However, when the data are encrypted, securely extracting a keyword-in-context snippet from the data as a preview becomes a challenge. Based on private information retrieval protocol and the core concept of searchable encryption, we propose a single-server and two-round solution to securely obtain a query-biased snippet over the encrypted data from the server. We achieve this novel result by making a document (plaintext) previewable under any cryptosystem and constructing a secure index to support dynamic computation for a best matched snippet when queried by some keywords. For each document, the scheme has O(d) storage complexity and O(log(d/s) + s + d/s) communication complexity, where d is the document size and s is the snippet length. PMID:24078798
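    The idea of a query-biased, keyword-in-context snippet can be illustrated on plaintext. The sketch below is not the paper's scheme: it ignores encryption, private information retrieval and the secure index entirely, and simply picks the window of s tokens containing the most keyword occurrences. The function name best_snippet and the scoring rule are assumptions made for illustration.

```python
def best_snippet(tokens, keywords, s):
    """Return (start, window) for the s-token window with the most keyword hits.

    Plaintext-only illustration; ties go to the earliest window.
    """
    if s <= 0 or not tokens:
        return 0, []
    wanted = {k.lower() for k in keywords}
    # 1 where the token is a query keyword, 0 elsewhere.
    hits = [1 if t.lower() in wanted else 0 for t in tokens]
    s = min(s, len(tokens))

    # Score the first window, then slide it one token at a time.
    score = sum(hits[:s])
    best_score, best_start = score, 0
    for start in range(1, len(tokens) - s + 1):
        score += hits[start + s - 1] - hits[start - 1]
        if score > best_score:
            best_score, best_start = score, start
    return best_start, tokens[best_start:best_start + s]


doc = "the cloud server stores encrypted data and the user queries encrypted data".split()
start, snippet = best_snippet(doc, ["encrypted", "data"], 3)
```

    With d tokens and a snippet length of s, this scan visits d - s + 1 windows in a single pass; how the paper achieves its stated storage and communication bounds over encrypted data is not shown here.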

  6. NEOview: Near Earth Object Data Discovery and Query

    NASA Astrophysics Data System (ADS)

    Tibbetts, M.; Elvis, M.; Galache, J. L.; Harbo, P.; McDowell, J. C.; Rudenko, M.; Van Stone, D.; Zografou, P.

    2013-10-01

    Missions to Near Earth Objects (NEOs) figure prominently in NASA's Flexible Path approach to human space exploration. NEOs offer insight into both the origins of the Solar System and of life, as well as a source of materials for future missions. With NEOview scientists can locate NEO datasets, explore metadata provided by the archives, and query or combine disparate NEO datasets in the search for NEO candidates for exploration. NEOview is a software system that illustrates how standards-based interfaces facilitate NEO data discovery and research. NEOview software follows a client-server architecture. The server is a configurable implementation of the International Virtual Observatory Alliance (IVOA) Table Access Protocol (TAP), a general interface for tabular data access, that can be deployed as a front end to existing NEO datasets. The TAP client, seleste, is a graphical interface that provides intuitive means of discovering NEO providers, exploring dataset metadata to identify fields of interest, and constructing queries to retrieve or combine data. It features a powerful, graphical query builder capable of easing the user's introduction to table searches. Through science use cases, NEOview demonstrates how potential targets for NEO rendezvous could be identified by combining data from complementary sources. Through deployment and operations, it has been shown that the software components are data independent and configurable to many different data servers. As such, NEOview's TAP server and seleste TAP client can be used to create a seamless environment for data discovery and exploration for tabular data in any astronomical archive.

  7. Public Awareness of Uterine Power Morcellation Through US Food and Drug Administration Communications: Analysis of Google Trends Search Term Patterns

    PubMed Central

    Jamnagerwalla, Juzar; Markowitz, Melissa A; Thum, D Joseph; McCarty, Philip; Medendorp, Andrew R; Raz, Shlomo; Kim, Ja-Hong

    2018-01-01

    Background: Uterine power morcellation, where the uterus is shredded into smaller pieces, is a widely used technique for removal of uterine specimens in patients undergoing minimally invasive abdominal hysterectomy or myomectomy. Complications related to power morcellation of uterine specimens led to US Food and Drug Administration (FDA) communications in 2014 ultimately recommending against the use of power morcellation for women undergoing minimally invasive hysterectomy. Subsequently, practitioners drastically decreased the use of morcellation. Objective: We aimed to determine the effect of increased patient awareness on the decrease in use of the morcellator. Google Trends is a public tool that provides data on temporal patterns of search terms, and we correlated these data with the timing of the FDA communication. Methods: Weekly relative search volume (RSV) was obtained from Google Trends using the term “morcellation.” Higher RSV corresponds to increases in weekly search volume. Search volumes were divided into 3 groups: the 2 years prior to the FDA communication, a 1-year period following, and thereafter, with the distribution of the weekly RSV over the 3 periods tested using 1-way analysis of variance. Additionally, we analyzed the total number of websites containing the term “morcellation” over this time. Results: The mean RSV prior to the FDA communication was 12.0 (SD 15.8), with the RSV being 60.3 (SD 24.7) in the 1-year after and 19.3 (SD 5.2) thereafter (P<.001). The mean number of webpages containing the term “morcellation” in 2011 was 10,800, rising to 18,800 during 2014 and 36,200 in 2017. Conclusions: Google search activity about morcellation of uterine specimens increased significantly after the FDA communications. This trend indicates an increased public awareness regarding morcellation and its complications. More extensive preoperative counseling and alteration of surgical technique and clinician practice may be necessary. PMID:29699965

  8. Discovering biomedical semantic relations in PubMed queries for information retrieval and database curation.

    PubMed

    Huang, Chung-Chi; Lu, Zhiyong

    2016-01-01

    Identifying relevant papers from the literature is a common task in biocuration. Most current biomedical literature search systems primarily rely on matching user keywords. Semantic search, on the other hand, seeks to improve search accuracy by understanding the entities and contextual relations in user keywords. However, past research has mostly focused on semantically identifying biological entities (e.g. chemicals, diseases and genes) with little effort on discovering semantic relations. In this work, we aim to discover biomedical semantic relations in PubMed queries in an automated and unsupervised fashion. Specifically, we focus on extracting and understanding the contextual information (or context patterns) that is used by PubMed users to represent semantic relations between entities such as 'CHEMICAL-1 compared to CHEMICAL-2'. With the advances in automatic named entity recognition, we first tag entities in PubMed queries and then use tagged entities as knowledge to recognize pattern semantics. More specifically, we transform PubMed queries into context patterns involving participating entities, which are subsequently projected to latent topics via latent semantic analysis (LSA) to avoid the data sparseness and specificity issues. Finally, we mine semantically similar contextual patterns or semantic relations based on LSA topic distributions. Our two separate evaluation experiments of chemical-chemical (CC) and chemical-disease (CD) relations show that the proposed approach significantly outperforms a baseline method, which simply measures pattern semantics by similarity in participating entities. The highest performance achieved by our approach is nearly 0.9 and 0.85 respectively for the CC and CD task when compared against the ground truth in terms of normalized discounted cumulative gain (nDCG), a standard measure of ranking quality. These results suggest that our approach can effectively identify and return related semantic patterns in a ranked order

  9. Discovering biomedical semantic relations in PubMed queries for information retrieval and database curation

    PubMed Central

    Huang, Chung-Chi; Lu, Zhiyong

    2016-01-01

    Identifying relevant papers from the literature is a common task in biocuration. Most current biomedical literature search systems primarily rely on matching user keywords. Semantic search, on the other hand, seeks to improve search accuracy by understanding the entities and contextual relations in user keywords. However, past research has mostly focused on semantically identifying biological entities (e.g. chemicals, diseases and genes) with little effort on discovering semantic relations. In this work, we aim to discover biomedical semantic relations in PubMed queries in an automated and unsupervised fashion. Specifically, we focus on extracting and understanding the contextual information (or context patterns) that is used by PubMed users to represent semantic relations between entities such as ‘CHEMICAL-1 compared to CHEMICAL-2.’ With the advances in automatic named entity recognition, we first tag entities in PubMed queries and then use tagged entities as knowledge to recognize pattern semantics. More specifically, we transform PubMed queries into context patterns involving participating entities, which are subsequently projected to latent topics via latent semantic analysis (LSA) to avoid the data sparseness and specificity issues. Finally, we mine semantically similar contextual patterns or semantic relations based on LSA topic distributions. Our two separate evaluation experiments of chemical-chemical (CC) and chemical–disease (CD) relations show that the proposed approach significantly outperforms a baseline method, which simply measures pattern semantics by similarity in participating entities. The highest performance achieved by our approach is nearly 0.9 and 0.85 respectively for the CC and CD task when compared against the ground truth in terms of normalized discounted cumulative gain (nDCG), a standard measure of ranking quality. These results suggest that our approach can effectively identify and return related semantic patterns in a ranked
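    For readers unfamiliar with nDCG, a minimal sketch of one common formulation follows. It assumes linear gain and a log2 rank discount; the abstract does not say which variant the authors used, and the graded relevance values in the example are invented purely for illustration.

```python
import math


def dcg(relevances):
    # Discounted cumulative gain: each relevance grade is divided by
    # log2(rank + 1), with ranks starting at 1.
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances):
    # Normalize by the DCG of the ideal (descending) ordering, so that a
    # perfectly ordered list scores 1.0.
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Hypothetical relevance grades of returned patterns, in ranked order.
perfect = ndcg([3, 2, 1])
swapped = ndcg([1, 3])
```

    A score near 0.9, as reported for the CC task, therefore means the returned ordering is close to the ideal ordering of the ground truth under the chosen gain and discount.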

  10. Abyss or Shelter? On the Relevance of Web Search Engines' Search Results When People Google for Suicide.

    PubMed

    Haim, Mario; Arendt, Florian; Scherr, Sebastian

    2017-02-01

    Despite evidence that suicide rates can increase after suicides are widely reported in the media, appropriate depictions of suicide in the media can help people to overcome suicidal crises and can thus elicit preventive effects. We argue on the level of individual media users that a similar ambivalence can be postulated for search results on online suicide-related search queries. Importantly, the filter bubble hypothesis (Pariser, 2011) states that search results are biased by algorithms based on a person's previous search behavior. In this study, we investigated whether suicide-related search queries, including either potentially suicide-preventive or -facilitative terms, influence subsequent search results. This might thus protect or harm suicidal Internet users. We utilized a 3 (search history: suicide-related harmful, suicide-related helpful, and suicide-unrelated) × 2 (reactive: clicking the top-most result link and no clicking) experimental design applying agent-based testing. While findings show no influences either of search histories or of reactivity on search results in a subsequent situation, the presentation of a helpline offer raises concerns about possible detrimental algorithmic decision-making: Algorithms "decided" whether or not to present a helpline, and this automated decision, then, followed the agent throughout the rest of the observation period. Implications for policy-making and search providers are discussed.

  11. Secure quantum private information retrieval using phase-encoded queries

    SciTech Connect

    Olejnik, Lukasz

    We propose a quantum solution to the classical private information retrieval (PIR) problem, which allows one to query a database in a private manner. The protocol offers privacy thresholds and allows the user to obtain information from a database in a way that offers the potential adversary, in this model the database owner, no possibility of deterministically establishing the query contents. This protocol may also be viewed as a solution to the symmetrically private information retrieval problem in that it can offer database security (inability for a querying user to steal its contents). Compared to classical solutions, the protocol offers substantial improvement in terms of communication complexity. In comparison with the recent quantum private queries [Phys. Rev. Lett. 100, 230502 (2008)] protocol, it is more efficient in terms of communication complexity and the number of rounds, while offering a clear privacy parameter. We discuss the security of the protocol and analyze its strengths and conclude that using this technique makes it challenging to obtain the unconditional (in the information-theoretic sense) privacy degree; nevertheless, in addition to being simple, the protocol still offers a privacy level. The oracle used in the protocol is inspired both by the classical computational PIR solutions as well as the Deutsch-Jozsa oracle.

  12. Secure quantum private information retrieval using phase-encoded queries

    NASA Astrophysics Data System (ADS)

    Olejnik, Lukasz

    2011-08-01

    We propose a quantum solution to the classical private information retrieval (PIR) problem, which allows one to query a database in a private manner. The protocol offers privacy thresholds and allows the user to obtain information from a database in a way that offers the potential adversary, in this model the database owner, no possibility of deterministically establishing the query contents. This protocol may also be viewed as a solution to the symmetrically private information retrieval problem in that it can offer database security (inability for a querying user to steal its contents). Compared to classical solutions, the protocol offers substantial improvement in terms of communication complexity. In comparison with the recent quantum private queries [Phys. Rev. Lett. 100, 230502 (2008)] protocol, it is more efficient in terms of communication complexity and the number of rounds, while offering a clear privacy parameter. We discuss the security of the protocol and analyze its strengths and conclude that using this technique makes it challenging to obtain the unconditional (in the information-theoretic sense) privacy degree; nevertheless, in addition to being simple, the protocol still offers a privacy level. The oracle used in the protocol is inspired both by the classical computational PIR solutions as well as the Deutsch-Jozsa oracle.

  13. Aggregating Queries Against Large Inventories of Remotely Accessible Data

    NASA Astrophysics Data System (ADS)

    Gallagher, J. H. R.; Fulker, D. W.

    2016-12-01

    Those seeking to discover data for a specific purpose often encounter search results that are so large as to be useless without computing assistance. This situation arises, with increasing frequency, in part because repositories contain ever greater numbers of granules, and their granularities may well be poorly aligned or even orthogonal to the data-selection needs of the user. This presentation describes a recently developed service for simultaneously querying large lists of OPeNDAP-accessible granules to extract specified data. The specifications include a richly expressive set of data-selection criteria—applicable to content as well as metadata—and the service has been tested successfully against lists naming hundreds of thousands of granules. Querying such numbers of local files (i.e., granules) on a desktop or laptop computer is practical (by using a scripting language, e.g.), but this practicality is diminished when the data are remote and thus best accessed through a Web-services interface. In these cases, which are increasingly common, scripted queries can take many hours because of inherent network latencies. Furthermore, communication dropouts can add fragility to such scripts, yielding gaps in the acquired results. In contrast, OPeNDAP's new aggregated-query services enable data discovery in the context of very large inventory sizes. These capabilities have been developed for use with OPeNDAP's Hyrax server, which is an open-source realization of DAP (for "Data Access Protocol," a specification widely used in NASA, NOAA and other data-intensive contexts). These aggregated-query services exhibit good response times (on the order of seconds, not hours) even for inventories that list hundreds of thousands of source granules.

  14. An exemplar-familiarity model predicts short-term and long-term probe recognition across diverse forms of memory search.

    PubMed

    Nosofsky, Robert M; Cox, Gregory E; Cao, Rui; Shiffrin, Richard M

    2014-11-01

    Experiments were conducted to test a modern exemplar-familiarity model on its ability to account for both short-term and long-term probe recognition within the same memory-search paradigm. Also, making connections to the literature on attention and visual search, the model was used to interpret differences in probe-recognition performance across diverse conditions that manipulated relations between targets and foils across trials. Subjects saw lists of 1 to 16 items followed by a single item recognition probe. In a varied-mapping condition, targets and foils could switch roles across trials; in a consistent-mapping condition, targets and foils never switched roles; and in an all-new condition, on each trial a completely new set of items formed the memory set. In the varied-mapping and all-new conditions, mean correct response times (RTs) and error proportions were curvilinear increasing functions of memory set size, with the RT results closely resembling ones from hybrid visual-memory search experiments reported by Wolfe (2012). In the consistent-mapping condition, new-probe RTs were invariant with set size, whereas old-probe RTs increased slightly with increasing study-test lag. With appropriate choice of psychologically interpretable free parameters, the model accounted well for the complete set of results. The work provides support for the hypothesis that a common set of processes involving exemplar-based familiarity may govern long-term and short-term probe recognition across wide varieties of memory-search conditions.

  15. Fixing Dataset Search

    NASA Technical Reports Server (NTRS)

    Lynnes, Chris

    2014-01-01

    Three current search engines are queried for ozone data at the GES DISC. The results range from sub-optimal to counter-intuitive. We propose a method to fix dataset search by implementing a robust relevancy ranking scheme. The relevancy ranking scheme is based on several heuristics culled from more than 20 years of helping users select datasets.

  16. Visual graph query formulation and exploration: a new perspective on information retrieval at the edge

    NASA Astrophysics Data System (ADS)

    Kase, Sue E.; Vanni, Michelle; Knight, Joanne A.; Su, Yu; Yan, Xifeng

    2016-05-01

    Within operational environments decisions must be made quickly based on the information available. Identifying an appropriate knowledge base and accurately formulating a search query are critical tasks for decision-making effectiveness in dynamic situations. The spreading of graph data management tools to access large graph databases is a rapidly emerging research area of potential benefit to the intelligence community. A graph representation provides a natural way of modeling data in a wide variety of domains. Graph structures use nodes, edges, and properties to represent and store data. This research investigates the advantages of information search by graph query initiated by the analyst and interactively refined within the contextual dimensions of the answer space toward a solution. The paper introduces SLQ, a user-friendly graph querying system enabling the visual formulation of schemaless and structureless graph queries. SLQ is demonstrated with an intelligence analyst information search scenario focused on identifying individuals responsible for manufacturing a mosquito-hosted deadly virus. The scenario highlights the interactive construction of graph queries without prior training in complex query languages or graph databases, intuitive navigation through the problem space, and visualization of results in graphical format.

  17. Large scale study of multiple-molecule queries

    PubMed Central

    2009-01-01

    Background In ligand-based screening, as well as in other chemoinformatics applications, one seeks to effectively search large repositories of molecules in order to retrieve molecules that are similar typically to a single molecule lead. However, in some cases, multiple molecules from the same family are available to seed the query and search for other members of the same family. Multiple-molecule query methods have been less studied than single-molecule query methods. Furthermore, the previous studies have relied on proprietary data and sometimes have not used proper cross-validation methods to assess the results. In contrast, here we develop and compare multiple-molecule query methods using several large publicly available data sets and background. We also create a framework based on a strict cross-validation protocol to allow unbiased benchmarking for direct comparison in future studies across several performance metrics. Results Fourteen different multiple-molecule query methods were defined and benchmarked using: (1) 41 publicly available data sets of related molecules with similar biological activity; and (2) publicly available background data sets consisting of up to 175,000 molecules randomly extracted from the ChemDB database and other sources. Eight of the fourteen methods were parameter free, and six of them fit one or two free parameters to the data using a careful cross-validation protocol. All the methods were assessed and compared for their ability to retrieve members of the same family against the background data set by using several performance metrics including the Area Under the Accumulation Curve (AUAC), Area Under the Curve (AUC), F1-measure, and BEDROC metrics. Consistent with the previous literature, the best parameter-free methods are the MAX-SIM and MIN-RANK methods, which score a molecule to a family by the maximum similarity, or minimum ranking, obtained across the family. One new parameterized method introduced in this study and two
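
    The MAX-SIM and MIN-RANK scoring rules named in this abstract can be illustrated with a minimal Python sketch. The abstract does not say which similarity measure or molecular representation was used, so the Tanimoto coefficient over sets of integer feature ids, and the function names `tanimoto`, `max_sim` and `min_rank`, are assumptions made only for illustration.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two feature sets; an assumed measure."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def max_sim(candidate, family, sim=tanimoto):
    """MAX-SIM: score a candidate by its highest similarity to any family member."""
    return max(sim(candidate, member) for member in family)


def min_rank(candidates, family, sim=tanimoto):
    """MIN-RANK: for each candidate, the best (lowest) rank it reaches in any
    of the single-member similarity rankings; rank 1 is the most similar."""
    best = {name: len(candidates) + 1 for name in candidates}
    for member in family:
        ranked = sorted(candidates,
                        key=lambda name: sim(candidates[name], member),
                        reverse=True)
        for rank, name in enumerate(ranked, start=1):
            best[name] = min(best[name], rank)
    return best
```

    Higher `max_sim` values and lower `min_rank` values both indicate a candidate that is more likely to belong to the query family; the parameterized methods mentioned in the abstract are not covered by this sketch.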

  18. Flexible Querying of Lifelong Learner Metadata

    ERIC Educational Resources Information Center

    Poulovassilis, A.; Selmer, P.; Wood, P. T.

    2012-01-01

    This paper discusses the provision of flexible querying facilities over heterogeneous data arising from lifelong learners' educational and work experiences. A key aim of such querying facilities is to allow learners to identify possible choices for their future learning and professional development by seeing what others have done. We motivate and…

  19. Comparing the effectiveness of using generic and specific search terms in electronic databases to identify health outcomes for a systematic review: a prospective comparative study of literature search methods

    PubMed Central

    MacLean, Alice; Sweeting, Helen; Hunt, Kate

    2012-01-01

    Objective To compare the effectiveness of systematic review literature searches that use either generic or specific terms for health outcomes. Design Prospective comparative study of two electronic literature search strategies. The ‘generic’ search included general terms for health such as ‘adolescent health’, ‘health status’, ‘morbidity’, etc. The ‘specific’ search focused on terms for a range of specific illnesses, such as ‘headache’, ‘epilepsy’, ‘diabetes mellitus’, etc. Data sources The authors searched Medline, Embase, the Cumulative Index to Nursing and Allied Health Literature, PsycINFO and the Education Resources Information Center for studies published in English between 1992 and April 2010. Main outcome measures Number and proportion of studies included in the systematic review that were identified from each search. Results The two searches tended to identify different studies. Of 41 studies included in the final review, only three (7%) were identified by both search strategies, 21 (51%) were identified by the generic search only and 17 (41%) were identified by the specific search only. 5 of the 41 studies were also identified through manual searching methods. Studies identified by the two ELS differed in terms of reported health outcomes, while each ELS uniquely identified some of the review's higher quality studies. Conclusions Electronic literature searches (ELS) are a vital stage in conducting systematic reviews and therefore have an important role in attempts to inform and improve policy and practice with the best available evidence. While the use of both generic and specific health terms is conventional for many reviewers and information scientists, there are also reviews that rely solely on either generic or specific terms. Based on the findings, reliance on only the generic or specific approach could increase the risk of systematic reviews missing important evidence and, consequently, misinforming decision makers

  20. Federated ontology-based queries over cancer data

    PubMed Central

    2012-01-01

    Background Personalised medicine provides patients with treatments that are specific to their genetic profiles. It requires efficient data sharing of disparate data types across a variety of scientific disciplines, such as molecular biology, pathology, radiology and clinical practice. Personalised medicine aims to offer the safest and most effective therapeutic strategy based on the gene variations of each subject. In particular, this is valid in oncology, where knowledge about genetic mutations has already led to new therapies. Current molecular biology techniques (microarrays, proteomics, epigenetic technology and improved DNA sequencing technology) enable better characterisation of cancer tumours. The vast amounts of data, however, coupled with the use of different terms - or semantic heterogeneity - in each discipline makes the retrieval and integration of information difficult. Results Existing software infrastructures for data-sharing in the cancer domain, such as caGrid, support access to distributed information. caGrid follows a service-oriented model-driven architecture. Each data source in caGrid is associated with metadata at increasing levels of abstraction, including syntactic, structural, reference and domain metadata. The domain metadata consists of ontology-based annotations associated with the structural information of each data source. However, caGrid's current querying functionality is given at the structural metadata level, without capitalising on the ontology-based annotations. This paper presents the design of and theoretical foundations for distributed ontology-based queries over cancer research data. Concept-based queries are reformulated to the target query language, where join conditions between multiple data sources are found by exploiting the semantic annotations. The system has been implemented, as a proof of concept, over the caGrid infrastructure. The approach is applicable to other model-driven architectures. A graphical user

  1. SEARCH: Study of Environmental Arctic Change--A System-scale, Cross-disciplinary, Long-term Arctic Research Program

    NASA Astrophysics Data System (ADS)

    Wiggins, H. V.; Schlosser, P.; Fox, S. E.

    2009-12-01

    The Study of Environmental Arctic Change (SEARCH) is a multi-agency effort to observe, understand, and guide responses to changes in the changing arctic system. Under the SEARCH program, guided by the Science Steering Committee (SSC), the Observing, Understanding, and Responding to Change panels, and the Interagency Program Management Committee (IPMC), scientists with a variety of expertise work together to achieve goals of the program. Over 150 projects and activities contribute to SEARCH implementation. The Observing Change component is underway through the NSF’s Arctic Observing Network (AON), NOAA-sponsored atmospheric and sea ice observations, and other relevant national and international efforts, including the EU-sponsored Developing Arctic Modeling and Observing Capabilities for Long-term Environmental Studies (DAMOCLES) Program. The Understanding Change component of SEARCH consists of modeling and analysis efforts, including the Sea Ice Outlook project, an international effort to provide a community-wide summary of the expected September arctic sea ice minimum. The Understanding Change component also has strong linkages to programs such as the NSF Arctic System Science (ARCSS) Program. The Responding to Change element will be launched through stakeholder-focused research and applications addressing social and economic concerns. As a national program under the International Study of Arctic Change (ISAC), SEARCH is working to expand international connections. The State of the Arctic Conference (soa.arcus.org), to be held 16-19 March 2010 in Miami, will be a milestone activity of SEARCH and will provide an international forum for discussion of future research directions aimed toward a better understanding of the arctic system and its trajectory. SEARCH is sponsored by eight U.S. agencies that comprise the IPMC, including: the National Science Foundation (NSF), the National Oceanic and Atmospheric Administration (NOAA), the National Aeronautics and Space

  2. Know your market: use of online query tools to quantify trends in patient information-seeking behavior for varicose vein treatment.

    PubMed

    Harsha, Asheesh K; Schmitt, J Eric; Stavropoulos, S William

    2014-01-01

    To analyze Internet search data to characterize the temporal and geographic interest of Internet users in the United States in varicose vein treatment. From January 1, 2004, to September 1, 2012, the Google Trends tool was used to analyze query data for "varicose vein treatment" to identify individuals seeking treatment information for varicose veins. The term "varicose vein treatment" returned a search volume index (SVI), representing the search frequency relative to the total search volume during a specific time interval and region. Linear regression analysis and Kruskal-Wallis one-way analysis of variance were employed to characterize search results. Search traffic for varicose vein treatment increased by 520% over the 104-month study period. There was an annual mean increase of 28% (range, -18%-100%; standard deviation [SD], 35%), with a statistically significant linear increase in average yearly SVI over time (R² = 0.94, P < .0001). All years showed positive growth in mean SVI except for 2008 (18% decrease). There were statistically significant differences in SVI by month (Kruskal-Wallis, P < .0001) with significantly higher mean SVI compared with other months in May (190% increase; range, 26%-670%; SD, 15%) and June (209% increase; range, 35%-700%; SD, 20%). The southern United States showed significantly higher search traffic than all other regions (Tukey-Kramer, P < .00001). There have been significant increases in Internet search traffic related to varicose vein treatment in the past 8 years. Reflected in this trend is an annual peak in search traffic in the late spring months with an overall geographic bias toward southern states. Rigorous analysis of Internet search queries for medical procedures may prove useful to guide the efficient use of limited resources and marketing dollars.

  3. Accessing suicide-related information on the internet: a retrospective observational study of search behavior.

    PubMed

    Wong, Paul Wai-Ching; Fu, King-Wa; Yau, Rickey Sai-Pong; Ma, Helen Hei-Man; Law, Yik-Wa; Chang, Shu-Sen; Yip, Paul Siu-Fai

    2013-01-11

    The Internet's potential impact on suicide is of major public health interest as easy online access to pro-suicide information or specific suicide methods may increase suicide risk among vulnerable Internet users. Little is known, however, about users' actual searching and browsing behaviors of online suicide-related information. To investigate what webpages people actually clicked on after searching with suicide-related queries on a search engine and to examine what queries people used to get access to pro-suicide websites. A retrospective observational study was done. We used a web search dataset released by America Online (AOL). The dataset was randomly sampled from all AOL subscribers' web queries between March and May 2006 and generated by 657,000 service subscribers. We found 5526 search queries (0.026%, 5526/21,000,000) that included the keyword "suicide". The 5526 search queries included 1586 different search terms and were generated by 1625 unique subscribers (0.25%, 1625/657,000). Of these queries, 61.38% (3392/5526) were followed by users clicking on a search result. Of these 3392 queries, 1344 (39.62%) webpages were clicked on by 930 unique users but only 1314 of those webpages were accessible during the study period. Each clicked-through webpage was classified into 11 categories. The categories of the most visited webpages were: entertainment (30.13%; 396/1314), scientific information (18.31%; 240/1314), and community resources (14.53%; 191/1314). Among the 1314 accessed webpages, we could identify only two pro-suicide websites. We found that the search terms used to access these sites included "commiting suicide with a gas oven", "hairless goat", "pictures of murder by strangulation", and "photo of a severe burn". A limitation of our study is that the database may be dated and confined to mainly English webpages. Searching or browsing suicide-related or pro-suicide webpages was uncommon, although a small group of users did access websites that contain

  4. Accessing Suicide-Related Information on the Internet: A Retrospective Observational Study of Search Behavior

    PubMed Central

    2013-01-01

    Background The Internet’s potential impact on suicide is of major public health interest as easy online access to pro-suicide information or specific suicide methods may increase suicide risk among vulnerable Internet users. Little is known, however, about users’ actual searching and browsing behaviors of online suicide-related information. Objective To investigate what webpages people actually clicked on after searching with suicide-related queries on a search engine and to examine what queries people used to get access to pro-suicide websites. Methods A retrospective observational study was done. We used a web search dataset released by America Online (AOL). The dataset was randomly sampled from all AOL subscribers’ web queries between March and May 2006 and generated by 657,000 service subscribers. Results We found 5526 search queries (0.026%, 5526/21,000,000) that included the keyword "suicide". The 5526 search queries included 1586 different search terms and were generated by 1625 unique subscribers (0.25%, 1625/657,000). Of these queries, 61.38% (3392/5526) were followed by users clicking on a search result. Of these 3392 queries, 1344 (39.62%) webpages were clicked on by 930 unique users but only 1314 of those webpages were accessible during the study period. Each clicked-through webpage was classified into 11 categories. The categories of the most visited webpages were: entertainment (30.13%; 396/1314), scientific information (18.31%; 240/1314), and community resources (14.53%; 191/1314). Among the 1314 accessed webpages, we could identify only two pro-suicide websites. We found that the search terms used to access these sites included “commiting suicide with a gas oven”, “hairless goat”, “pictures of murder by strangulation”, and “photo of a severe burn”. A limitation of our study is that the database may be dated and confined to mainly English webpages. Conclusions Searching or browsing suicide-related or pro-suicide webpages was

  5. The Contemporary Thesaurus of Social Science Terms and Synonyms: A Guide for Natural Language Computer Searching.

    ERIC Educational Resources Information Center

    Knapp, Sara D., Comp.

    This book is designed primarily to help users find meaningful words for natural language, or free-text, computer searching of bibliographic and textual databases in the social and behavioral sciences. Additionally, it covers many socially relevant and technical topics not covered by the usual literary thesaurus, therefore it may also be useful for…

  6. The Use of ERIC Tapes in Scandinavia, Searching With Thesaurus Terms in Natural Language.

    ERIC Educational Resources Information Center

    Tell, Bjorn V.; And Others

    Since February 1971 the Royal Institute of Technology, Stockholm, has been running the ERIC data base mainly for SDI purposes. The implementation of the data base into the generalized search system, ABACUS, is described. One hundred and fifty-eight users received SDI service at present, 99 from governmental and educational institutions, 23 from…

  7. Sigsearch: a new term for post hoc unplanned search for statistically significant relationships with the intent to create publishable findings.

    PubMed

    Hashim, Muhammad Jawad

    2010-09-01

    Post-hoc secondary data analysis with no prespecified hypotheses has been discouraged by textbook authors and journal editors alike. Unfortunately no single term describes this phenomenon succinctly. I would like to coin the term "sigsearch" to define this practice and bring it within the teaching lexicon of statistics courses. Sigsearch would include any unplanned, post-hoc search for statistical significance using multiple comparisons of subgroups. It would also include data analysis with outcomes other than the prespecified primary outcome measure of a study as well as secondary data analyses of earlier research.

  8. Active Learning by Querying Informative and Representative Examples.

    PubMed

    Huang, Sheng-Jun; Jin, Rong; Zhou, Zhi-Hua

    2014-10-01

    Active learning reduces the labeling cost by iteratively selecting the most valuable data to query their labels. It has attracted considerable interest given the abundance of unlabeled data and the high cost of labeling. Most active learning approaches select either informative or representative unlabeled instances to query their labels, which could significantly limit their performance. Although several active learning algorithms were proposed to combine the two query selection criteria, they are usually ad hoc in finding unlabeled instances that are both informative and representative. We address this limitation by developing a principled approach, termed QUIRE, based on the min-max view of active learning. The proposed approach provides a systematic way for measuring and combining the informativeness and representativeness of an unlabeled instance. Further, by incorporating the correlation among labels, we extend the QUIRE approach to multi-label learning by actively querying instance-label pairs. Extensive experimental results show that the proposed QUIRE approach outperforms several state-of-the-art active learning approaches in both single-label and multi-label learning.

  9. SEARCH: Study of Environmental Arctic Change--A System-scale, Cross-disciplinary, Long-term Arctic Research Program

    NASA Astrophysics Data System (ADS)

    Wiggins, H. V.; Schlosser, P.; Loring, A. J.; Warnick, W. K.; Committee, S. S.

    2008-12-01

    The Study of Environmental Arctic Change (SEARCH) is a multi-agency effort to observe, understand, and guide responses to changes in the arctic system. Interrelated environmental changes in the Arctic are affecting ecosystems and living resources and are impacting local and global communities and economic activities. Under the SEARCH program, guided by the Science Steering Committee (SSC), the Interagency Program Management Committee (IPMC), and the Observing, Understanding, and Responding to Change panels, scientists with a variety of expertise--atmosphere, ocean and sea ice, hydrology and cryosphere, terrestrial ecosystems, human dimensions, and paleoclimatology--work together to achieve goals of the program. Over 150 projects and activities contribute to SEARCH implementation. The Observing Change component is underway through National Science Foundation's (NSF) Arctic Observing Network (AON), NOAA-sponsored atmospheric and sea ice observations, and other relevant national and international efforts, including the EU-sponsored Developing Arctic Modelling and Observing Capabilities for Long-term Environmental Studies (DAMOCLES) Program. The Understanding Change component of SEARCH consists of modeling and analysis efforts, with strong linkages to relevant programs such as NSF's Arctic System Synthesis (ARCSS) Program. The Responding to Change element is driven by stakeholder research and applications addressing social and economic concerns. As a national program under the International Study of Arctic Change (ISAC), SEARCH is also working to expand international connections in an effort to better understand the global arctic system. SEARCH is sponsored by eight (8) U.S. agencies, including: the National Science Foundation (NSF), the National Oceanic and Atmospheric Administration (NOAA), the National Aeronautics and Space Administration (NASA), the Department of Defense (DOD), the Department of Energy (DOE), the Department of the Interior (DOI), the Smithsonian

  10. Worked Examples in Teaching Queries for Searching Academic Databases

    ERIC Educational Resources Information Center

    Kickham-Samy, Mary

    2013-01-01

    The worked-example effect, an application of cognitive load theory, is a well-supported method of instruction for well-structured problems (Chandler and Sweller, 1991; Cooper and Sweller, 1987; Sweller and Cooper, 1985; Tuovinen & Sweller, 1999; Ward and Sweller, 1990). One limitation is expertise-reversal effect, where advanced students…

  11. Branching Search

    NASA Astrophysics Data System (ADS)

    Eliazar, Iddo

    2017-12-01

    Search processes play key roles in various scientific fields. A widespread and effective search-process scheme, which we term Restart Search, is based on the following restart algorithm: i) set a timer and initiate a search task; ii) if the task was completed before the timer expired, then stop; iii) if the timer expired before the task was completed, then go back to the first step and restart the search process anew. In this paper a branching feature is added to the restart algorithm: at every transition from the algorithm's third step to its first step branching takes place, thus multiplying the search effort. This branching feature yields a search-process scheme which we term Branching Search. The running time of Branching Search is analyzed, closed-form results are established, and these results are compared to the corresponding running-time results of Restart Search.
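
    The three-step restart algorithm described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the callable `attempt` (returning the time a fresh search task would need), the `max_restarts` safety cap, and the function name `restart_search` are hypothetical, and the branching feature is not modeled because the abstract does not specify it in detail.

```python
def restart_search(attempt, timer, max_restarts=10_000):
    """Simulate Restart Search and return (total elapsed time, number of restarts).

    attempt: hypothetical callable returning the completion time of one
             freshly started search task.
    timer:   time allowed per task before it is abandoned and restarted.
    """
    elapsed = 0.0
    for restarts in range(max_restarts):
        duration = attempt()              # step i: set the timer, start a task
        if duration <= timer:             # step ii: task finished in time, stop
            return elapsed + duration, restarts
        elapsed += timer                  # step iii: timer expired, start anew
    raise RuntimeError("no task completed within max_restarts")
```

    For example, passing `attempt=lambda: rng.expovariate(1.0)` with `rng = random.Random(0)` simulates tasks with exponentially distributed completion times, which is an assumed distribution chosen only for demonstration.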

  12. Experimental quantum private queries with linear optics

    NASA Astrophysics Data System (ADS)

    de Martini, Francesco; Giovannetti, Vittorio; Lloyd, Seth; Maccone, Lorenzo; Nagali, Eleonora; Sansoni, Linda; Sciarrino, Fabio

    2009-07-01

    The quantum private query is a quantum cryptographic protocol to recover information from a database, preserving both user and data privacy: the user can test whether someone has retained information on which query was asked and the database provider can test the amount of information released. Here we discuss a variant of the quantum private query algorithm that admits a simple linear optical implementation: it employs the photon’s momentum (or time slot) as address qubits and its polarization as bus qubit. A proof-of-principle experimental realization is implemented.

  13. Provenance Storage, Querying, and Visualization in PBase

    SciTech Connect

    Kianmajd, Parisa; Ludascher, Bertram; Missier, Paolo

    2015-01-01

    We present PBase, a repository for scientific workflows and their corresponding provenance information that facilitates the sharing of experiments among the scientific community. PBase is interoperable since it uses ProvONE, a standard provenance model for scientific workflows. Workflows and traces are stored in RDF, and with the support of SPARQL and the tree cover encoding, the repository provides a scalable infrastructure for querying the provenance data. Furthermore, through its user interface, it is possible to: visualize workflows and execution traces; visualize reachability relations within these traces; issue SPARQL queries; and visualize query results.

  14. Path querying system on mobile devices

    NASA Astrophysics Data System (ADS)

    Lin, Xing; Wang, Yifei; Tian, Yuan; Wu, Lun

    2006-01-01

    Traditional approaches to path querying problems are not efficient and convenient under most circumstances. A more convenient and reliable approach to this problem has to be found. This paper is devoted to a path querying solution on mobile devices. By using an improved Dijkstra's shortest path algorithm and a natural language translating module, this system can help people find the shortest path between two places through their cell phones or other mobile devices. The chosen path is prompted in text of natural language, as well as a map picture. This system would be useful in solving best path querying problems and have potential to be a profitable business system.
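
    For reference, the textbook form of Dijkstra's shortest path algorithm is sketched below in Python. The abstract does not describe what its "improved" variant changes, so this is the standard algorithm and not the authors' method; the adjacency-list format and the function name `shortest_path` are assumptions for illustration.

```python
import heapq


def shortest_path(graph, source, target):
    """Return (distance, path) from source to target.

    graph: dict mapping a node to a list of (neighbor, nonnegative weight).
    Returns (inf, []) when the target cannot be reached.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue                      # stale heap entry, already settled
        done.add(node)
        if node == target:
            break
        for neighbor, weight in graph.get(node, []):
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    if target not in dist:
        return float("inf"), []
    path = [target]
    while path[-1] != source:             # walk predecessors back to the source
        path.append(prev[path[-1]])
    path.reverse()
    return dist[target], path
```

    The returned node list is what a system like the one described could then render as natural-language directions or draw on a map.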

  15. An Evaluation of the Interactive Query Expansion in an Online Library Catalogue with a Graphical User Interface.

    ERIC Educational Resources Information Center

    Hancock-Beaulieu, Micheline; And Others

    1995-01-01

    An online library catalog was used to evaluate an interactive query expansion facility based on relevance feedback for the Okapi, probabilistic, term weighting, retrieval system. A graphical user interface allowed searchers to select candidate terms extracted from relevant retrieved items to reformulate queries. Results suggested that the…

  16. DISPAQ: Distributed Profitable-Area Query from Big Taxi Trip Data.

    PubMed

    Putri, Fadhilah Kurnia; Song, Giltae; Kwon, Joonho; Rao, Praveen

    2017-09-25

    One of the crucial problems for taxi drivers is to efficiently locate passengers in order to increase profits. The rapid advancement and ubiquitous penetration of Internet of Things (IoT) technology into transportation industries enables us to provide taxi drivers with locations that have more potential passengers (more profitable areas) by analyzing and querying taxi trip data. In this paper, we propose a query processing system, called Distributed Profitable-Area Query (DISPAQ), which efficiently identifies profitable areas by exploiting the Apache Software Foundation's Spark framework and a MongoDB database. DISPAQ first maintains a profitable-area query index (PQ-index) by extracting area summaries and route summaries from raw taxi trip data. It then identifies candidate profitable areas by searching the PQ-index during query processing. Then, it exploits a Z-Skyline algorithm, which is an extension of skyline processing with a Z-order space filling curve, to quickly refine the candidate profitable areas. To improve the performance of distributed query processing, we also propose local Z-Skyline optimization, which reduces the number of dominant tests by distributing killer profitable areas to each cluster node. Through extensive evaluation with real datasets, we demonstrate that our DISPAQ system provides a scalable and efficient solution for processing profitable-area queries from huge amounts of big taxi trip data.
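
    The skyline processing that the Z-Skyline algorithm extends rests on a dominance test between candidates. The Python sketch below shows a naive quadratic skyline, not the Z-order variant or the distributed optimization from the abstract. It assumes each candidate area is a tuple of scores where larger is better; the attributes and the function names `dominates` and `skyline` are hypothetical.

```python
def dominates(a, b):
    """True if a is at least as good as b in every attribute and strictly
    better in at least one (larger values are assumed to be better)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def skyline(points):
    """Return the points that no other point dominates (naive O(n^2) scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

    Each call to `dominates` corresponds to one dominance test, which is the quantity the abstract's local optimization aims to reduce.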

  17. DISPAQ: Distributed Profitable-Area Query from Big Taxi Trip Data

    PubMed Central

    Putri, Fadhilah Kurnia; Song, Giltae; Rao, Praveen

    2017-01-01

    One of the crucial problems for taxi drivers is to efficiently locate passengers in order to increase profits. The rapid advancement and ubiquitous penetration of Internet of Things (IoT) technology into transportation industries enables us to provide taxi drivers with locations that have more potential passengers (more profitable areas) by analyzing and querying taxi trip data. In this paper, we propose a query processing system, called Distributed Profitable-Area Query (DISPAQ) which efficiently identifies profitable areas by exploiting the Apache Software Foundation’s Spark framework and a MongoDB database. DISPAQ first maintains a profitable-area query index (PQ-index) by extracting area summaries and route summaries from raw taxi trip data. It then identifies candidate profitable areas by searching the PQ-index during query processing. Then, it exploits a Z-Skyline algorithm, which is an extension of skyline processing with a Z-order space filling curve, to quickly refine the candidate profitable areas. To improve the performance of distributed query processing, we also propose local Z-Skyline optimization, which reduces the number of dominant tests by distributing killer profitable areas to each cluster node. Through extensive evaluation with real datasets, we demonstrate that our DISPAQ system provides a scalable and efficient solution for processing profitable-area queries from huge amounts of big taxi trip data. PMID:28946679

  18. Public Awareness of Uterine Power Morcellation Through US Food and Drug Administration Communications: Analysis of Google Trends Search Term Patterns.

    PubMed

    Wood, Lauren N; Jamnagerwalla, Juzar; Markowitz, Melissa A; Thum, D Joseph; McCarty, Philip; Medendorp, Andrew R; Raz, Shlomo; Kim, Ja-Hong

    2018-04-26

    Uterine power morcellation, where the uterus is shredded into smaller pieces, is a widely used technique for removal of uterine specimens in patients undergoing minimally invasive abdominal hysterectomy or myomectomy. Complications related to power morcellation of uterine specimens led to US Food and Drug Administration (FDA) communications in 2014 ultimately recommending against the use of power morcellation for women undergoing minimally invasive hysterectomy. Subsequently, practitioners drastically decreased the use of morcellation. We aimed to determine the effect of increased patient awareness on the decrease in use of the morcellator. Google Trends is a public tool that provides data on temporal patterns of search terms, and we correlated this data with the timing of the FDA communication. Weekly relative search volume (RSV) was obtained from Google Trends using the term “morcellation.” Higher RSV corresponds to increases in weekly search volume. Search volumes were divided into 3 groups: the 2 years prior to the FDA communication, a 1-year period following, and thereafter, with the distribution of the weekly RSV over the 3 periods tested using 1-way analysis of variance. Additionally, we analyzed the total number of websites containing the term “morcellation” over this time. The mean RSV prior to the FDA communication was 12.0 (SD 15.8), with the RSV being 60.3 (SD 24.7) in the 1-year after and 19.3 (SD 5.2) thereafter (P<.001). The mean number of webpages containing the term “morcellation” in 2011 was 10,800, rising to 18,800 during 2014 and 36,200 in 2017. Google search activity about morcellation of uterine specimens increased significantly after the FDA communications. This trend indicates an increased public awareness regarding morcellation and its complications. More extensive preoperative counseling and alteration of surgical technique and clinician practice may be necessary.
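
    The 1-way analysis of variance used to compare the three periods can be sketched as follows. The weekly values below are made-up placeholders, not the study's data, and the function name is an assumption. The sketch computes only the F statistic; obtaining a P value would additionally require the F distribution, for example from a statistics library.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the F statistic of a one-way ANOVA over a list of samples."""
    k = len(groups)                                  # number of groups
    n_total = sum(len(g) for g in groups)            # total observations
    grand_mean = mean([v for g in groups for v in g])
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Placeholder weekly RSV values for three periods (illustrative only).
before = [10, 12, 8, 14]
year_after = [55, 62, 60, 58]
thereafter = [18, 20, 19, 21]
print(one_way_anova_f([before, year_after, thereafter]))
```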

  19. Query-Driven Visualization and Analysis

    SciTech Connect

    Ruebel, Oliver; Bethel, E. Wes; Prabhat, Mr.

    2012-11-01

    This report focuses on an approach to high performance visualization and analysis, termed query-driven visualization and analysis (QDV). QDV aims to reduce the amount of data that needs to be processed by the visualization, analysis, and rendering pipelines. The goal of the data reduction process is to separate out data that is "scientifically interesting" and to focus visualization, analysis, and rendering on that interesting subset. The premise is that for any given visualization or analysis task, the data subset of interest is much smaller than the larger, complete data set. This strategy (extracting smaller data subsets of interest and focusing the visualization processing on these subsets) is complementary to the approach of increasing the capacity of the visualization, analysis, and rendering pipelines through parallelism. This report discusses the fundamental concepts in QDV, their relationship to different stages in the visualization and analysis pipelines, and presents QDV's application to problems in diverse areas, ranging from forensic cybersecurity to high energy physics.

  20. Semantic querying of relational data for clinical intelligence: a semantic web services-based approach

    PubMed Central

    2013-01-01

    Background Clinical Intelligence, as a research and engineering discipline, is dedicated to the development of tools for data analysis for the purposes of clinical research, surveillance, and effective health care management. Self-service ad hoc querying of clinical data is one desirable type of functionality. Since most of the data are currently stored in relational or similar form, ad hoc querying is problematic as it requires specialised technical skills and the knowledge of particular data schemas. Results A possible solution is semantic querying where the user formulates queries in terms of domain ontologies that are much easier to navigate and comprehend than data schemas. In this article, we are exploring the possibility of using SADI Semantic Web services for semantic querying of clinical data. We have developed a prototype of a semantic querying infrastructure for the surveillance of, and research on, hospital-acquired infections. Conclusions Our results suggest that SADI can support ad-hoc, self-service, semantic queries of relational data in a Clinical Intelligence context. The use of SADI compares favourably with approaches based on declarative semantic mappings from data schemas to ontologies, such as query rewriting and RDFizing by materialisation, because it can easily cope with situations when (i) some computation is required to turn relational data into RDF or OWL, e.g., to implement temporal reasoning, or (ii) integration with external data sources is necessary. PMID:23497556

  1. Producing approximate answers to database queries

    NASA Technical Reports Server (NTRS)

    Vrbsky, Susan V.; Liu, Jane W. S.

    1993-01-01

    We have designed and implemented a query processor, called APPROXIMATE, that makes approximate answers available if part of the database is unavailable or if there is not enough time to produce an exact answer. The accuracy of the approximate answers produced improves monotonically with the amount of data retrieved to produce the result. The exact answer is produced if all of the needed data are available and query processing is allowed to continue until completion. The monotone query processing algorithm of APPROXIMATE works within the standard relational algebra framework and can be implemented on a relational database system with little change to the relational architecture. We describe here the approximation semantics of APPROXIMATE that serves as the basis for meaningful approximations of both set-valued and single-valued queries. We show how APPROXIMATE is implemented to make effective use of semantic information, provided by an object-oriented view of the database, and describe the additional overhead required by APPROXIMATE.
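
    The idea of answers that improve monotonically as more data is retrieved can be sketched for a simple selection query. This is not the APPROXIMATE implementation: representing an approximate answer as a pair of "certain" and "possible" sets, the block layout, and all names are assumptions made for illustration. The sketch also assumes that the identifiers of all candidate rows are known before any block is read.

```python
def approximate_select(blocks, candidate_ids, predicate):
    """Yield successively better (certain, possible) answers to a selection query.

    certain  : ids already known to satisfy the predicate
    possible : ids whose rows have not been read yet
    The exact answer always lies between `certain` and `certain | possible`.
    """
    certain, possible = set(), set(candidate_ids)
    yield frozenset(certain), frozenset(possible)      # answer before reading any data
    for block in blocks:
        for row_id, value in block:
            possible.discard(row_id)                   # the row is no longer unknown
            if predicate(value):
                certain.add(row_id)
        yield frozenset(certain), frozenset(possible)  # refined answer after this block


# Hypothetical table split into two blocks of (row_id, value) pairs.
blocks = [[(1, 10), (2, 55)], [(3, 70), (4, 5)]]
for certain, possible in approximate_select(blocks, [1, 2, 3, 4], lambda v: v > 50):
    print(sorted(certain), sorted(possible))
```

    Stopping the loop early, for example when a deadline passes or a block is unavailable, leaves the most recent pair as the approximate answer; reading every block empties the possible set and yields the exact answer.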

  2. A Query System Implementation Case Study.

    ERIC Educational Resources Information Center

    Hiser, Judith N.; Neil, M. Elizabeth

    1985-01-01

    The Department of Administrative Programming Services of Clemson University investigated products available in user-friendly retrieval systems. The test of INTELLECT, a natural language query system written by Artificial Intelligence Corporation, is described. (Author/MLW)

  3. The StarView intelligent query mechanism

    NASA Technical Reports Server (NTRS)

    Semmel, R. D.; Silberberg, D. P.

    1993-01-01

    The StarView interface is being developed to facilitate the retrieval of scientific and engineering data produced by the Hubble Space Telescope. While predefined screens in the interface can be used to specify many common requests, ad hoc requests require a dynamic query formulation capability. Unfortunately, logical level knowledge is too sparse to support this capability. In particular, essential formulation knowledge is lost when the domain of interest is mapped to a set of database relation schemas. Thus, a system known as QUICK has been developed that uses conceptual design knowledge to facilitate query formulation. By heuristically determining strongly associated objects at the conceptual level, QUICK is able to formulate semantically reasonable queries in response to high-level requests that specify only attributes of interest. Moreover, by exploiting constraint knowledge in the conceptual design, QUICK assures that queries are formulated quickly and will execute efficiently.

  4. Superfund Chemical Data Matrix (SCDM) Query - Popup

    EPA Pesticide Factsheets

    This site allows you to easily query the Superfund Chemical Data Matrix (SCDM) and generate a list of the corresponding Hazard Ranking System (HRS) factor values, benchmarks, and data elements that you need.

  5. Superfund Chemical Data Matrix (SCDM) Query

    EPA Pesticide Factsheets

    This site allows you to easily query the Superfund Chemical Data Matrix (SCDM) and generate a list of the corresponding Hazard Ranking System (HRS) factor values, benchmarks, and data elements that you need.

  6. Drexel at TREC 2014 Federated Web Search Track

    DTIC Science & Technology

    2014-11-01

    of its input RS results. 1. INTRODUCTION Federated Web Search is the task of searching multiple search engines simultaneously and combining their...or distributed properly[5]. The goal of RS is then, for a given query, to select only the most promising search engines from all those available. Most...result pages of 149 search engines. 4000 queries are used in building the sample set. As a part of the Vertical Selection task, search engines are

  7. BIOMedical Search Engine Framework: Lightweight and customized implementation of domain-specific biomedical search engines.

    PubMed

    Jácome, Alberto G; Fdez-Riverola, Florentino; Lourenço, Anália

    2016-07-01

    Text mining and semantic analysis approaches can be applied to the construction of biomedical domain-specific search engines and provide an attractive alternative to create personalized and enhanced search experiences. Therefore, this work introduces the new open-source BIOMedical Search Engine Framework for the fast and lightweight development of domain-specific search engines. The rationale behind this framework is to incorporate core features typically available in search engine frameworks with flexible and extensible technologies to retrieve biomedical documents, annotate meaningful domain concepts, and develop highly customized Web search interfaces. The BIOMedical Search Engine Framework integrates taggers for major biomedical concepts, such as diseases, drugs, genes, proteins, compounds and organisms, and enables the use of domain-specific controlled vocabulary. Technologies from the Typesafe Reactive Platform, the AngularJS JavaScript framework and the Bootstrap HTML/CSS framework support the customization of the domain-oriented search application. Moreover, the RESTful API of the BIOMedical Search Engine Framework allows the integration of the search engine into existing systems or a complete web interface personalization. The construction of the Smart Drug Search is described as proof-of-concept of the BIOMedical Search Engine Framework. This public search engine catalogs scientific literature about antimicrobial resistance, microbial virulence and similar topics. The keyword-based queries of the users are transformed into concepts and search results are presented and ranked accordingly. The semantic graph view portrays all the concepts found in the results, and the researcher may look into the relevance of different concepts, the strength of direct relations, and non-trivial, indirect relations. The number of occurrences of the concept shows its importance to the query, and the frequency of concept co-occurrence is indicative of biological relations

  8. Visual search for changes in scenes creates long-term, incidental memory traces.

    PubMed

    Utochkin, Igor S; Wolfe, Jeremy M

    2018-05-01

    Humans are very good at remembering large numbers of scenes over substantial periods of time. But how good are they at remembering changes to scenes? In this study, we tested scene memory and change detection two weeks after initial scene learning. In Experiments 1-3, scenes were learned incidentally during visual search for change. In Experiment 4, observers explicitly memorized scenes. At test, after two weeks observers were asked to discriminate old from new scenes, to recall a change that they had detected in the study phase, or to detect a newly introduced change in the memorization experiment. Next, they performed a change detection task, usually looking for the same change as in the study period. Scene recognition memory was found to be similar in all experiments, regardless of the study task. In Experiment 1, more difficult change detection produced better scene memory. Experiments 2 and 3 supported a "depth-of-processing" account for the effects of initial search and change detection on incidental memory for scenes. Of most interest, change detection was faster during the test phase than during the study phase, even when the observer had no explicit memory of having found that change previously. This result was replicated in two of our three change detection experiments. We conclude that scenes can be encoded incidentally as well as explicitly and that changes in those scenes can leave measurable traces even if they are not explicitly recalled.

  9. From headache to tumour: An examination of health anxiety, health-related Internet use and 'query escalation'.

    PubMed

    Singh, Karmpaul; Brown, Richard J

    2016-09-01

    The current study aimed to explore the phenomenon of disease-related 'query escalation' in high/low health anxious Internet users (N = 40). During a 15-minute health-related Internet search, participants rated their anxiety and the perceived seriousness of information on each page. Post-search interviews determined the reasons for, and effects of, escalating queries to consider serious diseases. Both groups were found to be significantly more anxious after escalating queries. The high group was significantly more likely to escalate queries. Evaluating personal relevance of material was the main reason for escalations and moderated anxiety post-escalation. We conclude that searching for online disease information can increase anxiety, particularly for people worried about their health. © The Author(s) 2015.

  10. How to improve your PubMed/MEDLINE searches: 3. advanced searching, MeSH and My NCBI.

    PubMed

    Fatehi, Farhad; Gray, Leonard C; Wootton, Richard

    2014-03-01

    Although the basic PubMed search is often helpful, the results may sometimes be non-specific. For more control over the search process you can use the Advanced Search Builder interface. This allows a targeted search in specific fields, with the convenience of being able to select the intended search field from a list. It also provides a history of your previous searches. The search history is useful to develop a complex search query by combining several previous searches using Boolean operators. For indexing the articles in MEDLINE, the NLM uses a controlled vocabulary system called MeSH. This standardised vocabulary solves the problem of authors, researchers and librarians who may use different terms for the same concept. To be efficient in a PubMed search, you should start by identifying the most appropriate MeSH terms and use them in your search where possible. My NCBI is a personal workspace facility available through PubMed and makes it possible to customise the PubMed interface. It provides various capabilities that can enhance your search performance.
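
    Combining field-restricted terms with Boolean operators, as described above, can be sketched with a few string helpers. The helper names and the chosen terms are assumptions made for illustration; only the query string is built, and nothing is sent to PubMed. The bracketed field tags `[MeSH Terms]` and `[Title/Abstract]` follow PubMed's search field syntax.

```python
def tagged(term: str, field: str) -> str:
    """Restrict a term to one PubMed search field, e.g. "Hypertension"[MeSH Terms]."""
    return f'"{term}"[{field}]'


def any_of(*clauses: str) -> str:
    """Combine clauses with OR (broadens the search)."""
    return "(" + " OR ".join(clauses) + ")"


def all_of(*clauses: str) -> str:
    """Combine clauses with AND (narrows the search)."""
    return "(" + " AND ".join(clauses) + ")"


# Illustrative query: a MeSH term for the condition, combined with either a
# MeSH term or a free-text title/abstract word for the intervention.
query = all_of(
    tagged("Hypertension", "MeSH Terms"),
    any_of(
        tagged("Telemedicine", "MeSH Terms"),
        tagged("telehealth", "Title/Abstract"),
    ),
)
print(query)
```

    Starting from the MeSH terms and adding free-text alternatives only where needed mirrors the advice above: the controlled vocabulary handles synonyms, while the free-text clause catches records that use a different word.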

  11. The development of automaticity in short-term memory search: Item-response learning and category learning.

    PubMed

    Cao, Rui; Nosofsky, Robert M; Shiffrin, Richard M

    2017-05-01

    In short-term-memory (STM)-search tasks, observers judge whether a test probe was present in a short list of study items. Here we investigated the long-term learning mechanisms that lead to the highly efficient STM-search performance observed under conditions of consistent-mapping (CM) training, in which targets and foils never switch roles across trials. In item-response learning, subjects learn long-term mappings between individual items and target versus foil responses. In category learning, subjects learn high-level codes corresponding to separate sets of items and learn to attach old versus new responses to these category codes. To distinguish between these 2 forms of learning, we tested subjects in categorized varied mapping (CV) conditions: There were 2 distinct categories of items, but the assignment of categories to target versus foil responses varied across trials. In cases involving arbitrary categories, CV performance closely resembled standard varied-mapping performance without categories and departed dramatically from CM performance, supporting the item-response-learning hypothesis. In cases involving prelearned categories, CV performance resembled CM performance, as long as there was sufficient practice or steps taken to reduce trial-to-trial category-switching costs. This pattern of results supports the category-coding hypothesis for sufficiently well-learned categories. Thus, item-response learning occurs rapidly and is used early in CM training; category learning is much slower but is eventually adopted and is used to increase the efficiency of search beyond that available from item-response learning. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  12. Managing and Querying Image Annotation and Markup in XML.

    PubMed

    Wang, Fusheng; Pan, Tony; Sharma, Ashish; Saltz, Joel

    2010-01-01

    Proprietary approaches for representing annotations and image markup are serious barriers for researchers to share image data and knowledge. The Annotation and Image Markup (AIM) project is developing a standard based information model for image annotation and markup in health care and clinical trial environments. The complex hierarchical structures of AIM data model pose new challenges for managing such data in terms of performance and support of complex queries. In this paper, we present our work on managing AIM data through a native XML approach, and supporting complex image and annotation queries through native extension of XQuery language. Through integration with xService, AIM databases can now be conveniently shared through caGrid.

  13. Managing and Querying Image Annotation and Markup in XML

    PubMed Central

    Wang, Fusheng; Pan, Tony; Sharma, Ashish; Saltz, Joel

    2010-01-01

    Proprietary approaches for representing annotations and image markup are serious barriers for researchers to share image data and knowledge. The Annotation and Image Markup (AIM) project is developing a standard based information model for image annotation and markup in health care and clinical trial environments. The complex hierarchical structures of AIM data model pose new challenges for managing such data in terms of performance and support of complex queries. In this paper, we present our work on managing AIM data through a native XML approach, and supporting complex image and annotation queries through native extension of XQuery language. Through integration with xService, AIM databases can now be conveniently shared through caGrid. PMID:21218167

  14. An SQL query generator for CLIPS

    NASA Technical Reports Server (NTRS)

    Snyder, James; Chirica, Laurian

    1990-01-01

    As expert systems become more widely used, their access to large amounts of external information becomes increasingly important. This information exists in several forms such as statistical, tabular data, knowledge gained by experts and large databases of information maintained by companies. Because many expert systems, including CLIPS, do not provide access to this external information, much of the usefulness of expert systems is left untapped. The scope of this paper is to describe a database extension for the CLIPS expert system shell. The current industry standard database language is SQL. Due to SQL standardization, large amounts of information stored on various computers, potentially at different locations, will be more easily accessible. Expert systems should be able to directly access these existing databases rather than requiring information to be re-entered into the expert system environment. The ORACLE relational database management system (RDBMS) was used to provide a database connection within the CLIPS environment. To facilitate relational database access a query generation system was developed as a CLIPS user function. The queries are entered in a CLIPS-like syntax and are passed to the query generator, which constructs an SQL query and submits it for execution to the ORACLE RDBMS. The query results are asserted as CLIPS facts. The query generator was developed primarily for use within the ICADS project (Intelligent Computer Aided Design System) currently being developed by the CAD Research Unit in the California Polytechnic State University (Cal Poly). In ICADS, there are several parallel or distributed expert systems accessing a common knowledge base of facts. Each expert system has a narrow domain of interest and therefore needs only certain portions of the information. The query generator provides a common method of accessing this information and allows the expert system to specify what data is needed without specifying how to retrieve it.
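
    The flow described above (a declarative request is turned into an SQL query, executed, and the rows come back as facts) can be sketched as follows. This is not the CLIPS extension itself: the request format, the function names, and the sample table are assumptions made for illustration, and Python's built-in sqlite3 module stands in for the ORACLE RDBMS.

```python
import sqlite3


def build_query(table, fields, where):
    """Build a parameterized SELECT from a declarative request (what, not how)."""
    for name in (table, *fields, *where):
        if not name.isidentifier():              # reject anything that is not a plain name
            raise ValueError(f"invalid identifier: {name!r}")
    sql = f"SELECT {', '.join(fields)} FROM {table}"
    if where:
        sql += " WHERE " + " AND ".join(f"{column} = ?" for column in where)
    return sql, list(where.values())


def query_as_facts(conn, table, fields, where):
    """Run the generated query and return each row as a fact-like tuple."""
    sql, params = build_query(table, fields, where)
    return [(table, *row) for row in conn.execute(sql, params)]


# Hypothetical design database with a single table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rooms (name TEXT, floor INTEGER, area REAL)")
conn.executemany("INSERT INTO rooms VALUES (?, ?, ?)",
                 [("lobby", 1, 80.0), ("office", 2, 25.5)])
print(query_as_facts(conn, "rooms", ["name", "area"], {"floor": 2}))
```

    The caller states only the table, the wanted fields, and the constraints; how the data is retrieved stays inside the generator, which is the separation the abstract describes.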

  15. Minimizing Statistical Bias with Queries.

    DTIC Science & Technology

    1995-09-14

    method for optimally selecting these points would offer enormous savings in time and money. An active learning system will typically attempt to select data...research in active learning assumes that the second term of Equation 2 is approximately zero, that is, that the learner is unbiased. If this is the case...outperforms the variance-minimizing algorithm and random exploration. and effective strategy for active learning. I have given empirical evidence that, with

  16. Query by Browsing: An Alternative Hypertext Information Retrieval Method

    PubMed Central

    Frisse, Mark E.; Cousins, Steve B.

    1989-01-01

    In this paper we discuss our efforts to develop programs which enhance the ability to navigate through large medical hypertext systems. Our approach organizes hypertext index terms into a belief network and uses reader feedback to update the degree of belief in the index terms' utility to a query. We begin by describing various possible configurations for indexes to hypertext. We then describe how belief network calculations can be applied to these indexes. After a brief discussion of early results using manuscripts from a medical handbook, we close with an analysis of our approach's applicability to a wider range of hypertext information retrieval problems.

  17. Surgical and conservative treatment of patients with congenital scoliosis: a search for long-term results

    PubMed Central

    2011-01-01

    Background In view of the limited data available on the conservative treatment of patients with congenital scoliosis (CS), early surgery is suggested in mild cases with formation failures. Patients with segmentation failures will not benefit from conservative treatment. The purpose of this review is to identify the mid- or long-term results of spinal fusion surgery in patients with congenital scoliosis. Methods Retrospective and prospective studies were included, reporting on the outcome of surgery in patients with congenital scoliosis. Studies concerning a small number of cases treated conservatively were included too. We analyzed mid-term (5 to 7 years) and long-term results (7 years or more), both as regards the maintenance of the correction of scoliosis and the safety of instrumentation, the early and late complications of surgery and their effect on quality of life. Results A small number of studies of surgically treated patients were found; most had follow-up periods of 4-6 years, in which skeletal maturity had in most cases not yet been reached, and a few had follow-up of 36-44 years. The results of bracing in children with congenital scoliosis, mainly in cases with failure of formation, were also studied. Discussion Spinal surgery in patients with congenital scoliosis is regarded in short as a safe procedure and should be performed. On the other hand, early and late complications are also described, concerning not only intraoperative and immediate postoperative problems, but also the safety and efficacy of the spinal instrumentation and the possibility of developing neurological disorders and the long-term effect these may have on both lung function and the quality of life of children. Conclusions Few cases indicate the long-term results of surgical techniques, in the natural progression of scoliosis. Similarly, few cases have been reported on the influence of conservative treatment. In conclusion, patients with segmentation failures should be treated

  18. Long-term Doppler Shift and Line Profile Studies of Planetary Search Target Stars

    NASA Technical Reports Server (NTRS)

    McMillan, Robert S.

    2002-01-01

    This grant supported attempts to develop a method for measuring the Doppler shifts of solar-type stars more accurately. The expense of future space borne telescopes to search for solar systems like our own makes it worth trying to improve the relatively inexpensive pre-flight reconnaissance by ground-based telescopes. The concepts developed under this grant contributed to the groundwork for such improvements. They were focused on how to distinguish between extrasolar planets and stellar activity (convection) cycles. To measure the Doppler shift (radial velocity; RV) of the center of mass of a star in the presence of changing convection in the star's photosphere, one can either measure the effect of convection separately from that of the star's motion and subtract its contribution to the apparent RV, or measure the RV in a way that is insensitive to convection. This grant supported investigations into both of these approaches. We explored the use of a Fabry-Perot Etalon HE interferometer and a multichannel Fourier Transform Spectrometer (mFTS), and finished making a 1.8-m telescope operational and potentially available for this work.

  19. In search of consolidation of short-term memory in nonhuman animals.

    PubMed

    Calder, Amanda; White, K Geoffrey

    2014-03-01

    Wixted (Annual Review of Psychology, 55, 235 – 269, 2004) has argued that forgetting is due to consolidation failure. Previous research with humans and nonhuman animals has reported evidence for consolidation in intermediate or long-term memory (LTM). The present study examines whether consolidation occurs in short-term memory in pigeons. Delayed matching-to-sample accuracy was reduced when retroactive interference (an extraneous task in Experiment 1 or houselight illumination in Experiment 2) was interpolated in the retention interval. Accuracy was not greater, however, when interference occurred at the end of the retention interval, as compared with when it occurred at the beginning. That is, there was no evidence for consolidation in short-term memory for pigeons. We did find, however, the beginning–end effect originally reported by Roberts and Grant (Journal of Experimental Psychology: Animal Behavior Processes, 4, 219–236, 1978) and the recovery from forgetting reported by White and Brown (Journal of the Experimental Analysis of Behavior, 96, 177–189, 2011). The results are discussed in relation to temporal distinctiveness theory as an alternative to consolidation.

  20. The I4 Online Query Tool for Earth Observations Data

    NASA Technical Reports Server (NTRS)

    Stefanov, William L.; Vanderbloemen, Lisa A.; Lawrence, Samuel J.

    2015-01-01

    The NASA Earth Observation System Data and Information System (EOSDIS) delivers an average of 22 terabytes per day of data collected by orbital and airborne sensor systems to end users through an integrated online search environment (the Reverb/ECHO system). Earth observations data collected by sensors on the International Space Station (ISS) are not currently included in the EOSDIS system, and are only accessible through various individual online locations. This increases the effort required by end users to query multiple datasets, and limits the opportunity for data discovery and innovations in analysis. The Earth Science and Remote Sensing Unit of the Exploration Integration and Science Directorate at NASA Johnson Space Center has collaborated with the School of Earth and Space Exploration at Arizona State University (ASU) to develop the ISS Instrument Integration Implementation (I4) data query tool to provide end users a clean, simple online interface for querying both current and historical ISS Earth Observations data. The I4 interface is based on the Lunaserv and Lunaserv Global Explorer (LGE) open-source software packages developed at ASU for query of lunar datasets. In order to avoid mirroring existing databases - and the need to continually sync/update those mirrors - our design philosophy is for the I4 tool to be a pure query engine only. Once an end user identifies a specific scene or scenes of interest, I4 transparently takes the user to the appropriate online location to download the data. The tool consists of two public-facing web interfaces. The Map Tool provides a graphic geobrowser environment where the end user can navigate to an area of interest and select single or multiple datasets to query. The Map Tool displays active image footprints for the selected datasets (Figure 1). Selecting a footprint will open a pop-up window that includes a browse image and a link to available image metadata, along with a link to the online location to order or

  1. Patterns of use and impact of standardised MedDRA query analyses on the safety evaluation and review of new drug and biologics license applications.

    PubMed

    Chang, Lin-Chau; Mahmood, Riaz; Qureshi, Samina; Breder, Christopher D

    2017-01-01

    Standardised MedDRA Queries (SMQs) have been developed since the early 2000s and used by academia, industry, public health, and government sectors for detecting safety signals in adverse event safety databases. The purpose of the present study is to characterize how SMQs are used and the impact in safety analyses for New Drug Application (NDA) and Biologics License Application (BLA) submissions to the United States Food and Drug Administration (USFDA). We used the PharmaPendium database to capture SMQ use in Summary Basis of Approvals (SBoAs) of drugs and biologics approved by the USFDA. Characteristics of the drugs and the SMQ use were employed to evaluate the role of SMQ safety analyses in regulatory decisions and the veracity of signals they revealed. A comprehensive search of the SBoAs yielded 184 regulatory submissions approved from 2006 to 2015. Search strategies more frequently utilized restrictive searches with "narrow terms" to enhance specificity over strategies using "broad terms" to increase sensitivity, while some involved modification of search terms. A majority (59%) of 1290 searches used descriptive statistics; however, inferential statistics were utilized in 35% of them. Commentary from reviewers and supervisory staff suggested that a small, yet notable percentage (18%) of 1290 searches supported regulatory decisions. The searches with regulatory impact were found in 73 submissions (40% of the submissions investigated). Most searches (75% of 227 searches) with regulatory implications described how the searches were confirmed, indicating prudence in the decision-making process. SMQs have an increasing role in the presentation and review of safety analysis for NDAs/BLAs and their regulatory reviews. This study suggests that SMQs are best used as a screening process, with descriptive statistics, a description of SMQ modifications, and systematic verification of cases, which is crucial for drawing regulatory conclusions.

  2. Heuristic query optimization for query multiple table and multiple clausa on mobile finance application

    NASA Astrophysics Data System (ADS)

    Indrayana, I. N. E.; P, N. M. Wirasyanti D.; Sudiartha, I. KG

    2018-01-01

    Mobile applications allow many users to access data without being limited by space and time. Over time, the data population of such an application will increase. Data access time becomes a problem once the data reaches tens of thousands to millions of records. The objective of this research is to maintain the performance of data execution for large data records. One way to maintain data access time performance is to apply a query optimization method. The optimization used in this research is the heuristic query optimization method. The application built is a mobile-based financial application using a MySQL database with stored procedures. The application is used by more than one business entity in one database, which enables rapid data growth. The stored procedure contains a query optimized using the heuristic method. Query optimization is performed on a “Select” query that involves more than one table with multiple clauses. Evaluation is done by calculating the average access time using optimized and unoptimized queries. Access time is also measured as the data population in the database increases. The evaluation results show that data execution with heuristic query optimization is relatively faster than data execution without query optimization.

  3. System, method and apparatus for conducting a keyterm search

    NASA Technical Reports Server (NTRS)

    McGreevy, Michael W. (Inventor)

    2004-01-01

    A keyterm search is a method of searching a database for subsets of the database that are relevant to an input query. First, a number of relational models of subsets of a database are provided. A query is then input. The query can include one or more keyterms. Next, a gleaning model of the query is created. The gleaning model of the query is then compared to each one of the relational models of subsets of the database. The identifiers of the relevant subsets are then output.
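
    The flow described above (model each subset, model the query, compare, output identifiers) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not define a "relational model" or a "gleaning model", so both are approximated here as sets of co-occurring term pairs, and the comparison is a Jaccard overlap. The function names and sample data are hypothetical.

```python
from itertools import combinations

def relational_model(text):
    """Assumed stand-in for a relational model: the set of unordered
    term pairs that co-occur in the text."""
    terms = sorted(set(text.lower().split()))
    return set(combinations(terms, 2))

def keyterm_search(subsets, query, top_n=3):
    """Return identifiers of subsets whose model overlaps the query model,
    ranked by Jaccard overlap (ties broken by identifier)."""
    query_model = relational_model(query)
    scores = {}
    for ident, text in subsets.items():
        model = relational_model(text)
        union = query_model | model
        scores[ident] = len(query_model & model) / len(union) if union else 0.0
    ranked = sorted(scores, key=lambda i: (-scores[i], i))
    return [i for i in ranked if scores[i] > 0][:top_n]

# Hypothetical database subsets keyed by identifier.
subsets = {
    "doc-a": "engine fire during climb",
    "doc-b": "runway incursion at night",
    "doc-c": "engine fire warning light",
}
result = keyterm_search(subsets, "engine fire")
```

    Because this stand-in model is built from term pairs, a single-term query produces an empty model and matches nothing; a real system would need a different representation for that case.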

  4. Using a Search Engine-Based Mutually Reinforcing Approach to Assess the Semantic Relatedness of Biomedical Terms

    PubMed Central

    Hsu, Yi-Yu; Chen, Hung-Yu; Kao, Hung-Yu

    2013-01-01

    Background: Determining the semantic relatedness of two biomedical terms is an important task for many text-mining applications in the biomedical field. Previous studies, such as those using ontology-based and corpus-based approaches, measured semantic relatedness by using information from the structure of biomedical literature, but these methods are limited by the small size of training resources. To increase the size of training datasets, the outputs of search engines have been used extensively to analyze the lexical patterns of biomedical terms. Methodology/Principal Findings: In this work, we propose the Mutually Reinforcing Lexical Pattern Ranking (ReLPR) algorithm for learning and exploring the lexical patterns of synonym pairs in biomedical text. ReLPR employs lexical patterns and their pattern containers to assess the semantic relatedness of biomedical terms. By combining sentence structures and the linking activities between containers and lexical patterns, our algorithm can explore the correlation between two biomedical terms. Conclusions/Significance: The average correlation coefficient of the ReLPR algorithm was 0.82 for various datasets. The results of the ReLPR algorithm were significantly superior to those of previous methods. PMID:24348899

  5. Device-independent quantum private query

    NASA Astrophysics Data System (ADS)

    Maitra, Arpita; Paul, Goutam; Roy, Sarbani

    2017-04-01

    In quantum private query (QPQ), a client obtains values corresponding to his or her query only, and nothing else from the server, and the server does not get any information about the queries. V. Giovannetti et al. [Phys. Rev. Lett. 100, 230502 (2008)], 10.1103/PhysRevLett.100.230502 gave the first QPQ protocol and since then quite a few variants and extensions have been proposed. However, none of the existing protocols are device independent; i.e., all of them assume implicitly that the entangled states supplied to the client and the server are of a certain form. In this work, we exploit the idea of a local CHSH game and connect it with the scheme of Y. G. Yang et al. [Quantum Info. Process. 13, 805 (2014)], 10.1007/s11128-013-0692-8 to present the concept of a device-independent QPQ protocol.

  6. Achieve Location Privacy-Preserving Range Query in Vehicular Sensing

    PubMed Central

    Lu, Rongxing; Ma, Maode; Bao, Haiyong

    2017-01-01

    Modern vehicles are equipped with a plethora of on-board sensors and large on-board storage, which enables them to gather and store various local-relevant data. However, the wide application of vehicular sensing has its own challenges, among which location-privacy preservation and data query accuracy are two critical problems. In this paper, we propose a novel range query scheme, which helps the data requester to accurately retrieve the sensed data from the distributive on-board storage in vehicular ad hoc networks (VANETs) with location privacy preservation. The proposed scheme exploits structured scalars to denote the locations of data requesters and vehicles, and achieves the privacy-preserving location matching with the homomorphic Paillier cryptosystem technique. Detailed security analysis shows that the proposed range query scheme can successfully preserve the location privacy of the involved data requesters and vehicles, and protect the confidentiality of the sensed data. In addition, performance evaluations are conducted to show the efficiency of the proposed scheme, in terms of computation delay and communication overhead. Specifically, the computation delay and communication overhead are not dependent on the length of the scalar, and they are only proportional to the number of vehicles. PMID:28786943
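
    The additive homomorphism that makes the Paillier cryptosystem useful for this kind of private matching can be illustrated with a toy sketch. This is not the scheme from the paper: the abstract does not describe how the structured scalars are encoded or compared, so the code below only shows that multiplying two ciphertexts yields an encryption of the sum of the plaintexts. The tiny primes, the choice of generator g = n + 1, and all function names are assumptions for illustration, and the code requires Python 3.8+ for the modular inverse via `pow`.

```python
import math
import random

def keygen(p=1789, q=1861):
    """Toy key generation; the tiny default primes are for illustration only."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid shortcut because the generator is g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) as (1+n)^m * r^n mod n^2."""
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return pow(n + 1, m, n * n) * pow(r, n, n * n) % (n * n)

def decrypt(n, priv, c):
    """Recover m as L(c^lam mod n^2) * mu mod n, with L(x) = (x - 1) // n."""
    lam, mu = priv
    x = pow(c, lam, n * n)
    return (x - 1) // n * mu % n

def add_encrypted(n, c1, c2):
    """Homomorphic addition: the product of ciphertexts encrypts the sum."""
    return c1 * c2 % (n * n)

n, priv = keygen()
total = decrypt(n, priv, add_encrypted(n, encrypt(n, 1200), encrypt(n, 345)))
```

    Here `total` equals 1545 even though the party calling `add_encrypted` never sees 1200 or 345. Real deployments use primes of at least 1024 bits and a vetted library rather than hand-written arithmetic.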

  7. Bio-TDS: bioscience query tool discovery system.

    PubMed

    Gnimpieba, Etienne Z; VanDiermen, Menno S; Gustafson, Shayla M; Conn, Bill; Lushbough, Carol M

    2017-01-04

    Bioinformatics and computational biology play a critical role in bioscience and biomedical research. As researchers design their experimental projects, one major challenge is to find the most relevant bioinformatics toolkits that will lead to new knowledge discovery from their data. The Bio-TDS (Bioscience Query Tool Discovery Systems, http://biotds.org/) has been developed to assist researchers in retrieving the most applicable analytic tools by allowing them to formulate their questions as free text. The Bio-TDS is a flexible retrieval system that affords users from multiple bioscience domains (e.g. genomic, proteomic, bio-imaging) the ability to query over 12 000 analytic tool descriptions integrated from well-established community repositories. One of the primary components of the Bio-TDS is the ontology and natural language processing workflow for annotation, curation, query processing, and evaluation. The Bio-TDS's scientific impact was evaluated using sample questions posed by researchers, retrieved from Biostars, a site focusing on biological data analysis. The Bio-TDS was compared to five similar bioscience analytic tool retrieval systems, with the Bio-TDS outperforming the others in terms of relevance and completeness. The Bio-TDS offers researchers the capacity to associate their bioscience question with the most relevant computational toolsets required for the data analysis in their knowledge discovery process. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  8. Achieve Location Privacy-Preserving Range Query in Vehicular Sensing.

    PubMed

    Kong, Qinglei; Lu, Rongxing; Ma, Maode; Bao, Haiyong

    2017-08-08

    Modern vehicles are equipped with a plethora of on-board sensors and large on-board storage, which enables them to gather and store various local-relevant data. However, the wide application of vehicular sensing has its own challenges, among which location-privacy preservation and data query accuracy are two critical problems. In this paper, we propose a novel range query scheme, which helps the data requester to accurately retrieve the sensed data from the distributive on-board storage in vehicular ad hoc networks (VANETs) with location privacy preservation. The proposed scheme exploits structured scalars to denote the locations of data requesters and vehicles, and achieves the privacy-preserving location matching with the homomorphic Paillier cryptosystem technique. Detailed security analysis shows that the proposed range query scheme can successfully preserve the location privacy of the involved data requesters and vehicles, and protect the confidentiality of the sensed data. In addition, performance evaluations are conducted to show the efficiency of the proposed scheme, in terms of computation delay and communication overhead. Specifically, the computation delay and communication overhead are not dependent on the length of the scalar, and they are only proportional to the number of vehicles.

  9. Searching for the elusive neural substrates of body part terms: a neuropsychological study.

    PubMed

    Kemmerer, David; Tranel, Daniel

    2008-06-01

    Previous neuropsychological studies suggest that, compared to other categories of concrete entities, lexical and conceptual aspects of body part knowledge are frequently spared in brain-damaged patients. To further investigate this issue, we administered a battery of 12 tests assessing lexical and conceptual aspects of body part knowledge to 104 brain-damaged patients with lesions distributed throughout the telencephalon. There were two main outcomes. First, impaired oral naming of body parts, attributable to a disturbance of the mapping between lexical-semantic and lexical-phonological structures, was most reliably and specifically associated with lesions in the left frontal opercular and anterior/inferior parietal opercular cortices and in the white matter underlying these regions (8 patients). Also, 1 patient with body part anomia had a left occipital lesion that included the "extrastriate body area" (EBA). Second, knowledge of the meanings of body part terms was remarkably resistant to impairment, regardless of lesion site; in fact, we did not uncover a single patient who exhibited significantly impaired understanding of the meanings of these terms. In the 9 patients with body part anomia, oral naming of concrete entities was evaluated, and this revealed that 4 patients had disproportionately worse naming of body parts relative to other types of concrete entities. Taken together, these findings extend previous neuropsychological and functional neuroimaging studies of body part knowledge and add to our growing understanding of the nuances of how different linguistic and conceptual categories are operated by left frontal and parietal structures.

  10. Searching for the Elusive Neural Substrates of Body Part Terms: A Neuropsychological Study

    PubMed Central

    Kemmerer, David; Tranel, Daniel

    2010-01-01

    Previous neuropsychological studies suggest that, compared to other categories of concrete entities, lexical and conceptual aspects of body part knowledge are frequently spared in brain-damaged patients. To further investigate this issue, we administered a battery of 12 tests assessing lexical and conceptual aspects of body part knowledge to 104 brain-damaged patients with lesions distributed throughout the telencephalon. There were two main outcomes. First, impaired oral naming of body parts, attributable to a disturbance of the mapping between lexical-semantic and lexical-phonological structures, was most reliably and specifically associated with lesions in the left frontal opercular and anterior/inferior parietal opercular cortices, and in the white matter underlying these regions (8 patients). Also, one patient with body part anomia had a left occipital lesion that included the “extrastriate body area” (EBA). Second, knowledge of the meanings of body part terms was remarkably resistant to impairment, regardless of lesion site; in fact, we did not uncover a single patient who exhibited significantly impaired understanding of the meanings of these terms. In the 9 patients with body part anomia, oral naming of concrete entities was evaluated, and this revealed that 4 patients had disproportionately worse naming of body parts relative to other types of concrete entities. Taken together, these findings extend previous neuropsychological and functional neuroimaging studies of body part knowledge, and add to our growing understanding of the nuances of how different linguistic and conceptual categories are operated by left frontal and parietal structures. PMID:18608319

  11. Implementation of Quantum Private Queries Using Nuclear Magnetic Resonance

    NASA Astrophysics Data System (ADS)

    Wang, Chuan; Hao, Liang; Zhao, Lian-Jie

    2011-08-01

    We present a modified protocol for the realization of a quantum private query process on a classical database. Using a one-qubit query and a CNOT operation, the query process can be realized in a two-mode database. In the query process, data privacy is preserved, as the sender does not reveal any information about the database besides her query information, and the database provider cannot retain any information about the query. We implement the quantum private query protocol in a nuclear magnetic resonance system. The density matrices of the memory registers are constructed.

  12. Spatiotemporal conceptual platform for querying archaeological information systems

    NASA Astrophysics Data System (ADS)

    Partsinevelos, Panagiotis; Sartzetaki, Mary; Sarris, Apostolos

    2015-04-01

    The spatial and temporal distribution of archaeological sites has been shown to be associated with several attributes, including marine, water, mineral and food resources, climate conditions, geomorphological features, etc. In this study, archaeological settlement attributes are evaluated under various associations in order to provide a specialized query platform in a geographic information system (GIS). Towards this end, a spatial database is designed to include a series of archaeological findings for a secluded geographic area of Crete in Greece. The key categories of the geodatabase include the archaeological type (palace, burial site, village, etc.), temporal information on the habitation/usage period (pre-Minoan, Minoan, Byzantine, etc.), and the extracted geographical attributes of the sites (distance to sea, altitude, resources, etc.). Most of the related spatial attributes are extracted with readily available GIS tools. Additionally, a series of conceptual data attributes are estimated, including the temporal relation of an era to a future one in terms of alteration of the archaeological type, topologic relations of various types and attributes, and spatial proximity relations between various types. These complex spatiotemporal relational measures reveal new attributes towards a better understanding of site selection for prehistoric and/or historic cultures, yet their potential combinations can become numerous. Therefore, after the quantification of the above-mentioned attributes, they are classified according to their importance for archaeological site location modeling. Under this new classification scheme, the user may select a geographic area of interest and extract only the important attributes for a specific archaeological type. These extracted attributes may then be queried against the entire spatial database to provide a location map of possible new archaeological sites. This novel type of querying is robust since the user does not have to type a standard SQL query but

  13. Effective Filtering of Query Results on Updated User Behavioral Profiles in Web Mining

    PubMed Central

    Sadesh, S.; Suganthe, R. C.

    2015-01-01

    The Web, with its tremendous volume of information, retrieves results for user-related queries. With the rapid growth of web page recommendation, results retrieved using data mining techniques have not offered a high filtering rate, because the relationships between user profiles and queries were not analyzed extensively. At the same time, existing user-profile-based prediction in web data mining is not exhaustive in producing personalized results. To improve the query result rate given the dynamics of user behavior over time, the Hamilton Filtered Regime Switching User Query Probability (HFRS-UQP) framework is proposed. The HFRS-UQP framework is split into two processes, filtering and switching. The data-mining-based filtering uses the Hamilton Filtering framework to filter user results based on personalized information from automatically updated profiles obtained through the search engine. The maximized result is fetched, that is, filtered with respect to user behavior profiles. The switching step performs accurate filtering of the updated profiles using regime switching. The change (i.e., switch) of regime on profile updates in the HFRS-UQP framework identifies the second- and higher-order associations of query results with the updated profiles. Experiments are conducted on factors such as personalized information search retrieval rate, filtering efficiency, and precision ratio. PMID:26221626

  14. SPARQL Query Re-writing Using Partonomy Based Transformation Rules

    NASA Astrophysics Data System (ADS)

    Jain, Prateek; Yeh, Peter Z.; Verma, Kunal; Henson, Cory A.; Sheth, Amit P.

    Often the information present in a spatial knowledge base is represented at a different level of granularity and abstraction than the query constraints. For querying ontologies containing spatial information, the precise relationships between spatial entities have to be specified in the basic graph pattern of a SPARQL query, which can result in long and complex queries. We present a novel approach to help users intuitively write SPARQL queries to query spatial data, rather than relying on knowledge of the ontology structure. Our framework re-writes queries, using transformation rules that exploit part-whole relations between geographical entities, to address the mismatches between query constraints and the knowledge base. Our experiments were performed on completely third-party datasets and queries. Evaluations were performed on the Geonames dataset using questions from the National Geographic Bee serialized into SPARQL, and on the British Administrative Geography Ontology using questions from a popular trivia website. These experiments demonstrate high precision in retrieval of results and ease in writing queries.

  15. STARS 2.0: 2nd-generation open-source archiving and query software

    NASA Astrophysics Data System (ADS)

    Winegar, Tom

    2008-07-01

    The Subaru Telescope is in the process of developing an open-source alternative to the 1st-generation software and databases (STARS 1) used for archiving and query. For STARS 2, we have chosen PHP and Python for scripting and MySQL as the database software. We have collected feedback from staff and observers, and used this feedback to significantly improve the design and functionality of our future archiving and query software. Archiving - We identified two weaknesses in 1st-generation STARS archiving software: a complex and inflexible table structure, and uncoordinated system administration for our business model of taking pictures from the summit and archiving them in both Hawaii and Japan. We adopted a simplified and normalized table structure with passive keyword collection, and we are designing an archive-to-archive file transfer system that automatically reports real-time status and error conditions and permits error recovery. Query - We identified several weaknesses in 1st-generation STARS query software: inflexible query tools, poor sharing of calibration data, and no automatic file transfer mechanisms to observers. We are developing improved query tools and sharing of calibration data, and multi-protocol unassisted file transfer mechanisms for observers. In the process, we have redefined a 'query': from an invisible search result that can only transfer once in-house right now, with little status and error reporting and no error recovery - to a stored search result that can be monitored, transferred to different locations with multiple protocols, reporting status and error conditions and permitting recovery from errors.

  16. Spectroscopic monitoring of SS 433: A search for long-term variations of kinematic model parameters

    NASA Astrophysics Data System (ADS)

    Davydov, V. V.; Esipov, V. F.; Cherepashchuk, A. M.

    2008-06-01

    Between 1994 and 2006, we obtained uniform spectroscopic observations of SS 433 in the region of Hα. We determined Doppler shifts of the moving emission lines, Hα+ and Hα-, and studied various irregularities in the profiles for the moving emission lines. The total number of Doppler shifts measured in these 13 years is 488 for Hα- and 389 for Hα+. We have also used published data to study possible long-term variations of the SS 433 system, based on 755 Doppler shifts for Hα- and 630 for Hα+ obtained over 28 years. We have derived improved kinematic model parameters for the precessing relativistic jets of SS 433 using five- and eight-parameter models. On average, the precession period was stable during the 28 years of observations (60 precession cycles), at 162.250d ± 0.003d. Phase jumps of the precession period and random variations of its length with amplitudes of ≈6% and ≈1%, respectively, were observed, but no secular changes in the precession period were detected. The nutation period, P_nut = 6.2876d ± 0.00035d, and its phase were stable during 28 years (more than 1600 nutation cycles). We find no secular variations of the nutation cycle. The ejection speed of the relativistic jets, v, was, on average, constant during the 28 years, β = v/c = 0.2561 ± 0.0157. No secular variation of β is detected. In general, SS 433 demonstrates remarkably stable long-term characteristics of its precession and nutation, as well as of the central “engine” near the relativistic object that collimates the plasma in the jets and accelerates it to v = 0.2561c. Our results support a model with a “slaved” accretion disk in SS 433, which follows the precession of the optical star’s rotation axis.

  17. BioFed: federated query processing over life sciences linked open data.

    PubMed

    Hasnain, Ali; Mehmood, Qaiser; Sana E Zainab, Syeda; Saleem, Muhammad; Warren, Claude; Zehra, Durre; Decker, Stefan; Rebholz-Schuhmann, Dietrich

    2017-03-15

    Biomedical data, e.g. from knowledge bases and ontologies, is increasingly made available following open linked data principles, at best as RDF triple data. This is a necessary step towards unified access to biological data sets, but this still requires solutions to query multiple endpoints for their heterogeneous data to eventually retrieve all the meaningful information. Suggested solutions are based on query federation approaches, which require the submission of SPARQL queries to endpoints. Due to the size and complexity of available data, these solutions have to be optimised for efficient retrieval times and for users in life sciences research. Last but not least, over time, the reliability of data resources in terms of access and quality has to be monitored. Our solution (BioFed) federates data over 130 SPARQL endpoints in life sciences and tailors query submission according to the provenance information. BioFed has been evaluated against the state-of-the-art solution FedX and forms an important benchmark for the life science domain. The efficient cataloguing approach of the federated query processing system 'BioFed', the triple-pattern-wise source selection and the semantic source normalisation form the core of our solution. It gathers and integrates data from newly identified public endpoints for federated access. Basic provenance information is linked to the retrieved data. Last but not least, BioFed makes use of the latest SPARQL standard (i.e., 1.1) to leverage the full benefits for query federation. The evaluation is based on 10 simple and 10 complex queries, which address data in 10 major and very popular data sources (e.g., DrugBank, Sider). BioFed is a solution for a single-point-of-access for a large number of SPARQL endpoints providing life science data. It facilitates efficient query generation for data access and provides basic provenance information in combination with the retrieved data.
BioFed fully supports SPARQL 1.1 and gives access to the

  18. Where are the Binaries? Results of a Long-term Search for Radial Velocity Binaries in Proto-planetary Nebulae

    SciTech Connect

    Hrivnak, Bruce J.; Lu, Wenxian; Steene, Griet Van de

    We present the results of an expanded, long-term radial velocity search (25 years) for evidence of binarity in a sample of seven bright proto-planetary nebulae (PPNe). The goal is to investigate the widely held view that the bipolar or point-symmetric shapes of planetary nebulae (PNe) and PPNe are due to binary interactions. Observations from three observatories were combined from 2007 to 2015 to search for variations on the order of a few years and then combined with earlier observations from 1991 to 1995 to search for variations on the order of decades. All seven show velocity variations due to periodic pulsation in the range of 35–135 days. However, in only one PPN, IRAS 22272+5435, did we find even marginal evidence for multi-year variations that might be due to a binary companion. This object shows marginally significant evidence of a two-year period of low semi-amplitude, which could be due to a low-mass companion, and it also displays some evidence of a much longer period of >30 years. The absence of evidence in the other six objects for long-period radial velocity variations due to a binary companion sets significant constraints on the properties of any undetected binary companions: they must be of low mass, ≤0.2 M⊙, or long period, >30 years. Thus the present observations do not provide direct support for the binary hypothesis to explain the shapes of PNe and PPNe and severely constrain the properties of any such undetected companions.

  19. Exploring personalized searches using tag-based user profiles and resource profiles in folksonomy.

    PubMed

    Cai, Yi; Li, Qing; Xie, Haoran; Min, Huaqin

    2014-10-01

    With the increase in resource-sharing websites such as YouTube and Flickr, many shared resources have arisen on the Web. Personalized searches have become more important and challenging since users demand higher retrieval quality. To achieve this goal, personalized searches need to take users' personalized profiles and information needs into consideration. Collaborative tagging (also known as folksonomy) systems allow users to annotate resources with their own tags, which provides a simple but powerful way for organizing, retrieving and sharing different types of social resources. In this article, we examine the limitations of previous tag-based personalized searches. To handle these limitations, we propose a new method to model user profiles and resource profiles in collaborative tagging systems. We use a normalized term frequency to indicate the preference degree of a user on a tag. A novel search method using such profiles of users and resources is proposed to facilitate the desired personalization in resource searches. In our framework, instead of the keyword matching or similarity measurement used in previous works, the relevance measurement between a resource and a user query (termed the query relevance) is treated as a fuzzy satisfaction problem of a user's query requirements. We implement a prototype system called the Folksonomy-based Multimedia Retrieval System (FMRS). Experiments using the FMRS data set and the MovieLens data set show that our proposed method outperforms baseline methods. Copyright © 2014 Elsevier Ltd. All rights reserved.
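
    The idea of using a normalized term frequency as a user's preference degree for a tag can be sketched as follows. The abstract does not give the exact normalization, so this sketch assumes the simplest one: a tag's count divided by the user's total number of tag assignments. The function name and sample tags are hypothetical.

```python
from collections import Counter

def user_profile(tag_assignments):
    """Map each tag to its share of the user's tag assignments
    (assumed normalization: count / total assignments)."""
    counts = Counter(tag_assignments)
    total = sum(counts.values())
    return {tag: c / total for tag, c in counts.items()} if total else {}

# Hypothetical tagging history of one user.
profile = user_profile(["jazz", "jazz", "live", "piano"])
```

    With this history, "jazz" receives a preference degree of 0.5 and the other two tags 0.25 each. Other normalizations (for example, dividing by the count of the user's most frequent tag) would fit the same description.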

  20. An advanced web query interface for biological databases

    PubMed Central

    Latendresse, Mario; Karp, Peter D.

    2010-01-01

    Although most web-based biological databases (DBs) offer some type of web-based form to allow users to author DB queries, these query forms are quite restricted in the complexity of DB queries that they can formulate. They can typically query only one DB, and can query only a single type of object at a time (e.g. genes) with no possible interaction between the objects—that is, in SQL parlance, no joins are allowed between DB objects. Writing precise queries against biological DBs is usually left to a programmer skillful enough in complex DB query languages like SQL. We present a web interface for building precise queries for biological DBs that can construct much more precise queries than most web-based query forms, yet that is user friendly enough to be used by biologists. It supports queries containing multiple conditions, and connecting multiple object types without using the join concept, which is unintuitive to biologists. This interactive web interface is called the Structured Advanced Query Page (SAQP). Users interactively build up a wide range of query constructs. Interactive documentation within the SAQP describes the schema of the queried DBs. The SAQP is based on BioVelo, a query language based on list comprehension. The SAQP is part of the Pathway Tools software and is available as part of several bioinformatics web sites powered by Pathway Tools, including the BioCyc.org site that contains more than 500 Pathway/Genome DBs. PMID:20624715
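
    To illustrate how a list-comprehension style can connect two object types without an explicit join clause, here is a small sketch written in Python. It is not BioVelo syntax, which the abstract does not show; the records and field names are hypothetical.

```python
# Hypothetical records for two object types.
genes = [
    {"name": "trpA", "product": "TrpA"},
    {"name": "lacZ", "product": "LacZ"},
]
proteins = [
    {"name": "TrpA", "pathway": "tryptophan biosynthesis"},
    {"name": "LacZ", "pathway": "lactose degradation"},
]

# Genes whose product takes part in a biosynthesis pathway: the two
# object types are connected by a condition rather than a JOIN keyword.
result = [
    (g["name"], p["pathway"])
    for g in genes
    for p in proteins
    if g["product"] == p["name"] and "biosynthesis" in p["pathway"]
]
```

    The comprehension reads as "for each gene and each protein, keep the pairs that satisfy the conditions", which is the kind of formulation the abstract contrasts with SQL joins.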

  1. [Criteria for the classification as a "domestic-setting corpse"--a literature search and review to define the term].

    PubMed

    Merz, Marius; Birngruber, Christoph G; Heidorn, Frank; Ramsthaler, Frank; Risse, Manfred; Kreutz, Kerstin; Krähahn, Jonathan; Verhoff, Marcel A

    2011-01-01

    In German medical and media circles (daily routine, specialist literature, press, novels), the term "domestic-setting corpse" is frequently used, but the term is only vaguely defined. The authors thus decided to perform an in-depth study of the literature, including historic textbooks and all German- and English-language medicolegal journals, going as far back as their first issues, in an attempt to more clearly define the term. Inclusion criteria used in the search were a post-mortem interval of at least 24 hours prior to discovery and discovery of the corpse in a domestic setting. In the literature, 37 cases that complied with the above-mentioned inclusion criteria were found. These cases frequently described "advanced decomposition", often "unclear cause of death" and "problems in identification". These characteristics can thus be considered as being additional pointers in the definition. However, we suggest that the two general defining characteristics of a "domestic-setting corpse" are a post-mortem interval of more than 24 hours before discovery and the discovery of the corpse in a domestic setting.

  2. Experiments on Interfaces To Support Query Expansion.

    ERIC Educational Resources Information Center

    Beaulieu, M.

    1997-01-01

    Focuses on the user and human-computer interaction aspects of the research based on the Okapi text retrieval system. Three experiments implementing different approaches to query expansion are described, including the use of graphical user interfaces with different windowing techniques. (Author/LRW)

  3. Normalized Legal Drafting and the Query Method.

    ERIC Educational Resources Information Center

    Allen, Layman E.; Engholm, C. Rudy

    1978-01-01

    Normalized legal drafting, a mode of expressing ideas in legal documents so that the syntax that relates the constituent propositions is simplified and standardized, and the query method, a question-asking activity that teaches normalized drafting and provides practice, are examined. Some examples are presented. (JMD)

  4. Language model: Extension to solve inconsistency, incompleteness, and short query in cultural heritage collection

    NASA Astrophysics Data System (ADS)

    Tan, Kian Lam; Lim, Chen Kim

    2017-10-01

    With the explosive growth of online information such as email messages, news articles, and scientific literature, many institutions and museums are converting their cultural collections from physical data to digital format. However, this conversion resulted in the issues of inconsistency and incompleteness. Besides, the usage of inaccurate keywords also resulted in short query problem. Most of the time, the inconsistency and incompleteness are caused by the aggregation fault in annotating a document itself while the short query problem is caused by naive user who has prior knowledge and experience in cultural heritage domain. In this paper, we presented an approach to solve the problem of inconsistency, incompleteness and short query by incorporating the Term Similarity Matrix into the Language Model. Our approach is tested on the Cultural Heritage in CLEF (CHiC) collection which consists of short queries and documents. The results show that the proposed approach is effective and has improved the accuracy in retrieval time.

  5. Internet Searches for Affect-Related Terms: An Indicator of Subjective Well-Being and Predictor of Health Outcomes across US States and Metro Areas.

    PubMed

    Ford, Michael T; Jebb, Andrew T; Tay, Louis; Diener, Ed

    2018-03-01

    The present study explored the potential for internet search data to serve as indicators of subjective well-being (SWB) and predictors of health at the state and metro area levels. We propose that searches for positive and negative affect-related terms represent information-seeking behavior of individuals who are experiencing emotions and seeking information about them. Data on the frequency of Google searches for 15 affect terms were collected from Google's Trends website (trends.google.com). These were paired with data on health, self-reported emotions, psychological well-being, personality, and Twitter postings at the state and metro area levels. Several internet search scores correlated with indicators of cardiovascular health and depression. Some search term scores also correlated strongly with self-reported emotions, well-being metrics, neuroticism, per capita income, and Twitter postings at the state or metro area level. Multiple regression analyses suggest that affect searches predict depression rates at the metro area level beyond the effects of income and other well-being measures. The results highlight the promise and challenges of using internet search data at the aggregate level for physical and mental health assessment and surveillance. © 2018 The International Association of Applied Psychology.

  6. MICA: desktop software for comprehensive searching of DNA databases

    PubMed Central

    Stokes, William A; Glick, Benjamin S

    2006-01-01

    Background Molecular biologists work with DNA databases that often include entire genomes. A common requirement is to search a DNA database to find exact matches for a nondegenerate or partially degenerate query. The software programs available for such purposes are normally designed to run on remote servers, but an appealing alternative is to work with DNA databases stored on local computers. We describe a desktop software program termed MICA (K-Mer Indexing with Compact Arrays) that allows large DNA databases to be searched efficiently using very little memory. Results MICA rapidly indexes a DNA database. On a Macintosh G5 computer, the complete human genome could be indexed in about 5 minutes. The indexing algorithm recognizes all 15 characters of the DNA alphabet and fully captures the information in any DNA sequence, yet for a typical sequence of length L, the index occupies only about 2L bytes. The index can be searched to return a complete list of exact matches for a nondegenerate or partially degenerate query of any length. A typical search of a long DNA sequence involves reading only a small fraction of the index into memory. As a result, searches are fast even when the available RAM is limited. Conclusion MICA is suitable as a search engine for desktop DNA analysis software. PMID:17018144
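    The exact-match search described in this abstract can be illustrated with a minimal k-mer index. This is a sketch under stated assumptions, not MICA's implementation: it uses a plain Python dictionary rather than compact arrays, handles only nondegenerate queries, and the names `build_kmer_index` and `find_exact` are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(seq: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer in seq to the ascending list of its start positions."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return dict(index)


def find_exact(seq: str, index: dict[str, list[int]], k: int, query: str) -> list[int]:
    """Return all start positions where query occurs exactly in seq.

    The first k-mer of the query selects candidate positions from the index;
    each candidate is then verified against the full query, so only a small
    part of the index is touched per search.
    """
    if len(query) < k:
        raise ValueError("query must be at least k characters long")
    candidates = index.get(query[:k], [])
    return [p for p in candidates if seq.startswith(query, p)]


# Hypothetical toy sequence, used only for illustration.
genome = "ACGTACGTTACGA"
idx = build_kmer_index(genome, k=3)
hits = find_exact(genome, idx, 3, "ACGT")
```

    A dictionary of position lists costs far more than the roughly 2L bytes the abstract reports; the sketch shows only the lookup-then-verify idea.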

  7. Asking better questions: How presentation formats influence information search.

    PubMed

    Wu, Charley M; Meder, Björn; Filimon, Flavia; Nelson, Jonathan D

    2017-08-01

    While the influence of presentation formats has been widely studied in Bayesian reasoning tasks, we present the first systematic investigation of how presentation formats influence information search decisions. Four experiments were conducted across different probabilistic environments, where subjects (N = 2,858) chose between 2 possible search queries, each with binary probabilistic outcomes, with the goal of maximizing classification accuracy. We studied 14 different numerical and visual formats for presenting information about the search environment, constructed across 6 design features that have been prominently related to improvements in Bayesian reasoning accuracy (natural frequencies, posteriors, complement, spatial extent, countability, and part-to-whole information). The posterior variants of the icon array and bar graph formats led to the highest proportion of correct responses, and were substantially better than the standard probability format. Results suggest that presenting information in terms of posterior probabilities and visualizing natural frequencies using spatial extent (a perceptual feature) were especially helpful in guiding search decisions, although environments with a mixture of probabilistic and certain outcomes were challenging across all formats. Subjects who made more accurate probability judgments did not perform better on the search task, suggesting that simple decision heuristics may be used to make search decisions without explicitly applying Bayesian inference to compute probabilities. We propose a new take-the-difference (TTD) heuristic that identifies the accuracy-maximizing query without explicit computation of posterior probabilities. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  8. An alternative database approach for management of SNOMED CT and improved patient data queries.

    PubMed

    Campbell, W Scott; Pedersen, Jay; McClay, James C; Rao, Praveen; Bastola, Dhundy; Campbell, James R

    2015-10-01

    statistics generated using the graph database were identical to those using validated methods. Patient queries produced identical patient count results to the Oracle RDBMS with comparable times. Database queries involving defining attributes of SNOMED CT concepts were possible with the graph DB. The same queries could not be directly performed with the Oracle RDBMS representation of the patient data and required the creation and use of external terminology services. Further, queries of undefined depth were successful in identifying unknown relationships between patient cohorts. The results of this study supported the hypothesis that a patient database built upon and around the semantic model of SNOMED CT was possible. The model supported queries that leveraged all aspects of the SNOMED CT logical model to produce clinically relevant query results. Logical disjunction and negation queries were possible using the data model, as well as, queries that extended beyond the structural IS_A hierarchy of SNOMED CT to include queries that employed defining attribute-values of SNOMED CT concepts as search parameters. As medical terminologies, such as SNOMED CT, continue to expand, they will become more complex and model consistency will be more difficult to assure. Simultaneously, consumers of data will increasingly demand improvements to query functionality to accommodate additional granularity of clinical concepts without sacrificing speed. This new line of research provides an alternative approach to instantiating and querying patient data represented using advanced computable clinical terminologies. Copyright © 2015 Elsevier Inc. All rights reserved.

  9. Spatial and symbolic queries for 3D image data

    NASA Astrophysics Data System (ADS)

    Benson, Daniel C.; Zick, Gregory L.

    1992-04-01

    We present a query system for an object-oriented biomedical imaging database containing 3-D anatomical structures and their corresponding 2-D images. The graphical interface facilitates the formation of spatial queries, nonspatial or symbolic queries, and combined spatial/symbolic queries. A query editor is used for the creation and manipulation of 3-D query objects as volumes, surfaces, lines, and points. Symbolic predicates are formulated through a combination of text fields and multiple choice selections. Query results, which may include images, image contents, composite objects, graphics, and alphanumeric data, are displayed in multiple views. Objects returned by the query may be selected directly within the views for further inspection or modification, or for use as query objects in subsequent queries. Our image database query system provides visual feedback and manipulation of spatial query objects, multiple views of volume data, and the ability to combine spatial and symbolic queries. The system allows for incremental enhancement of existing objects and the addition of new objects and spatial relationships. The query system is designed for databases containing symbolic and spatial data. This paper discusses its application to data acquired in biomedical 3-D image reconstruction, but it is applicable to other areas such as CAD/CAM, geographical information systems, and computer vision.

  10. An Improvement to a Multi-Client Searchable Encryption Scheme for Boolean Queries.

    PubMed

    Jiang, Han; Li, Xue; Xu, Qiuliang

    2016-12-01

    The migration of e-health systems to cloud computing brings huge benefits as well as some security risks. Searchable Encryption (SE) is a cryptographic scheme that protects the confidentiality of data while still allowing the encrypted data to be used. The SE scheme proposed by Cash et al. in Crypto2013 and its follow-up work in CCS2013 are the most practical SE schemes that support Boolean queries at present. In their scheme, the data user has to generate the search tokens one by one according to a counter and interact with the server repeatedly until he finds the correct one, or goes through plenty of tokens to establish that there is no search result. In this paper, we improve on their scheme. We allow the server to send back some information that helps the user generate the exact search token in the search phase. In our scheme, there are only two rounds of interaction between server and user, and the search token has [Formula: see text] elements, where n is the number of keywords in the query expression, [Formula: see text] is the minimum number of documents that contain one of the keywords in the query expression, and the computation cost of the server is [Formula: see text] modular exponentiation operations.

  11. Occam's razor: supporting visual query expression for content-based image queries

    NASA Astrophysics Data System (ADS)

    Venters, Colin C.; Hartley, Richard J.; Hewitt, William T.

    2005-01-01

    This paper reports the results of a usability experiment that investigated visual query formulation on three dimensions: effectiveness, efficiency, and user satisfaction. Twenty eight evaluation sessions were conducted in order to assess the extent to which query by visual example supports visual query formulation in a content-based image retrieval environment. In order to provide a context and focus for the investigation, the study was segmented by image type, user group, and use function. The image type consisted of a set of abstract geometric device marks supplied by the UK Trademark Registry. Users were selected from the 14 UK Patent Information Network offices. The use function was limited to the retrieval of images by shape similarity. Two client interfaces were developed for comparison purposes: Trademark Image Browser Engine (TRIBE) and Shape Query Image Retrieval Systems Engine (SQUIRE).

  12. Occam's razor: supporting visual query expression for content-based image queries

    NASA Astrophysics Data System (ADS)

    Venters, Colin C.; Hartley, Richard J.; Hewitt, William T.

    2004-12-01

    This paper reports the results of a usability experiment that investigated visual query formulation on three dimensions: effectiveness, efficiency, and user satisfaction. Twenty eight evaluation sessions were conducted in order to assess the extent to which query by visual example supports visual query formulation in a content-based image retrieval environment. In order to provide a context and focus for the investigation, the study was segmented by image type, user group, and use function. The image type consisted of a set of abstract geometric device marks supplied by the UK Trademark Registry. Users were selected from the 14 UK Patent Information Network offices. The use function was limited to the retrieval of images by shape similarity. Two client interfaces were developed for comparison purposes: Trademark Image Browser Engine (TRIBE) and Shape Query Image Retrieval Systems Engine (SQUIRE).

  13. Is 'self-medication' a useful term to retrieve related publications in the literature? A systematic exploration of related terms.

    PubMed

    Mansouri, Ava; Sarayani, Amir; Ashouri, Asieh; Sherafatmand, Mona; Hadjibabaie, Molouk; Gholami, Kheirollah

    2015-01-01

    Self-Medication (SM), i.e. using medications to treat oneself, is a major concern for health researchers and policy makers. The terms "self medication" or "self-medication" (SM terms) have been used to explain various concepts while several terms have also been employed to define this practice. Hence, retrieving relevant publications would require exhaustive literature screening. So, we assessed the current situation of SM terms in the literature to improve the relevancy of search outcomes. In this Systematic exploration, SM terms were searched in the 6 following databases and publisher's portals till April 2012: Web of Science, Scopus, PubMed, Google scholar, ScienceDirect, and Wiley. A simple search query was used to include only publications with SM terms. We used Relative-Risk (RR) to estimate the probability of SM terms use in related compared to unrelated publications. Sensitivity and specificity of SM terms as keywords in search query were also calculated. Relevant terms to SM practice were extracted and their Likelihood Ratio positive and negative (LR+/-) were calculated to assess their effect on the probability of search outcomes relevancy in addition to previous search queries. We also evaluated the content of unrelated publications. All mentioned steps were performed in title (TI) and title or abstract (TIAB) of publications. 1999 related and 1917 unrelated publications were found. SM terms RR was 4.5 in TI and 2.1 in TIAB. SM terms sensitivity and specificity respectively were 55.4% and 87.7% in TI and 84.0% and 59.5% in TIAB. "OTC" and "Over-The-Counter Medication", with LR+ 16.78 and 16.30 respectively, provided the most conclusive increase in the probability of the relevancy of publications. The most common unrelated SM themes were self-medication hypothesis, drug abuse and Zoopharmacognosy. 
Due to relatively low specificity or sensitivity of SM terms, relevant terms should be employed in search queries and clear definitions of SM applications should
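    The likelihood ratios mentioned in this abstract follow from sensitivity and specificity by the standard definitions, LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. The sketch below applies those definitions to the sensitivity/specificity pairs reported for SM terms, purely as illustrative inputs; the outputs are not results taken from the study, and the function name `likelihood_ratios` is hypothetical.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) from sensitivity and specificity given as fractions."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


# Illustrative inputs: the title (TI) and title-or-abstract (TIAB) figures
# quoted in the abstract, converted from percentages to fractions.
ti = likelihood_ratios(0.554, 0.877)
tiab = likelihood_ratios(0.840, 0.595)

print(f"TI:   LR+ = {ti[0]:.2f}, LR- = {ti[1]:.2f}")
print(f"TIAB: LR+ = {tiab[0]:.2f}, LR- = {tiab[1]:.2f}")
```

    An LR+ well above 1 means a matching term raises the odds that a publication is relevant; an LR- close to 0 means its absence lowers them.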

  14. OpenSearch technology for geospatial resources discovery

    NASA Astrophysics Data System (ADS)

    Papeschi, Fabrizio; Enrico, Boldrini; Mazzetti, Paolo

    2010-05-01

    In 2005, the term Web 2.0 was coined by Tim O'Reilly to describe a quickly growing set of Web-based applications that share a common philosophy of "mutually maximizing collective intelligence and added value for each participant by formalized and dynamic information sharing". Around this same period, OpenSearch, a new Web 2.0 technology, was developed. More properly, OpenSearch is a collection of technologies that allow publishing of search results in a format suitable for syndication and aggregation. It is a way for websites and search engines to publish search results in a standard and accessible format. Due to its strong impact on the way the Web is perceived by users and also due to its relevance for businesses, Web 2.0 has attracted the attention of both mass media and the scientific community. This explosive growth in popularity of Web 2.0 technologies like OpenSearch, together with practical applications of Service Oriented Architecture (SOA), resulted in an increased interest in similarities, convergence, and a potential synergy of these two concepts. SOA is considered as the philosophy of encapsulating application logic in services with a uniformly defined interface and making these publicly available via discovery mechanisms. Service consumers may then retrieve these services, compose and use them according to their current needs. A great degree of similarity between SOA and Web 2.0 may be leading to a convergence between the two paradigms. They also expose divergent elements, such as the Web 2.0 support for human interaction as opposed to the typical SOA machine-to-machine interaction. According to these considerations, the Geospatial Information (GI) domain is also taking its first steps towards a new approach to data publishing and discovery, in particular taking advantage of the OpenSearch technology.
A specific GI niche is represented by the OGC Catalog Service for Web (CSW) that is part of the OGC Web Services (OWS) specifications suite, which provides a

  15. Annotating images by mining image search results.

    PubMed

    Wang, Xin-Jing; Zhang, Lei; Li, Xirong; Ma, Wei-Ying

    2008-11-01

    Although it has been studied for years by the computer vision and machine learning communities, image annotation is still far from practical. In this paper, we propose a novel attempt at model-free image annotation, which is a data-driven approach that annotates images by mining their search results. Some 2.4 million images with their surrounding text are collected from a few photo forums to support this approach. The entire process is formulated in a divide-and-conquer framework where a query keyword is provided along with the uncaptioned image to improve both the effectiveness and efficiency. This is helpful when the collected data set is not dense everywhere. In this sense, our approach contains three steps: 1) the search process to discover visually and semantically similar search results, 2) the mining process to identify salient terms from textual descriptions of the search results, and 3) the annotation rejection process to filter out noisy terms yielded by Step 2. To ensure real-time annotation, two key techniques are leveraged: one is to map the high-dimensional image visual features into hash codes, the other is to implement it as a distributed system, of which the search and mining processes are provided as Web services. As a typical result, the entire process finishes in less than 1 second. Since no training data set is required, our approach enables annotating with unlimited vocabulary and is highly scalable and robust to outliers. Experimental results on both real Web images and a benchmark image data set show the effectiveness and efficiency of the proposed algorithm. It is also worth noting that, although the entire approach is illustrated within the divide-and-conquer framework, a query keyword is not crucial to our current implementation. We provide experimental results to prove this.

  16. Graphical modeling and query language for hospitals.

    PubMed

    Barzdins, Janis; Barzdins, Juris; Rencis, Edgars; Sostaks, Agris

    2013-01-01

    So far there has been little evidence that implementation of the health information technologies (HIT) is leading to health care cost savings. One of the reasons for this lack of impact by the HIT likely lies in the complexity of the business process ownership in the hospitals. The goal of our research is to develop a business model-based method for hospital use which would allow doctors to retrieve directly the ad-hoc information from various hospital databases. We have developed a special domain-specific process modelling language called the MedMod. Formally, we define the MedMod language as a profile on UML Class diagrams, but we also demonstrate it on examples, where we explain the semantics of all its elements informally. Moreover, we have developed the Process Query Language (PQL) that is based on MedMod process definition language. The purpose of PQL is to allow a doctor querying (filtering) runtime data of hospital's processes described using MedMod. The MedMod language tries to overcome deficiencies in existing process modeling languages, allowing to specify the loosely-defined sequence of the steps to be performed in the clinical process. The main advantages of PQL are in two main areas - usability and efficiency. They are: 1) the view on data through "glasses" of familiar process, 2) the simple and easy-to-perceive means of setting filtering conditions require no more expertise than using spreadsheet applications, 3) the dynamic response to each step in construction of the complete query that shortens the learning curve greatly and reduces the error rate, and 4) the selected means of filtering and data retrieving allows to execute queries in O(n) time regarding the size of the dataset. We are about to continue developing this project with three further steps. First, we are planning to develop user-friendly graphical editors for the MedMod process modeling and query languages. 
The second step is to do evaluation of usability the proposed language and tool

  17. How To Do Field Searching in Web Search Engines: A Field Trip.

    ERIC Educational Resources Information Center

    Hock, Ran

    1998-01-01

    Describes the field search capabilities of selected Web search engines (AltaVista, HotBot, Infoseek, Lycos, Yahoo!) and includes a chart outlining what fields (date, title, URL, images, audio, video, links, page depth) are searchable, where to go on the page to search them, the syntax required (if any), and how field search queries are entered.…

  18. Querying databases of trajectories of differential equations: Data structures for trajectories

    NASA Technical Reports Server (NTRS)

    Grossman, Robert

    1989-01-01

    One approach to qualitative reasoning about dynamical systems is to extract qualitative information by searching or making queries on databases containing very large numbers of trajectories. The efficiency of such queries depends crucially upon finding an appropriate data structure for trajectories of dynamical systems. Suppose that a large number of parameterized trajectories $\gamma$ of a dynamical system evolving in $\mathbb{R}^N$ are stored in a database. Let $\eta \subset \mathbb{R}^N$ denote a parameterized path in Euclidean space, and let $\|\cdot\|$ denote a norm on the space of paths. A data structure is defined to represent trajectories of dynamical systems, and an algorithm is sketched which answers queries.

  19. Secure searching of biomarkers through hybrid homomorphic encryption scheme.

    PubMed

    Kim, Miran; Song, Yongsoo; Cheon, Jung Hee

    2017-07-26

    As genome sequencing technology develops rapidly, there has lately been an increasing need to keep genomic data secure even when stored in the cloud and still used for research. We are interested in designing a protocol for the secure outsourcing matching problem on encrypted data. We propose an efficient method to securely search a matching position with the query data and extract some information at the position. After decryption, only a small number of comparisons with the query information need to be performed in the plaintext state. We apply this method to find a set of biomarkers in encrypted genomes. The important feature of our method is to encode a genomic database as a single element of a polynomial ring. Since our method requires a single homomorphic multiplication of the hybrid scheme for query computation, it has the advantage over the previous methods in parameter size, computation complexity, and communication cost. In particular, the extraction procedure not only prevents leakage of database information that has not been queried by the user but also reduces the communication cost by half. We evaluate the performance of our method and verify that the computation on large-scale personal data can be securely and practically outsourced to a cloud environment during data analysis. It takes about 3.9 s to search-and-extract the reference and alternate sequences at the queried position in a database of size 4M. Our solution for finding a set of biomarkers in DNA sequences shows the progress of cryptographic techniques in terms of their capability to support real-world genome data analysis in a cloud environment.

  20. Automatic building information model query generation

    DOE PAGES

    Jiang, Yufei; Yu, Nan; Ming, Jiang; ...

    2015-12-01

    Energy efficient building design and construction calls for extensive collaboration between different subfields of the Architecture, Engineering and Construction (AEC) community. Performing building design and construction engineering raises challenges on data integration and software interoperability. Using a Building Information Modeling (BIM) data hub to host and integrate building models is a promising solution to address those challenges, which can ease building design information management. However, the partial model query mechanism of the current BIM data hub collaboration model has several limitations, which prevent designers and engineers from taking advantage of BIM. To address this problem, we propose a general and effective approach to generate query code based on a Model View Definition (MVD). This approach is demonstrated through a software prototype called QueryGenerator. In conclusion, by demonstrating a case study using multi-zone air flow analysis, we show how our approach and tool can help domain experts to use BIM to drive building design with less labour and lower overhead cost.

  1. Automatic building information model query generation

    SciTech Connect

    Jiang, Yufei; Yu, Nan; Ming, Jiang

    Energy efficient building design and construction calls for extensive collaboration between different subfields of the Architecture, Engineering and Construction (AEC) community. Performing building design and construction engineering raises challenges on data integration and software interoperability. Using a Building Information Modeling (BIM) data hub to host and integrate building models is a promising solution to address those challenges, which can ease building design information management. However, the partial model query mechanism of the current BIM data hub collaboration model has several limitations, which prevent designers and engineers from taking advantage of BIM. To address this problem, we propose a general and effective approach to generate query code based on a Model View Definition (MVD). This approach is demonstrated through a software prototype called QueryGenerator. In conclusion, by demonstrating a case study using multi-zone air flow analysis, we show how our approach and tool can help domain experts to use BIM to drive building design with less labour and lower overhead cost.

  2. Adaptive search in mobile peer-to-peer databases

    NASA Technical Reports Server (NTRS)

    Wolfson, Ouri (Inventor); Xu, Bo (Inventor)

    2010-01-01

    Information is stored in a plurality of mobile peers. The peers communicate in a peer to peer fashion, using a short-range wireless network. Occasionally, a peer initiates a search for information in the peer to peer network by issuing a query. Queries and pieces of information, called reports, are transmitted among peers that are within a transmission range. For each search additional peers are utilized, wherein these additional peers search and relay information on behalf of the originator of the search.

  3. Patterns of use and impact of standardised MedDRA query analyses on the safety evaluation and review of new drug and biologics license applications

    PubMed Central

    Chang, Lin-Chau; Mahmood, Riaz; Qureshi, Samina

    2017-01-01

    Purpose Standardised MedDRA Queries (SMQs) have been developed since the early 2000’s and used by academia, industry, public health, and government sectors for detecting safety signals in adverse event safety databases. The purpose of the present study is to characterize how SMQs are used and the impact in safety analyses for New Drug Application (NDA) and Biologics License Application (BLA) submissions to the United States Food and Drug Administration (USFDA). Methods We used the PharmaPendium database to capture SMQ use in Summary Basis of Approvals (SBoAs) of drugs and biologics approved by the USFDA. Characteristics of the drugs and the SMQ use were employed to evaluate the role of SMQ safety analyses in regulatory decisions and the veracity of signals they revealed. Results A comprehensive search of the SBoAs yielded 184 regulatory submissions approved from 2006 to 2015. Search strategies more frequently utilized restrictive searches with “narrow terms” to enhance specificity over strategies using “broad terms” to increase sensitivity, while some involved modification of search terms. A majority (59%) of 1290 searches used descriptive statistics, however inferential statistics were utilized in 35% of them. Commentary from reviewers and supervisory staff suggested that a small, yet notable percentage (18%) of 1290 searches supported regulatory decisions. The searches with regulatory impact were found in 73 submissions (40% of the submissions investigated). Most searches (75% of 227 searches) with regulatory implications described how the searches were confirmed, indicating prudence in the decision-making process. Conclusions SMQs have an increasing role in the presentation and review of safety analysis for NDAs/BLAs and their regulatory reviews. This study suggests that SMQs are best used for screening process, with descriptive statistics, description of SMQ modifications, and systematic verification of cases which is crucial for drawing regulatory

  4. Optimizing Online Suicide Prevention: A Search Engine-Based Tailored Approach.

    PubMed

    Arendt, Florian; Scherr, Sebastian

    2017-11-01

    Search engines are increasingly used to seek suicide-related information online, which can serve both harmful and helpful purposes. Google acknowledges this fact and presents a suicide-prevention result for particular search terms. Unfortunately, the result is only presented to a limited number of visitors. Hence, Google is missing the opportunity to provide help to vulnerable people. We propose a two-step approach to a tailored optimization: First, research will identify the risk factors. Second, search engines will reweight algorithms according to the risk factors. In this study, we show that the query share of the search term "poisoning" on Google shows substantial peaks corresponding to peaks in actual suicidal behavior. Accordingly, thresholds for showing the suicide-prevention result should be set to the lowest levels during the spring, on Sundays and Mondays, on New Year's Day, and on Saturdays following Thanksgiving. Search engines can help to save lives globally by utilizing a more tailored approach to suicide prevention.

  5. XSemantic: An Extension of LCA Based XML Semantic Search

    NASA Astrophysics Data System (ADS)

    Supasitthimethee, Umaporn; Shimizu, Toshiyuki; Yoshikawa, Masatoshi; Porkaew, Kriengkrai

    One of the most convenient ways to query XML data is a keyword search because it does not require any knowledge of XML structure or learning a new user interface. However, the keyword search is ambiguous. The users may use different terms to search for the same information. Furthermore, it is difficult for a system to decide which node is likely to be chosen as a return node and how much information should be included in the result. To address these challenges, we propose an XML semantic search based on keywords called XSemantic. On the one hand, we give three definitions that complete the approach in terms of semantics. Firstly, through semantic term expansion, our system is robust to ambiguous keywords by using a domain ontology. Secondly, to return semantically meaningful answers, we automatically infer the return information from the user queries and take advantage of the shortest path to return meaningful connections between keywords. Thirdly, we present a semantic ranking that reflects the degree of similarity as well as the semantic relationship so that the search results with the higher relevance are presented to the users first. On the other hand, in the LCA and the proximity search approaches, we investigated the problem of information included in the search results. Therefore, we introduce the notion of the Lowest Common Element Ancestor (LCEA) and define our simple rule without any requirement on the schema information such as the DTD or XML Schema. The first experiment indicated that XSemantic not only properly infers the return information but also generates compact meaningful results. Additionally, the benefits of our proposed semantics are demonstrated by the second experiment.
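
The plain LCA idea that this record builds on can be illustrated with a minimal sketch. It is not the authors' LCEA rule or their system: the toy document, the function names `paths_to_matches` and `lowest_common_ancestor`, and the substring matching are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy document; element names and values are illustrative only.
DOC = """
<library>
  <book><title>XML Search</title><author>Alice</author></book>
  <book><title>Databases</title><author>Bob</author></book>
</library>
"""

def paths_to_matches(root, keyword):
    """Return root-to-node paths for every element whose text contains keyword."""
    found = []

    def walk(node, path):
        path = path + [node]
        if node.text and keyword.lower() in node.text.lower():
            found.append(path)
        for child in node:
            walk(child, path)

    walk(root, [])
    return found

def lowest_common_ancestor(path_a, path_b):
    """Deepest element shared by two root-to-node paths."""
    lca = None
    for a, b in zip(path_a, path_b):
        if a is not b:
            break
        lca = a
    return lca

root = ET.fromstring(DOC)
title_path = paths_to_matches(root, "XML")[0]
alice_path = paths_to_matches(root, "Alice")[0]
bob_path = paths_to_matches(root, "Bob")[0]

same_book = lowest_common_ancestor(title_path, alice_path)
cross_book = lowest_common_ancestor(title_path, bob_path)
print(same_book.tag, cross_book.tag)
```

When both keywords fall inside one book, the LCA is that book element, which is a compact answer. When they fall in different books, the LCA is the document root, which shows why plain LCA can return too much information and why refinements are studied.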

  6. Performance of Point and Range Queries for In-memory Databases using Radix Trees on GPUs

    SciTech Connect

    Alam, Maksudul; Yoginath, Srikanth B; Perumalla, Kalyan S

    In in-memory database systems augmented by hardware accelerators, accelerating the index searching operations can greatly increase the runtime performance of database queries. Recently, adaptive radix trees (ART) have been shown to provide very fast index search implementation on the CPU. Here, we focus on an accelerator-based implementation of ART. We present a detailed performance study of our GPU-based adaptive radix tree (GRT) implementation over a variety of key distributions, synthetic benchmarks, and actual keys from music and book data sets. The performance is also compared with other index-searching schemes on the GPU. GRT on modern GPUs achieves some of the highest rates of index searches reported in the literature. For point queries, a throughput of up to 106 million and 130 million lookups per second is achieved for sparse and dense keys, respectively. For range queries, GRT yields 600 million and 1000 million lookups per second for sparse and dense keys, respectively, on a large dataset of 64 million 32-bit keys.
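
To make the two query types in this record concrete, here is a minimal CPU-side sketch of a fixed-depth radix tree over 32-bit keys with one byte per level. It is neither adaptive (ART) nor GPU-based (GRT); the class name `RadixTree` and its methods are hypothetical and chosen only for illustration.

```python
class RadixTree:
    """Fixed-depth radix tree over 32-bit keys, one byte per level (not adaptive)."""

    def __init__(self):
        self.root = {}

    @staticmethod
    def _key_bytes(key):
        # Split the key into four bytes, most significant first.
        return [(key >> shift) & 0xFF for shift in (24, 16, 8, 0)]

    def insert(self, key, value):
        node = self.root
        *inner, last = self._key_bytes(key)
        for b in inner:
            node = node.setdefault(b, {})
        node[last] = value

    def lookup(self, key):
        """Point query: return the stored value or None."""
        node = self.root
        *inner, last = self._key_bytes(key)
        for b in inner:
            node = node.get(b)
            if node is None:
                return None
        return node.get(last)

    def range_query(self, lo, hi):
        """Range query: yield (key, value) with lo <= key <= hi in ascending order."""
        def walk(node, prefix, depth):
            for b in sorted(node):
                key = (prefix << 8) | b
                if depth == 3:
                    if lo <= key <= hi:
                        yield key, node[b]
                else:
                    # Smallest and largest full keys reachable below this child.
                    shift = 8 * (3 - depth)
                    low = key << shift
                    high = low | ((1 << shift) - 1)
                    if high >= lo and low <= hi:
                        yield from walk(node[b], key, depth + 1)

        yield from walk(self.root, 0, 0)

tree = RadixTree()
for k, v in [(5, "a"), (300, "b"), (70000, "c"), (2**31, "d")]:
    tree.insert(k, v)
print(tree.lookup(300), list(tree.range_query(6, 70000)))
```

A point query follows exactly four child links, while a range query prunes whole subtrees whose key interval does not overlap the requested range.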

  7. The Weaknesses of Full-Text Searching

    ERIC Educational Resources Information Center

    Beall, Jeffrey

    2008-01-01

    This paper provides a theoretical critique of the deficiencies of full-text searching in academic library databases. Because full-text searching relies on matching words in a search query with words in online resources, it is an inefficient method of finding information in a database. This matching fails to retrieve synonyms, and it also retrieves…

  8. Policy Compliance of Queries for Private Information Retrieval

    DTIC Science & Technology

    2010-11-01

    SPARQL, unfortunately, is not in RDF and so we had to develop tools to translate SPARQL queries into RDF to be used by our policy compliance prototype...policy-assurance/sparql2n3.py) that accepts SPARQL queries and returns the translated query in our simplified ontology. An example of a translated

  9. Multiple Query Evaluation Based on an Enhanced Genetic Algorithm.

    ERIC Educational Resources Information Center

    Tamine, Lynda; Chrisment, Claude; Boughanem, Mohand

    2003-01-01

    Explains the use of genetic algorithms to combine results from multiple query evaluations to improve relevance in information retrieval. Discusses niching techniques, relevance feedback techniques, and evolution heuristics, and compares retrieval results obtained by both genetic multiple query evaluation and classical single query evaluation…

  10. Applying Query Structuring in Cross-language Retrieval.

    ERIC Educational Resources Information Center

    Pirkola, Ari; Puolamaki, Deniz; Jarvelin, Kalervo

    2003-01-01

    Explores ways to apply query structuring in cross-language information retrieval. Tested were: English queries translated into Finnish using an electronic dictionary, and run in a Finnish newspaper database; effects of compound-based structuring using a proximity operator for translation equivalents of query language compound components; and a…

  11. A Relational Algebra Query Language for Programming Relational Databases

    ERIC Educational Resources Information Center

    McMaster, Kirby; Sambasivam, Samuel; Anderson, Nicole

    2011-01-01

    In this paper, we describe a Relational Algebra Query Language (RAQL) and Relational Algebra Query (RAQ) software product we have developed that allows database instructors to teach relational algebra through programming. Instead of defining query operations using mathematical notation (the approach commonly taken in database textbooks), students…
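
The idea of writing relational algebra as program code can be sketched in a few lines. This is not RAQL syntax or the RAQ software: the functions `select`, `project` and `natural_join`, and the toy relations, are hypothetical and represent relations simply as lists of dictionaries.

```python
def select(rows, predicate):
    """Selection: keep the rows that satisfy the predicate."""
    return [r for r in rows if predicate(r)]

def project(rows, attrs):
    """Projection: keep only attrs and drop duplicate tuples."""
    seen, out = set(), []
    for r in rows:
        t = tuple(r[a] for a in attrs)
        if t not in seen:
            seen.add(t)
            out.append(dict(zip(attrs, t)))
    return out

def natural_join(left, right):
    """Natural join: combine rows that agree on all shared attribute names."""
    if not left or not right:
        return []
    common = [a for a in left[0] if a in right[0]]
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(l[a] == r[a] for a in common)
    ]

# Hypothetical toy relations.
students = [{"sid": 1, "name": "Ann"}, {"sid": 2, "name": "Bo"}]
enrolled = [
    {"sid": 1, "course": "DB"},
    {"sid": 2, "course": "AI"},
    {"sid": 1, "course": "AI"},
]

# Names of students enrolled in AI: project(select(join(...))).
ai_names = project(
    select(natural_join(students, enrolled), lambda r: r["course"] == "AI"),
    ["name"],
)
print(ai_names)
```

Each operator takes relations and returns a relation, so queries compose by nesting calls in the same way algebra expressions nest.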

  12. Guiding Students to Answers: Query Recommendation

    ERIC Educational Resources Information Center

    Yilmazel, Ozgur

    2011-01-01

    This paper reports on a guided navigation system built on the textbook search engine developed at Anadolu University to support distance education students. The search engine uses Turkish Language specific language processing modules to enable searches over course material presented in Open Education Faculty textbooks. We implemented a guided…

  13. New Quality Metrics for Web Search Results

    NASA Astrophysics Data System (ADS)

    Metaxas, Panagiotis Takis; Ivanova, Lilia; Mustafaraj, Eni

    Web search results enjoy an increasing importance in our daily lives. But what can be said about their quality, especially when querying a controversial issue? The traditional information retrieval metrics of precision and recall do not provide much insight in the case of web information retrieval. In this paper we examine new ways of evaluating quality in search results: coverage and independence. We give examples on how these new metrics can be calculated and what their values reveal regarding the two major search engines, Google and Yahoo. We have found evidence of low coverage for commercial and medical controversial queries, and high coverage for a political query that is highly contested. Given the fact that search engines are unwilling to tune their search results manually, except in a few cases that have become the source of bad publicity, low coverage and independence reveal the efforts of dedicated groups to manipulate the search results.

  14. Development of a Search Strategy for an Evidence Based Retrieval Service

    PubMed Central

    Ho, Gah Juan; Liew, Su May; Ng, Chirk Jenn; Hisham Shunmugam, Ranita; Glasziou, Paul

    2016-01-01

    Background Physicians are often encouraged to locate answers for their clinical queries via an evidence-based literature search approach. The methods used are often not clearly specified. Inappropriate search strategies, time constraints and contradictory information complicate evidence retrieval. Aims Our study aimed to develop a search strategy to answer clinical queries among physicians in a primary care setting. Methods Six clinical questions of different medical conditions seen in primary care were formulated. A series of experimental searches to answer each question was conducted on 3 commonly advocated medical databases. We compared search results from a PICO (patients, intervention, comparison, outcome) framework for questions using different combinations of PICO elements. We also compared outcomes from doing searches using text words, Medical Subject Headings (MeSH), or a combination of both. All searches were documented using screenshots and saved search strategies. Results Answers to all 6 questions using the PICO framework were found. A higher number of systematic reviews were obtained using a 2 PICO element search compared to a 4 element search. A more optimal choice of search is a combination of both text words and MeSH terms. Despite searching using the Systematic Review filter, many non-systematic reviews or narrative reviews were found in PubMed. There was poor overlap between outcomes of searches using different databases. The duration of search and screening for the 6 questions ranged from 1 to 4 hours. Conclusion This strategy has been shown to be feasible and can provide evidence to doctors’ clinical questions. It has the potential to be incorporated into an interventional study to determine the impact of an online evidence retrieval system. PMID:27935993
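
A search that combines text words and MeSH terms for two PICO elements can be sketched as a small query builder. The helper names `element` and `pico_query` and the example terms are hypothetical and are not taken from the study; the field tags `[Title/Abstract]`, `[MeSH Terms]` and the `systematic[sb]` filter follow PubMed's search syntax as I understand it, so check them against the PubMed help before relying on them.

```python
def element(text_word, mesh=None):
    """One PICO element: a text word, optionally OR-ed with a MeSH heading."""
    parts = [f'"{text_word}"[Title/Abstract]']
    if mesh:
        parts.append(f'"{mesh}"[MeSH Terms]')
    return "(" + " OR ".join(parts) + ")"

def pico_query(elements, systematic_reviews_only=False):
    """AND together the chosen PICO elements; optionally add the review filter."""
    query = " AND ".join(element(text, mesh) for text, mesh in elements)
    if systematic_reviews_only:
        query = f"({query}) AND systematic[sb]"
    return query

# Hypothetical two-element (P and I) search.
two_element = pico_query(
    [("high blood pressure", "Hypertension"), ("exercise", "Exercise")]
)
print(two_element)
print(pico_query([("exercise", None)], systematic_reviews_only=True))
```

Adding a third or fourth element only appends another AND clause, which narrows the result set; that is consistent with the record's observation that a 2-element search returned more systematic reviews than a 4-element one.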

  15. PolySearch2: a significantly improved text-mining system for discovering associations between human diseases, genes, drugs, metabolites, toxins and more.

    PubMed

    Liu, Yifeng; Liang, Yongjie; Wishart, David

    2015-07-01

    PolySearch2 (http://polysearch.ca) is an online text-mining system for identifying relationships between biomedical entities such as human diseases, genes, SNPs, proteins, drugs, metabolites, toxins, metabolic pathways, organs, tissues, subcellular organelles, positive health effects, negative health effects, drug actions, Gene Ontology terms, MeSH terms, ICD-10 medical codes, biological taxonomies and chemical taxonomies. PolySearch2 supports a generalized 'Given X, find all associated Ys' query, where X and Y can be selected from the aforementioned biomedical entities. An example query might be: 'Find all diseases associated with Bisphenol A'. To find its answers, PolySearch2 searches for associations against comprehensive free-text collections, including local versions of MEDLINE abstracts, PubMed Central full-text articles, Wikipedia full-text articles and US Patent application abstracts. PolySearch2 also searches 14 widely used, text-rich biological databases such as UniProt, DrugBank and Human Metabolome Database to improve its accuracy and coverage. PolySearch2 maintains an extensive thesaurus of biological terms and exploits the latest search engine technology to rapidly retrieve relevant articles and database records. PolySearch2 also generates, ranks and annotates associative candidates and presents results with relevancy statistics and highlighted key sentences to facilitate user interpretation. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  16. PolySearch2: a significantly improved text-mining system for discovering associations between human diseases, genes, drugs, metabolites, toxins and more

    PubMed Central

    Liu, Yifeng; Liang, Yongjie; Wishart, David

    2015-01-01

    PolySearch2 (http://polysearch.ca) is an online text-mining system for identifying relationships between biomedical entities such as human diseases, genes, SNPs, proteins, drugs, metabolites, toxins, metabolic pathways, organs, tissues, subcellular organelles, positive health effects, negative health effects, drug actions, Gene Ontology terms, MeSH terms, ICD-10 medical codes, biological taxonomies and chemical taxonomies. PolySearch2 supports a generalized ‘Given X, find all associated Ys’ query, where X and Y can be selected from the aforementioned biomedical entities. An example query might be: ‘Find all diseases associated with Bisphenol A’. To find its answers, PolySearch2 searches for associations against comprehensive free-text collections, including local versions of MEDLINE abstracts, PubMed Central full-text articles, Wikipedia full-text articles and US Patent application abstracts. PolySearch2 also searches 14 widely used, text-rich biological databases such as UniProt, DrugBank and Human Metabolome Database to improve its accuracy and coverage. PolySearch2 maintains an extensive thesaurus of biological terms and exploits the latest search engine technology to rapidly retrieve relevant articles and database records. PolySearch2 also generates, ranks and annotates associative candidates and presents results with relevancy statistics and highlighted key sentences to facilitate user interpretation. PMID:25925572

  17. Interactive Mental Imagery and Short-Term Memory Search Rates for Words and Pictures. Report from the Project on Children's Learning and Development. Technical Report No. 317.

    ERIC Educational Resources Information Center

    Kerst, Stephen Marshall

    The purposes of this study were to determine if test stimulus was a member of the memory set and if items in an interactive image held in short term memory (STM) could be scanned simultaneously. In experiment one, 50 university subjects compared a test word with a set of one to three words held in STM. The rate of STM search was obtained by…

  18. GeoSearcher: Location-Based Ranking of Search Engine Results.

    ERIC Educational Resources Information Center

    Watters, Carolyn; Amoudi, Ghada

    2003-01-01

    Discussion of Web queries with geospatial dimensions focuses on an algorithm that assigns location coordinates dynamically to Web sites based on the URL. Describes a prototype search system that uses the algorithm to re-rank search engine results for queries with a geospatial dimension, thus providing an alternative ranking order for search engine…
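
Location-based re-ranking of this kind can be sketched as follows, assuming each result already has coordinates. The record's algorithm derives coordinates from the URL; that step is not reproduced here. The function names, the result list and its coordinates are hypothetical illustration data.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * 6371.0 * math.asin(math.sqrt(min(1.0, h)))

def rerank_by_location(results, user_location):
    """Sort results by distance to the user; sorted() is stable, so ties keep engine order."""
    return sorted(results, key=lambda r: haversine_km(r["coords"], user_location))

# Hypothetical engine results in their original rank order.
results = [
    {"url": "example.org/vancouver", "coords": (49.28, -123.12)},
    {"url": "example.org/toronto", "coords": (43.65, -79.38)},
    {"url": "example.org/halifax", "coords": (44.65, -63.57)},
]
user = (44.6, -63.6)  # assumed user location near Halifax
reranked = [r["url"] for r in rerank_by_location(results, user)]
print(reranked)
```

A production system would more plausibly blend distance with the engine's relevance score rather than sort on distance alone; the pure distance sort is used here only to keep the sketch short.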

  19. Robust hashing with local models for approximate similarity search.

    PubMed

    Song, Jingkuan; Yang, Yi; Li, Xuelong; Huang, Zi; Yang, Yang

    2014-07-01

    Similarity search plays an important role in many applications involving high-dimensional data. Due to the known dimensionality curse, the performance of most existing indexing structures degrades quickly as the feature dimensionality increases. Hashing methods, such as locality sensitive hashing (LSH) and its variants, have been widely used to achieve fast approximate similarity search by trading search quality for efficiency. However, most existing hashing methods make use of randomized algorithms to generate hash codes without considering the specific structural information in the data. In this paper, we propose a novel hashing method, namely, robust hashing with local models (RHLM), which learns a set of robust hash functions to map the high-dimensional data points into binary hash codes by effectively utilizing local structural information. In RHLM, for each individual data point in the training dataset, a local hashing model is learned and used to predict the hash codes of its neighboring data points. The local models from all the data points are globally aligned so that an optimal hash code can be assigned to each data point. After obtaining the hash codes of all the training data points, we design a robust method by employing ℓ2,1-norm minimization on the loss function to learn effective hash functions, which are then used to map each database point into its hash code. Given a query data point, the search process first maps it into the query hash code by the hash functions and then explores the buckets, which have similar hash codes to the query hash code. Extensive experimental results conducted on real-life datasets show that the proposed RHLM outperforms the state-of-the-art methods in terms of search quality and efficiency.
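
The search process described at the end of this record (hash the query, then explore buckets with similar codes) can be sketched with random-hyperplane LSH. This is the randomized baseline family the abstract contrasts itself with, not RHLM's learned hash functions. The class name `HyperplaneLSH`, its parameters and the toy vectors are assumptions.

```python
import random
from collections import defaultdict

class HyperplaneLSH:
    """Sign-of-random-projection hashing; one bit per random hyperplane."""

    def __init__(self, dim, n_bits=8, seed=0):
        rng = random.Random(seed)
        self.planes = [
            [rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)
        ]
        self.buckets = defaultdict(list)

    def code(self, vec):
        """Binary hash code: which side of each hyperplane the vector lies on."""
        return tuple(
            int(sum(p * x for p, x in zip(plane, vec)) >= 0)
            for plane in self.planes
        )

    def add(self, key, vec):
        self.buckets[self.code(vec)].append(key)

    def query(self, vec, max_hamming=1):
        """Return keys from buckets whose code is within max_hamming bits of the query code."""
        q = self.code(vec)
        hits = []
        for c, keys in self.buckets.items():
            if sum(a != b for a, b in zip(q, c)) <= max_hamming:
                hits.extend(keys)
        return hits

index = HyperplaneLSH(dim=3, n_bits=8, seed=0)
index.add("a", [1.0, 2.0, 3.0])
index.add("b", [-1.0, -2.0, -3.0])
candidates = index.query([1.0, 2.0, 3.0])
print(candidates)
```

Because each bit depends only on the sign of a projection, a vector and any positive multiple of it share a code, while a vector and its negation differ in every bit. Candidates returned by `query` would normally be re-ranked by exact distance.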

  20. Determining conserved metabolic biomarkers from a million database queries.

    PubMed

    Kurczy, Michael E; Ivanisevic, Julijana; Johnson, Caroline H; Uritboonthai, Winnie; Hoang, Linh; Fang, Mingliang; Hicks, Matthew; Aldebot, Anthony; Rinehart, Duane; Mellander, Lisa J; Tautenhahn, Ralf; Patti, Gary J; Spilker, Mary E; Benton, H Paul; Siuzdak, Gary

    2015-12-01

    Metabolite databases provide a unique window into metabolome research allowing the most commonly searched biomarkers to be catalogued. Omic scale metabolite profiling, or metabolomics, is finding increased utility in biomarker discovery largely driven by improvements in analytical technologies and the concurrent developments in bioinformatics. However, the successful translation of biomarkers into clinical or biologically relevant indicators is limited. With the aim of improving the discovery of translatable metabolite biomarkers, we present search analytics for over one million METLIN metabolite database queries. The most common metabolites found in METLIN were cross-correlated against XCMS Online, the widely used cloud-based data processing and pathway analysis platform. Analysis of the METLIN and XCMS common metabolite data has two primary implications: these metabolites might indicate a conserved metabolic response to stressors, and this data may be used to gauge the relative uniqueness of potential biomarkers. METLIN can be accessed by logging on to: https://metlin.scripps.edu. Contact: siuzdak@scripps.edu. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.

  1. A Geospatial Semantic Enrichment and Query Service for Geotagged Photographs

    PubMed Central

    Ennis, Andrew; Nugent, Chris; Morrow, Philip; Chen, Liming; Ioannidis, George; Stan, Alexandru; Rachev, Preslav

    2015-01-01

    With the increasing abundance of technologies and smart devices, equipped with a multitude of sensors for sensing the environment around them, information creation and consumption has now become effortless. This, in particular, is the case for photographs with vast amounts being created and shared every day. For example, at the time of this writing, Instagram users upload 70 million photographs a day. Nevertheless, it still remains a challenge to discover the “right” information for the appropriate purpose. This paper describes an approach to create semantic geospatial metadata for photographs, which can facilitate photograph search and discovery. To achieve this we have developed and implemented a semantic geospatial data model by which a photograph can be enriched with geospatial metadata extracted from several geospatial data sources based on the raw low-level geo-metadata from a smartphone photograph. We present the details of our method and implementation for searching and querying the semantic geospatial metadata repository to enable a user or third party system to find the information they are looking for. PMID:26205265

  2. A Visual Interface for Querying Heterogeneous Phylogenetic Databases.

    PubMed

    Jamil, Hasan M

    2017-01-01

    Despite the recent growth in the number of phylogenetic databases, access to this wealth of resources remains largely tool or form-based interface driven. It is our thesis that the flexibility afforded by declarative query languages may offer the opportunity to access these repositories in a better way, and to use such a language to pose truly powerful queries in unprecedented ways. In this paper, we propose a substantially enhanced closed visual query language, called PhyQL, that can be used to query phylogenetic databases represented in a canonical form. The canonical representation presented helps capture most phylogenetic tree formats in a convenient way, and is used as the storage model for our PhyloBase database for which PhyQL serves as the query language. We have implemented a visual interface for the end users to pose PhyQL queries using visual icons, and drag and drop operations defined over them. Once a query is posed, the interface translates the visual query into a Datalog query for execution over the canonical database. Responses are returned as hyperlinks to phylogenies that can be viewed in several formats using the tree viewers supported by PhyloBase. Results cached in the PhyQL buffer allow secondary querying on the computed results, making it a truly powerful querying architecture.

  3. Search engines, news wires and digital epidemiology: Presumptions and facts.

    PubMed

    Kaveh-Yazdy, Fatemeh; Zareh-Bidoki, Ali-Mohammad

    2018-07-01

    Digital epidemiology tries to identify disease dynamics and spread behaviors using digital traces collected via search engine logs and social media posts. However, the impacts of news on information-seeking behaviors have remained unknown. Data employed in this research were provided from two sources: (1) Parsijoo search engine query logs of 48 months, and (2) a set of documents of 28 months of Parsijoo's news service. Two classes of topics, i.e. macro-topics and micro-topics, were selected to be tracked in query logs and news. Keywords of the macro-topics were automatically generated using web provided resources and exceeded 10k. The keyword set of micro-topics was limited to a numerable list including terms related to diseases and health-related activities. The tests are established in the form of three studies. Study A includes temporal analyses of 7 macro-topics in query logs. Study B considers analyzing seasonality of searching patterns of 9 micro-topics, and Study C assesses the impact of news media coverage on users' health-related information-seeking behaviors. Study A showed that the hourly distribution of various macro-topics followed the changes in social activity level. Conversely, the interestingness of macro-topics did not follow the regulation of topic distributions. Among macro-topics, "Pharmacotherapy" has the highest interestingness level and a wider time-window of popularity. In Study B, seasonality of a limited number of diseases and health-related activities was analyzed. Trends of infectious diseases, such as flu, mumps and chicken pox, were seasonal. Due to seasonality of most of the diseases covered in national vaccination plans, the trend belonging to "Immunization and Vaccination" was seasonal, as well. Cancer awareness events caused peaks in search trends of "Cancer" and "Screening" micro-topics in specific days of each year that mimic repeated patterns which may mistakenly be identified as seasonality. In study C, we assessed the co-integration and

  4. Improve Performance of Data Warehouse by Query Cache

    NASA Astrophysics Data System (ADS)

    Gour, Vishal; Sarangdevot, S. S.; Sharma, Anand; Choudhary, Vinod

    2010-11-01

    The primary goal of a data warehouse is to free the information locked up in the operational database so that decision makers and business analysts can make queries, analysis and planning regardless of the data changes in the operational database. As the number of queries is large, in certain cases there is a reasonable probability that the same query is submitted by one or multiple users at different times. Each time a query is executed, all the data of the warehouse is analyzed to generate the result of that query. In this paper we study how using a query cache improves the performance of a data warehouse and try to find the common problems faced. These kinds of problems are faced by data warehouse administrators who seek to minimize response time and improve the overall efficiency of queries in the data warehouse, particularly when the data warehouse is updated at regular intervals.
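
The query-cache idea in this record can be sketched in a few lines: repeated queries are answered from memory, and the cache is cleared whenever the warehouse is refreshed. The class name `QueryCache`, the whitespace-only normalization and the fake backend are assumptions for illustration, not the authors' design.

```python
class QueryCache:
    """Cache query results keyed by whitespace-normalized query text."""

    def __init__(self, run_query):
        self.run_query = run_query  # callable that actually executes the query
        self.results = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _normalize(sql):
        # Naive normalization: collapse runs of whitespace only.
        return " ".join(sql.split())

    def execute(self, sql):
        key = self._normalize(sql)
        if key in self.results:
            self.hits += 1
            return self.results[key]
        self.misses += 1
        result = self.run_query(sql)
        self.results[key] = result
        return result

    def invalidate(self):
        """Call after each scheduled warehouse load so stale results are dropped."""
        self.results.clear()

# Hypothetical backend that records how often it is really executed.
backend_calls = []

def fake_backend(sql):
    backend_calls.append(sql)
    return [("total", 42)]

cache = QueryCache(fake_backend)
first = cache.execute("SELECT  SUM(x) FROM sales")
second = cache.execute("SELECT SUM(x)\nFROM sales")
print(cache.hits, cache.misses, len(backend_calls))
```

Clearing the whole cache on every load is the simplest policy for a warehouse updated at regular intervals; finer-grained invalidation per table would keep more entries alive at the cost of tracking which tables each query reads.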

  5. Improving average ranking precision in user searches for biomedical research datasets

    PubMed Central

    Gobeill, Julien; Gaudinat, Arnaud; Vachon, Thérèse; Ruch, Patrick

    2017-01-01

    Abstract Availability of research datasets is a keystone for health and life science study reproducibility and scientific progress. Due to the heterogeneity and complexity of these data, a main challenge to be overcome by research data management systems is to provide users with the best answers for their search queries. In the context of the 2016 bioCADDIE Dataset Retrieval Challenge, we investigate a novel ranking pipeline to improve the search of datasets used in biomedical experiments. Our system comprises a query expansion model based on word embeddings, a similarity measure algorithm that takes into consideration the relevance of the query terms, and a dataset categorization method that boosts the rank of datasets matching query constraints. The system was evaluated using a corpus with 800k datasets and 21 annotated user queries, and provided competitive results when compared to the other challenge participants. In the official run, it achieved the highest infAP, being +22.3% higher than the median infAP of the participants’ best submissions. Overall, it ranks in the top 2 if an aggregated metric using the best official measures per participant is considered. The query expansion method showed positive impact on the system’s performance increasing our baseline up to +5.0% and +3.4% for the infAP and infNDCG metrics, respectively. The similarity measure algorithm showed robust performance in different training conditions, with small performance variations compared to the Divergence from Randomness framework. Finally, the result categorization did not have significant impact on the system’s performance. We believe that our solution could be used to enhance biomedical dataset management systems. The use of data driven expansion methods, such as those based on word embeddings, could be an alternative to the complexity of biomedical terminologies. Nevertheless, due to the limited size of the assessment set, further experiments need to be performed to draw
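
Embedding-based query expansion of the kind mentioned in this record can be sketched with cosine similarity over a tiny hand-made vector table. The two-dimensional vectors, the threshold and the function names are hypothetical; a real system would load trained word embeddings, and the authors' actual expansion model is not reproduced here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def expand_query(terms, embeddings, threshold=0.9, top_k=2):
    """Append to each term its nearest neighbours whose similarity passes the threshold."""
    expanded = []
    for term in terms:
        expanded.append(term)
        if term not in embeddings:
            continue
        scored = sorted(
            (
                (cosine(embeddings[term], vec), other)
                for other, vec in embeddings.items()
                if other != term
            ),
            reverse=True,
        )
        expanded.extend(
            other
            for score, other in scored[:top_k]
            if score >= threshold and other not in terms
        )
    return expanded

# Hypothetical toy embeddings (2-D for readability).
EMBEDDINGS = {
    "cancer": [1.0, 0.0],
    "tumor": [0.9, 0.1],
    "mouse": [0.0, 1.0],
    "murine": [0.1, 0.9],
    "protein": [0.7, 0.7],
}

expanded_terms = expand_query(["cancer", "mouse"], EMBEDDINGS)
print(expanded_terms)
```

The threshold keeps loosely related words such as "protein" out of the expanded query, which matters because over-expansion tends to hurt ranking precision.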

  6. Analysing Twitter and web queries for flu trend prediction.

    PubMed

    Santos, José Carlos; Matos, Sérgio

    2014-05-07

    Social media platforms encourage people to share diverse aspects of their daily life. Among these, shared health related information might be used to infer health status and incidence rates for specific conditions or symptoms. In this work, we present an infodemiology study that evaluates the use of Twitter messages and search engine query logs to estimate and predict the incidence rate of influenza-like illness in Portugal. Based on a manually classified dataset of 2704 tweets from Portugal, we selected a set of 650 textual features to train a Naïve Bayes classifier to identify tweets mentioning flu or flu-like illness or symptoms. We obtained a precision of 0.78 and an F-measure of 0.83, based on cross validation over the complete annotated set. Furthermore, we trained a multiple linear regression model to estimate the health-monitoring data from the Influenzanet project, using as predictors the relative frequencies obtained from the tweet classification results and from query logs, and achieved a correlation ratio of 0.89 (p<0.001). These classification and regression models were also applied to estimate the flu incidence in the following flu season, achieving a correlation of 0.72. Previous studies addressing the estimation of disease incidence based on user-generated content have mostly focused on the English language. Our results further validate those studies and show that by changing the initial steps of data preprocessing and feature extraction and selection, the proposed approaches can be adapted to other languages. Additionally, we investigated whether the predictive model created can be applied to data from the subsequent flu season. In this case, although the prediction result was good, an initial phase to adapt the regression model could be necessary to achieve more robust results.

  7. Protecting count queries in study design

    PubMed Central

    Sarwate, Anand D; Boxwala, Aziz A

    2012-01-01

    Objective Today's clinical research institutions provide tools for researchers to query their data warehouses for counts of patients. To protect patient privacy, counts are perturbed before reporting; this compromises their utility for increased privacy. The goal of this study is to extend current query answer systems to guarantee a quantifiable level of privacy and allow users to tailor perturbations to maximize the usefulness according to their needs. Methods A perturbation mechanism was designed in which users are given options with respect to scale and direction of the perturbation. The mechanism translates the true count, user preferences, and a privacy level within administrator-specified bounds into a probability distribution from which the perturbed count is drawn. Results Users can significantly impact the scale and direction of the count perturbation and can receive more accurate final cohort estimates. Strong and semantically meaningful differential privacy is guaranteed, providing for a unified privacy accounting system that can support role-based trust levels. This study provides an open source web-enabled tool to investigate visually and numerically the interaction between system parameters, including required privacy level and user preference settings. Conclusions Quantifying privacy allows system administrators to provide users with a privacy budget and to monitor its expenditure, enabling users to control the inevitable loss of utility. While current measures of privacy are conservative, this system can take advantage of future advances in privacy measurement. The system provides new ways of trading off privacy and utility that are not provided in current study design systems. PMID:22511018
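
For orientation, the textbook way to perturb a count under differential privacy is the Laplace mechanism, sketched below. It is a stand-in, not the mechanism designed in this study, which additionally lets users shape the scale and direction of the perturbation. The function name `perturbed_count` and the parameter values are assumptions.

```python
import random

def perturbed_count(true_count, epsilon, rng=random):
    """Laplace mechanism for a count query (sensitivity 1) at privacy level epsilon.

    The difference of two independent Exponential(epsilon) draws follows a
    Laplace distribution with scale 1/epsilon. Rounding and clamping at zero
    are post-processing steps and do not weaken the privacy guarantee.
    """
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return max(0, round(true_count + noise))

rng = random.Random(1)  # seeded only to make the demonstration repeatable
draws = [perturbed_count(100, 1.0, rng) for _ in range(20000)]
mean_draw = sum(draws) / len(draws)
print(min(draws), max(draws))
```

A smaller `epsilon` means stronger privacy and larger noise. An administrator-set privacy budget, as described in the record, would cap the total epsilon a user can spend across queries.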

  8. Clean Air Markets - Compliance Query Wizard

    EPA Pesticide Factsheets

    The Compliance Query Wizard is part of a suite of Clean Air Markets-related tools that are accessible at http://ampd.epa.gov/ampd/. The Compliance module provides final compliance results. Using the Compliance Query Wizard, the user can find compliance information associated with specific programs, facilities, states or time frames. Quick Reports and Prepackaged Datasets are also available for data that are commonly requested. Final compliance results are available for all years since 1995 for the Acid Rain Program and for the various NOx trading programs EPA has operated since 1999. EPA's Clean Air Markets Division (CAMD) includes several market-based regulatory programs designed to improve air quality and ecosystems. The most well-known of these programs are EPA's Acid Rain Program and the NOx Programs, which reduce emissions of sulfur dioxide (SO2) and nitrogen oxides (NOx), compounds that adversely affect air quality, the environment, and public health. CAMD also plays an integral role in the development and implementation of the Clean Air Interstate Rule (CAIR).

  9. Protecting count queries in study design.

    PubMed

    Vinterbo, Staal A; Sarwate, Anand D; Boxwala, Aziz A

    2012-01-01

    Today's clinical research institutions provide tools for researchers to query their data warehouses for counts of patients. To protect patient privacy, counts are perturbed before reporting; this compromises their utility for increased privacy. The goal of this study is to extend current query answer systems to guarantee a quantifiable level of privacy and allow users to tailor perturbations to maximize the usefulness according to their needs. A perturbation mechanism was designed in which users are given options with respect to scale and direction of the perturbation. The mechanism translates the true count, user preferences, and a privacy level within administrator-specified bounds into a probability distribution from which the perturbed count is drawn. Users can significantly impact the scale and direction of the count perturbation and can receive more accurate final cohort estimates. Strong and semantically meaningful differential privacy is guaranteed, providing for a unified privacy accounting system that can support role-based trust levels. This study provides an open source web-enabled tool to investigate visually and numerically the interaction between system parameters, including required privacy level and user preference settings. Quantifying privacy allows system administrators to provide users with a privacy budget and to monitor its expenditure, enabling users to control the inevitable loss of utility. While current measures of privacy are conservative, this system can take advantage of future advances in privacy measurement. The system provides new ways of trading off privacy and utility that are not provided in current study design systems.

  10. Clean Air Markets - Allowances Query Wizard

    EPA Pesticide Factsheets

    The Allowances Query Wizard is part of a suite of Clean Air Markets-related tools that are accessible at http://camddataandmaps.epa.gov/gdm/index.cfm. The Allowances module allows the user to view allowance data associated with EPA's emissions trading programs. Allowance data can be specified and organized using the Allowance Query Wizard to find allowances information associated with specific accounts, companies, transactions, programs, facilities, representatives, allowance type, or by date. Quick Reports and Prepackaged Datasets are also available for data that are commonly requested. EPA's Clean Air Markets Division (CAMD) includes several market-based regulatory programs designed to improve air quality and ecosystems. The most well-known of these programs are EPA's Acid Rain Program and the NOx Programs, which reduce emissions of sulfur dioxide (SO2) and nitrogen oxides (NOx), compounds that adversely affect air quality, the environment, and public health. CAMD also plays an integral role in the development and implementation of the Clean Air Interstate Rule (CAIR).

  11. Structuring Legacy Pathology Reports by openEHR Archetypes to Enable Semantic Querying.

    PubMed

    Kropf, Stefan; Krücken, Peter; Mueller, Wolf; Denecke, Kerstin

    2017-05-18

    Clinical information is often stored as free text, e.g. in discharge summaries or pathology reports. These documents are semi-structured using section headers, numbered lists, items and classification strings. However, it is still challenging to retrieve relevant documents since keyword searches applied on complete unstructured documents result in many false positive retrieval results. We are concentrating on the processing of pathology reports as an example for unstructured clinical documents. The objective is to transform reports semi-automatically into an information structure that enables an improved access and retrieval of relevant data. The data is expected to be stored in a standardized, structured way to make it accessible for queries that are applied to specific sections of a document (section-sensitive queries) and for information reuse. Our processing pipeline comprises information modelling, section boundary detection and section-sensitive queries. For enabling a focused search in unstructured data, documents are automatically structured and transformed into a patient information model specified through openEHR archetypes. The resulting XML-based pathology electronic health records (PEHRs) are queried by XQuery and visualized by XSLT in HTML. Pathology reports (PRs) can be reliably structured into sections by a keyword-based approach. The information modelling using openEHR allows saving time in the modelling process since many archetypes can be reused. The resulting standardized, structured PEHRs allow accessing relevant data by retrieving data matching user queries. Mapping unstructured reports into a standardized information model is a practical solution for a better access to data. Archetype-based XML enables section-sensitive retrieval and visualisation by well-established XML techniques. Focussing the retrieval to particular sections has the potential of saving retrieval time and improving the accuracy of the retrieval.

  12. Unhappy with internal corporate search? : learn tips and tricks for building a controlled vocabulary ontology.

    SciTech Connect

    Arpin, Bettina Karin Schimanski; Jones, Brian S.; Bemesderfer, Joy

    2010-06-01

    Are your employees unhappy with internal corporate search? Frequent complaints include: too many results to sift through; results are unrelated/outdated; employees aren't sure which terms to search for. One way to improve intranet search is to implement a controlled vocabulary ontology. Employing this takes the guesswork out of searching, makes search efficient and precise, educates employees about the lingo used within the corporation, and allows employees to contribute to the corpus of terms. It promotes internal corporate search to rival its superior sibling, internet search. We will cover our experiences, lessons learned, and conclusions from implementing a controlled vocabulary ontology at Sandia National Laboratories. The work focuses on construction of this ontology from the content perspective and the technical perspective. We'll discuss the following: (1) The tool we used to build a polyhierarchical taxonomy; (2) Examples of two methods of indexing the content: traditional 'back of the book' and folksonomy word-mapping; (3) Tips on how to build future search capabilities while building the basic controlled vocabulary; (4) How to implement the controlled vocabulary as an ontology that mimics Google's search suggestions; (5) Making the user experience more interactive and intuitive; and (6) Sorting suggestions based on preferred, alternate and related terms using SPARQL queries. In summary, future improvements will be presented, including permitting end-users to add, edit and remove terms, and filtering on different subject domains.

  13. Design of a Low-Cost Adaptive Question Answering System for Closed Domain Factoid Queries

    ERIC Educational Resources Information Center

    Toh, Huey Ling

    2010-01-01

    Closed domain question answering (QA) systems achieve precision and recall at the cost of complex language processing techniques to parse the answer corpus. We propose a "query-based" model for indexing answers in a closed domain factoid QA system. Further, we use a phrase term inference method for improving the ranking order of related questions.…

  14. On the Delusiveness of Adopting a Common Space for Modeling IR Objects: Are Queries Documents?

    ERIC Educational Resources Information Center

    Bollmann-Sdorra, Peter; Raghavan, Vjay V.

    1993-01-01

    Proposes that document space and query space have different structures in information retrieval and discusses similarity measures, term independence, and linear structure. Examples are given using the retrieval functions of dot-product, the cosine measure, the coefficient of Jaccard, and the overlap function. (Contains 28 references.) (LRW)
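
    The four retrieval functions named in the abstract above can be sketched as follows. These are the common textbook forms, assumed here for illustration and not necessarily the exact formulations used in the paper; the dot product and cosine operate on term-weight dictionaries, while the Jaccard and overlap coefficients use only the sets of terms. The sample query and document are invented.

```python
import math


def dot(q, d):
    """Dot product of two sparse term-weight vectors (dicts)."""
    return sum(w * d.get(term, 0.0) for term, w in q.items())


def cosine(q, d):
    """Dot product normalised by the two vector lengths."""
    norm = math.sqrt(dot(q, q)) * math.sqrt(dot(d, d))
    return dot(q, d) / norm if norm else 0.0


def jaccard(q, d):
    """Shared terms divided by all distinct terms."""
    a, b = set(q), set(d)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap(q, d):
    """Shared terms divided by the size of the smaller term set."""
    a, b = set(q), set(d)
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


# Invented example: a two-term query and a three-term document.
query = {"query": 1.0, "space": 1.0}
document = {"document": 1.0, "space": 1.0, "structure": 1.0}

if __name__ == "__main__":
    for fn in (dot, cosine, jaccard, overlap):
        print(fn.__name__, round(fn(query, document), 3))
```

    All four functions are symmetric in their arguments, so each one treats the query as if it were a very short document, which is the modelling assumption the paper calls into question.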

  15. Iterative Exploration, Design and Evaluation of Support for Query Reformulation in Interactive Information Retrieval.

    ERIC Educational Resources Information Center

    Belkin, N. J.; Cool, C.; Kelly, D.; Lin, S. -J.; Park, S. Y.; Perez-Carballo, J.; Sikora, C.

    2001-01-01

    Reports on the progressive investigation of techniques for supporting interactive query reformulation in the TREC (Text Retrieval Conference) Interactive Track. Highlights include methods of term suggestion; interface design to support different system functionalities; an overview of each year's TREC investigation; and relevance to the development…

  16. Is ‘Self-Medication’ a Useful Term to Retrieve Related Publications in the Literature? A Systematic Exploration of Related Terms

    PubMed Central

    Mansouri, Ava; Sarayani, Amir; Ashouri, Asieh; Sherafatmand, Mona; Hadjibabaie, Molouk; Gholami, Kheirollah

    2015-01-01

    Background: Self-Medication (SM), i.e. using medications to treat oneself, is a major concern for health researchers and policy makers. The terms “self medication” or “self-medication” (SM terms) have been used to explain various concepts while several terms have also been employed to define this practice. Hence, retrieving relevant publications would require exhaustive literature screening. So, we assessed the current situation of SM terms in the literature to improve the relevancy of search outcomes. Methods: In this systematic exploration, SM terms were searched in the 6 following databases and publishers' portals till April 2012: Web of Science, Scopus, PubMed, Google Scholar, ScienceDirect, and Wiley. A simple search query was used to include only publications with SM terms. We used Relative-Risk (RR) to estimate the probability of SM terms use in related compared to unrelated publications. Sensitivity and specificity of SM terms as keywords in search query were also calculated. Relevant terms to SM practice were extracted and their Likelihood Ratio positive and negative (LR+/-) were calculated to assess their effect on the probability of search outcomes relevancy in addition to previous search queries. We also evaluated the content of unrelated publications. All mentioned steps were performed in title (TI) and title or abstract (TIAB) of publications. Results: 1999 related and 1917 unrelated publications were found. SM terms RR was 4.5 in TI and 2.1 in TIAB. SM terms sensitivity and specificity respectively were 55.4% and 87.7% in TI and 84.0% and 59.5% in TIAB. “OTC” and “Over-The-Counter Medication”, with LR+ 16.78 and 16.30 respectively, provided the most conclusive increase in the probability of the relevancy of publications. The most common unrelated SM themes were self-medication hypothesis, drug abuse and Zoopharmacognosy. Conclusions: Due to relatively low specificity or sensitivity of SM terms, relevant terms should be employed in
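
    The likelihood ratios mentioned in the abstract above follow directly from sensitivity and specificity. The sketch below applies the standard definitions, LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity, to the TI and TIAB figures quoted in the abstract. The function name is hypothetical, and the abstract does not say that the authors computed their values this way.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity given as fractions."""
    if not (0.0 < specificity < 1.0) or not (0.0 <= sensitivity <= 1.0):
        raise ValueError("sensitivity must be in [0, 1] and specificity in (0, 1)")
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


# Figures quoted in the abstract for the SM terms.
ti = likelihood_ratios(0.554, 0.877)    # title only
tiab = likelihood_ratios(0.840, 0.595)  # title or abstract

if __name__ == "__main__":
    print("TI   LR+ = %.2f, LR- = %.2f" % ti)
    print("TIAB LR+ = %.2f, LR- = %.2f" % tiab)
```

    With these inputs the LR+ values round to 4.5 for TI and 2.1 for TIAB, the same figures the abstract reports as RR.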

  17. Query-Time Optimization Techniques for Structured Queries in Information Retrieval

    ERIC Educational Resources Information Center

    Cartright, Marc-Allen

    2013-01-01

    The use of information retrieval (IR) systems is evolving towards larger, more complicated queries. Both the IR industrial and research communities have generated significant evidence indicating that in order to continue improving retrieval effectiveness, increases in retrieval model complexity may be unavoidable. From an operational perspective,…

  18. Lost in translation? A multilingual Query Builder improves the quality of PubMed queries: a randomised controlled trial.

    PubMed

    Schuers, Matthieu; Joulakian, Mher; Kerdelhué, Gaetan; Segas, Léa; Grosjean, Julien; Darmoni, Stéfan J; Griffon, Nicolas

    2017-07-03

    MEDLINE is the most widely used medical bibliographic database in the world. Most of its citations are in English and this can be an obstacle for some researchers to access the information the database contains. We created a multilingual query builder to facilitate access to the PubMed subset using a language other than English. The aim of our study was to assess the impact of this multilingual query builder on the quality of PubMed queries for non-native English speaking physicians and medical researchers. A randomised controlled study was conducted among French speaking general practice residents. We designed a multi-lingual query builder to facilitate information retrieval, based on available MeSH translations and providing users with both an interface and a controlled vocabulary in their own language. Participating residents were randomly allocated either the French or the English version of the query builder. They were asked to translate 12 short medical questions into MeSH queries. The main outcome was the quality of the query. Two librarians blind to the arm independently evaluated each query, using a modified published classification that differentiated eight types of errors. Twenty residents used the French version of the query builder and 22 used the English version. 492 queries were analysed. There were significantly more perfect queries in the French group vs. the English group (respectively 37.9% vs. 17.9%; p < 0.01). It took significantly more time for the members of the English group than the members of the French group to build each query, respectively 194 sec vs. 128 sec; p < 0.01. This multi-lingual query builder is an effective tool to improve the quality of PubMed queries in particular for researchers whose first language is not English.

  19. The Complex Dynamics of Sponsored Search Markets

    NASA Astrophysics Data System (ADS)

    Robu, Valentin; La Poutré, Han; Bohte, Sander

    This paper provides a comprehensive study of the structure and dynamics of online advertising markets, mostly based on techniques from the emergent discipline of complex systems analysis. First, we look at how the display rank of a URL link influences its click frequency, for both sponsored search and organic search. Second, we study the market structure that emerges from these queries, especially the market share distribution of different advertisers. We show that the sponsored search market is highly concentrated, with less than 5% of all advertisers receiving over 2/3 of the clicks in the market. Furthermore, we show that both the number of ad impressions and the number of clicks follow power law distributions of approximately the same coefficient. However, we find this result does not hold when studying the same distribution of clicks per rank position, which shows considerable variance, most likely due to the way advertisers divide their budget on different keywords. Finally, we turn our attention to how such sponsored search data could be used to provide decision support tools for bidding for combinations of keywords. We provide a method to visualize keywords of interest in graphical form, as well as a method to partition these graphs to obtain desirable subsets of search terms.

  20. Automatic Processing of Current Affairs Queries

    ERIC Educational Resources Information Center

    Salton, G.

    1973-01-01

    The SMART system is used for the analysis, search and retrieval of news stories appearing in "Time" magazine. A comparison is made between the automatic text processing methods incorporated into the SMART system and a manual search using the classified index to "Time." (14 references) (Author)

  1. Seasons, Searches, and Intentions: What The Internet Can Tell Us About The Bed Bug (Hemiptera: Cimicidae) Epidemic.

    PubMed

    Sentana-Lledo, Daniel; Barbu, Corentin M; Ngo, Michelle N; Wu, Yage; Sethuraman, Karthik; Levy, Michael Z

    2016-01-01

    The common bed bug (Cimex lectularius L.) is once again prevalent in the United States. We investigated temporal patterns in Google search queries for bed bugs and co-occurring terms, and conducted in-person surveys to explore the intentions behind searches that included those terms. Searches for "bed bugs" rose steadily through 2011 and then plateaued, suggesting that the epidemic has reached an equilibrium in the United States. However, queries including terms that survey respondents associated strongly with having bed bugs (e.g., "exterminator," "remedies") continued to climb, while terms more closely associated with informational searches (e.g., "hotels," "about") fell. Respondents' rankings of terms and nonseasonal trends in Google search volume as assessed by a cosinor model were significantly correlated (Kendall's Tau-b P = 0.015). We find no evidence from Google Trends that the bed bug epidemic in the United States has reached equilibrium. © The Authors 2015. Published by Oxford University Press on behalf of Entomological Society of America.

  2. Semantically Enriching the Search System of a Music Digital Library

    NASA Astrophysics Data System (ADS)

    de Juan, Paloma; Iglesias, Carlos

    Traditional search systems are usually based on keywords, a very simple and convenient mechanism to express a need for information. This is the most popular way of searching the Web, although it is not always an easy task to accurately summarize a natural language query in a few keywords. Working with keywords means losing the context, which is the only thing that can help us deal with ambiguity. This is the biggest problem of keyword-based systems. Semantic Web technologies seem a perfect solution to this problem, since they make it possible to represent the semantics of a given domain. In this chapter, we present three projects, Harmos, Semusici and Cantiga, whose aim is to provide access to a music digital library. We will describe two search systems, a traditional one and a semantic one, developed in the context of these projects and compare them in terms of usability and effectiveness.

  3. 41. DISCOVERY, SEARCH, AND COMMUNICATION OF TEXTUAL KNOWLEDGE RESOURCES IN DISTRIBUTED SYSTEMS a. Discovering and Utilizing Knowledge Sources for Metasearch Knowledge Systems

    SciTech Connect

    Zamora, Antonio

    Advanced Natural Language Processing Tools for Web Information Retrieval, Content Analysis, and Synthesis. The goal of this SBIR was to implement and evaluate several advanced Natural Language Processing (NLP) tools and techniques to enhance the precision and relevance of search results by analyzing and augmenting search queries and by helping to organize the search output obtained from heterogeneous databases and web pages containing textual information of interest to DOE and the scientific-technical user communities in general. The SBIR investigated 1) the incorporation of spelling checkers in search applications, 2) identification of significant phrases and concepts using a combination of linguistic and statistical techniques, and 3) enhancement of the query interface and search retrieval results through the use of semantic resources, such as thesauri. A search program with a flexible query interface was developed to search reference databases with the objective of enhancing search results from web queries or queries of specialized search systems such as DOE's Information Bridge. The DOE ETDE/INIS Joint Thesaurus was processed to create a searchable database. Term frequencies and term co-occurrences were used to enhance the web information retrieval by providing algorithmically-derived objective criteria to organize relevant documents into clusters containing significant terms. A thesaurus provides an authoritative overview and classification of a field of knowledge. By organizing the results of a search using the thesaurus terminology, the output is more meaningful than when the results are just organized based on the terms that co-occur in the retrieved documents, some of which may not be significant. An attempt was made to take advantage of the hierarchy provided by broader and narrower terms, as well as other field-specific information in the thesauri. The search program uses linguistic morphological routines to find relevant entries regardless of

  4. Snaptron: querying splicing patterns across tens of thousands of RNA-seq samples

    PubMed Central

    Wilks, Christopher; Gaddipati, Phani; Nellore, Abhinav

    2018-01-01

    Motivation: As more and larger genomics studies appear, there is a growing need for comprehensive and queryable cross-study summaries. These enable researchers to leverage vast datasets that would otherwise be difficult to obtain. Results: Snaptron is a search engine for summarized RNA sequencing data with a query planner that leverages R-tree, B-tree and inverted indexing strategies to rapidly execute queries over 146 million exon-exon splice junctions from over 70 000 human RNA-seq samples. Queries can be tailored by constraining which junctions and samples to consider. Snaptron can score junctions according to tissue specificity or other criteria, and can score samples according to the relative frequency of different splicing patterns. We describe the software and outline biological questions that can be explored with Snaptron queries. Availability and implementation: Documentation is at http://snaptron.cs.jhu.edu. Source code is at https://github.com/ChristopherWilks/snaptron and https://github.com/ChristopherWilks/snaptron-experiments with a CC BY-NC 4.0 license. Contact: chris.wilks@jhu.edu or langmea@cs.jhu.edu. Supplementary information: Supplementary data are available at Bioinformatics online. PMID: 28968689

  5. Snaptron: querying splicing patterns across tens of thousands of RNA-seq samples.

    PubMed

    Wilks, Christopher; Gaddipati, Phani; Nellore, Abhinav; Langmead, Ben

    2018-01-01

    As more and larger genomics studies appear, there is a growing need for comprehensive and queryable cross-study summaries. These enable researchers to leverage vast datasets that would otherwise be difficult to obtain. Snaptron is a search engine for summarized RNA sequencing data with a query planner that leverages R-tree, B-tree and inverted indexing strategies to rapidly execute queries over 146 million exon-exon splice junctions from over 70 000 human RNA-seq samples. Queries can be tailored by constraining which junctions and samples to consider. Snaptron can score junctions according to tissue specificity or other criteria, and can score samples according to the relative frequency of different splicing patterns. We describe the software and outline biological questions that can be explored with Snaptron queries. Documentation is at http://snaptron.cs.jhu.edu. Source code is at https://github.com/ChristopherWilks/snaptron and https://github.com/ChristopherWilks/snaptron-experiments with a CC BY-NC 4.0 license. Contact: chris.wilks@jhu.edu or langmea@cs.jhu.edu. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.

  6. TEQUEL: The query language of SADDLE

    NASA Technical Reports Server (NTRS)

    Rajan, S. D.

    1984-01-01

    A relational database management system is presented that is tailored for engineering applications. A wide variety of engineering data types are supported and the data definition language (DDL) and data manipulation language (DML) are extended to handle matrices. The system can be used either in the standalone mode or through a FORTRAN or PASCAL application program. The query language is of the relational calculus type and allows the user to store, retrieve, update and delete tuples from relations. The relational operations including union, intersect and differ facilitate creation of temporary relations that can be used for manipulating information in a powerful manner. Sample applications are shown to illustrate the creation of data through a FORTRAN program and data manipulation using the TEQUEL DML.

  7. Visual perception-based criminal identification: a query-based approach

    NASA Astrophysics Data System (ADS)

    Singh, Avinash Kumar; Nandi, G. C.

    2017-01-01

    The visual perception of eyewitnesses plays a vital role in criminal identification. It helps law enforcement authorities search for a particular criminal in their previous records. It has been reported that searching criminal records manually requires too much time to get an accurate result. We have proposed a query-based approach which minimises the computational cost along with the reduction of search space. A symbolic database has been created to perform a stringent analysis on 150 public (Bollywood celebrities and Indian cricketers) and 90 local faces (our data-set). Expert knowledge has been captured to encapsulate every criminal's anatomical and facial attributes in the form of symbolic representation. A fast query-based searching strategy has been implemented using a dynamic decision tree data structure which allows four levels of decomposition to fetch respective criminal records. Two types of case studies, viewed and forensic sketches, have been considered to evaluate the strength of our proposed approach. We have derived 1200 views of the entire population by taking into consideration 80 participants as eyewitnesses. The system demonstrates an accuracy level of 98.6% for test case I and 97.8% for test case II. The experimental results also show that the search space is reduced to up to 30 of the most relevant records.

  8. The ESIS query environment pilot project

    NASA Technical Reports Server (NTRS)

    Fuchs, Jens J.; Ciarlo, Alessandro; Benso, Stefano

    1993-01-01

    The European Space Information System (ESIS) was originally conceived to provide the European space science community with simple and efficient access to space data archives, facilities with which to examine and analyze the retrieved data, and general information services. To achieve that, ESIS will provide the scientists with a discipline specific environment for querying in a uniform and transparent manner data stored in geographically dispersed archives. Furthermore it will provide discipline specific tools for displaying and analyzing the retrieved data. The central concept of ESIS is to achieve a more efficient and wider usage of space scientific data, while maintaining the physical archives at the institutions which created them and have the best background for ensuring and maintaining the scientific validity and interest of the data. In addition to coping with the physical distribution of data, ESIS is also to manage the heterogeneity of the individual archives' data models, formats and data base management systems. Thus the ESIS system shall appear to the user as a single database, while it does in fact consist of a collection of dispersed and locally managed databases and data archives. The work reported in this paper is one of the results of the ESIS Pilot Project which is to be completed in 1993. More specifically it presents the pilot ESIS Query Environment (ESIS QE) system which forms the data retrieval and data dissemination axis of the ESIS system. The others are formed by the ESIS Correlation Environment (ESIS CE) and the ESIS Information Services. The ESIS QE Pilot Project is carried out for the European Space Agency's Research and Information center, ESRIN, by a Consortium consisting of Computer Resources International, Denmark, CISET S.p.a, Italy, the University of Strasbourg, France and the Rutherford Appleton Laboratories in the U.K. Furthermore numerous scientists both within ESA and the space science community in Europe have been involved in

  9. A Survey in Indexing and Searching XML Documents.

    ERIC Educational Resources Information Center

    Luk, Robert W. P.; Leong, H. V.; Dillon, Tharam S.; Chan, Alvin T. S.; Croft, W. Bruce; Allan, James

    2002-01-01

    Discussion of XML focuses on indexing techniques for XML documents, grouping them into flat-file, semistructured, and structured indexing paradigms. Highlights include searching techniques, including full text search and multistage search; search result presentations; database and information retrieval system integration; XML query languages; and…

  10. RCQ-GA: RDF Chain Query Optimization Using Genetic Algorithms

    NASA Astrophysics Data System (ADS)

    Hogenboom, Alexander; Milea, Viorel; Frasincar, Flavius; Kaymak, Uzay

    The application of Semantic Web technologies in an Electronic Commerce environment implies a need for good support tools. Fast query engines are needed for efficient querying of large amounts of data, usually represented using RDF. We focus on optimizing a special class of SPARQL queries, the so-called RDF chain queries. For this purpose, we devise a genetic algorithm called RCQ-GA that determines the order in which joins need to be performed for an efficient evaluation of RDF chain queries. The approach is benchmarked against a two-phase optimization algorithm, previously proposed in literature. The more complex a query is, the more RCQ-GA outperforms the benchmark in solution quality, execution time needed, and consistency of solution quality. When the algorithms are constrained by a time limit, the overall performance of RCQ-GA compared to the benchmark further improves.

  11. Evaluation of Sub Query Performance in SQL Server

    NASA Astrophysics Data System (ADS)

    Oktavia, Tanty; Sujarwo, Surya

    2014-03-01

    The paper explores several sub query methods used in a query and their impact on query performance. The study uses an experimental approach to evaluate the performance of each sub query method combined with an indexing strategy. The sub query methods consist of in, exists, relational operator, and relational operator combined with the top operator. The experiment shows that using a relational operator combined with an indexing strategy in a sub query gives better performance than using the same method without an indexing strategy, and than the other methods. In summary, for applications that emphasize the performance of retrieving data from a database, it is better to use a relational operator combined with an indexing strategy. This study was done on Microsoft SQL Server 2012.
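
    The sub query forms compared in the abstract above can be illustrated with a small sketch. It runs on SQLite through Python's standard `sqlite3` module rather than on SQL Server, so `LIMIT 1` stands in for SQL Server's `TOP 1`; the tables, index name and data are invented, and the "relational operator" form shown (an equality test against a scalar sub query) is an assumption about what the paper means. The sketch only shows that the three forms return the same rows and makes no claim about their relative speed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    -- Index on the column the sub queries filter on.
    CREATE INDEX idx_orders_customer ON orders (customer_id);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 35.0), (3, 3, 12.5);
""")

# Three ways to ask for customers that have at least one order.
QUERIES = {
    "in": """
        SELECT name FROM customers
        WHERE id IN (SELECT customer_id FROM orders)
        ORDER BY id""",
    "exists": """
        SELECT name FROM customers c
        WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)
        ORDER BY c.id""",
    # LIMIT 1 keeps the sub query scalar (TOP 1 on SQL Server).
    "relational": """
        SELECT name FROM customers c
        WHERE c.id = (SELECT o.customer_id FROM orders o
                      WHERE o.customer_id = c.id LIMIT 1)
        ORDER BY c.id""",
}

results = {label: [row[0] for row in conn.execute(sql)]
           for label, sql in QUERIES.items()}

if __name__ == "__main__":
    for label, names in results.items():
        print(label, names)
```

    To compare performance as the study does, each query would need to be timed with and without the index on a table large enough for the difference to be measurable.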

  12. Querying and Extracting Timeline Information from Road Traffic Sensor Data

    PubMed Central

    Imawan, Ardi; Indikawati, Fitri Indra; Kwon, Joonho; Rao, Praveen

    2016-01-01

    The escalation of traffic congestion in urban cities has urged many countries to use intelligent transportation system (ITS) centers to collect historical traffic sensor data from multiple heterogeneous sources. By analyzing historical traffic data, we can obtain valuable insights into traffic behavior. Many existing applications have been proposed with limited analysis results because of the inability to cope with several types of analytical queries. In this paper, we propose the QET (querying and extracting timeline information) system—a novel analytical query processing method based on a timeline model for road traffic sensor data. To address query performance, we build a TQ-index (timeline query-index) that exploits spatio-temporal features of timeline modeling. We also propose an intuitive timeline visualization method to display congestion events obtained from specified query parameters. In addition, we demonstrate the benefit of our system through a performance evaluation using a Busan ITS dataset and a Seattle freeway dataset. PMID:27563900

  13. FitSearch: a robust way to interpret a yeast fitness profile in terms of drug's mode-of-action.

    PubMed

    Lee, Minho; Han, Sangjo; Chang, Hyeshik; Kwak, Youn-Sig; Weller, David M; Kim, Dongsup

    2013-01-01

    Yeast deletion-mutant collections have been successfully used to infer the mode-of-action of drugs especially by profiling chemical-genetic and genetic-genetic interactions on a genome-wide scale. Although tens of thousands of those profiles are publicly available, a lack of an accurate method for mining such data has been a major bottleneck for more widespread use of these useful resources. For general usage of those public resources, we designed FitRankDB as a general repository of fitness profiles, and developed a new search algorithm, FitSearch, for identifying the profiles that have a high similarity score with statistical significance for a given fitness profile. We demonstrated that our new repository and algorithm are highly beneficial to researchers who are attempting to make hypotheses based on unknown modes-of-action of bioactive compounds, regardless of the types of experiments that have been performed using yeast deletion-mutant collections on various types of measurement platforms, especially non-chip-based platforms. We showed that our new database and algorithm are useful when attempting to construct a hypothesis regarding the unknown function of a bioactive compound through small-scale experiments with a yeast deletion collection in a platform independent manner. The FitRankDB and FitSearch enhance the ease of searching public yeast fitness profiles and obtaining insights into unknown mechanisms of action of drugs. FitSearch is freely available at http://fitsearch.kaist.ac.kr.

  14. Respiratory syncytial virus tracking using internet search engine data.

    PubMed

    Oren, Eyal; Frere, Justin; Yom-Tov, Eran; Yom-Tov, Elad

    2018-04-03

    Respiratory Syncytial Virus (RSV) is the leading cause of hospitalization in children less than 1 year of age in the United States. Internet search engine queries may provide high resolution temporal and spatial data to estimate and predict disease activity. After filtering an initial list of 613 symptoms using high-resolution Bing search logs, we used Google Trends data between 2004 and 2016 for a smaller list of 50 terms to build predictive models of RSV incidence for five states where long-term surveillance data was available. We then used domain adaptation to model RSV incidence for the 45 remaining US states. Surveillance data sources (hospitalization and laboratory reports) were highly correlated, as were laboratory reports with search engine data. The four terms which were most often statistically significantly correlated as time series with the surveillance data in the five state models were RSV, flu, pneumonia, and bronchiolitis. Using our models, we tracked the spread of RSV by observing the time of peak use of the search term in different states. In general, the RSV peak moved from south-east (Florida) to the north-west US. Our study represents the first time that RSV has been tracked using Internet data results and highlights successful use of search filters and domain adaptation techniques, using data at multiple resolutions. Our approach may assist in identifying spread of both local and more widespread RSV transmission and may be applicable to other seasonal conditions where comprehensive epidemiological data is difficult to collect or obtain.

  15. Evaluation of the Feasibility of Screening Patients for Early Signs of Lung Carcinoma in Web Search Logs.

    PubMed

    White, Ryen W; Horvitz, Eric

    2017-03-01

    from 0.00001 to 0.001, respectively. The methods can be used to identify people at highest risk up to a year in advance of the inferred diagnosis time. The 5 factors associated with the highest relative risk (RR) were evidence of family history (RR = 7.548; 95% CI, 3.937-14.470), age (RR = 3.558; 95% CI, 3.357-3.772), radon (RR = 2.529; 95% CI, 1.137-5.624), primary location (RR = 2.463; 95% CI, 1.364-4.446), and occupation (RR = 1.969; 95% CI, 1.143-3.391). Evidence of smoking (RR = 1.646; 95% CI, 1.032-2.260) was important but not top-ranked, which was due to the difficulty of identifying smoking history from search terms. Pattern recognition based on data drawn from large-scale web search queries holds opportunity for identifying risk factors and frames new directions with early detection of lung carcinoma.
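
    For readers unfamiliar with the RR figures quoted above, the following sketch shows one common way to compute a relative risk and a 95% confidence interval from a 2x2 table (the log-normal approximation). The abstract does not say which interval method the study used, so this is an assumption, and the counts are invented for illustration.

```python
from math import exp, log, sqrt


def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total, z=1.96):
    """Return (RR, lower, upper) using the log-normal approximation.

    z=1.96 corresponds to a 95% confidence interval.
    """
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    rr = risk_exposed / risk_unexposed
    # Standard error of log(RR) for a 2x2 table.
    se = sqrt(
        1 / exposed_cases - 1 / exposed_total
        + 1 / unexposed_cases - 1 / unexposed_total
    )
    lower = exp(log(rr) - z * se)
    upper = exp(log(rr) + z * se)
    return rr, lower, upper


# Hypothetical counts: 30 cases among 1000 exposed, 10 among 1000 unexposed.
rr, lower, upper = relative_risk(30, 1000, 10, 1000)
print(f"RR = {rr:.3f}; 95% CI, {lower:.3f}-{upper:.3f}")
```

    An interval that excludes 1, as in this invented example, is what indicates a statistically significant association at the chosen level.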

  16. Autocorrelation and Regularization of Query-Based Information Retrieval Scores

    DTIC Science & Technology

    2008-02-01

    of the most general information retrieval models [Salton, 1968]. By treating a query as a very short document, documents and queries can be rep... Salton, 1971]. In the context of single link hierarchical clustering, Jardine and van Rijsbergen showed that ranking all k clusters and retrieving a... a document about “dogs”, then the system will always miss this document when a user queries “dog”. Salton recognized that a document’s representation

  17. Building a better search engine for earth science data

    NASA Astrophysics Data System (ADS)

    Armstrong, E. M.; Yang, C. P.; Moroni, D. F.; McGibbney, L. J.; Jiang, Y.; Huang, T.; Greguska, F. R., III; Li, Y.; Finch, C. J.

    2017-12-01

    Free text data searching of earth science datasets has been implemented with varying degrees of success and completeness across the spectrum of the 12 NASA earth sciences data centers. At the JPL Physical Oceanography Distributed Active Archive Center (PO.DAAC) the search engine has been developed around the Solr/Lucene platform. Others have chosen other popular enterprise search platforms like Elasticsearch. Regardless, the default implementations of these search engines, which leverage factors such as dataset popularity, term frequency and inverse document term frequency, do not fully meet the needs of precise relevancy and ranking of earth science search results. For the PO.DAAC, this shortcoming has been identified for several years by its external User Working Group, which has made several recommendations to improve the relevancy and discoverability of datasets related to remotely sensed sea surface temperature, ocean wind, waves, salinity, height and gravity that comprise a total count of over 500 publicly available datasets. Recently, the PO.DAAC has teamed with an effort led by George Mason University to improve the search and relevancy ranking of oceanographic data via a simple search interface and powerful backend services called MUDROD (Mining and Utilizing Dataset Relevancy from Oceanographic Datasets to Improve Data Discovery) funded by the NASA AIST program. MUDROD has mined and utilized the combination of PO.DAAC earth science dataset metadata, usage metrics, and user feedback and search history to objectively extract relevance for improved data discovery and access. In addition to improved dataset relevance and ranking, the MUDROD search engine also returns recommendations of related datasets and related user queries. This presentation will report on use cases that drove the architecture and development, and the success metrics and improvements on search precision and recall that MUDROD has demonstrated over the existing PO.DAAC search
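
    The "term frequency and inverse document frequency" factors mentioned above can be illustrated with a textbook TF-IDF score. This is a simplified sketch only: it is not the scoring formula of Solr/Lucene, Elasticsearch, or MUDROD, and the dataset descriptions are hypothetical.

```python
from collections import Counter
from math import log


def tf_idf_scores(query_terms, documents):
    """Score each document as the sum of tf * log(N / df) over the query terms."""
    n = len(documents)
    tokenized = [doc.lower().split() for doc in documents]
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        scores.append(sum(tf[t] * log(n / df[t]) for t in query_terms if df[t]))
    return scores


# Hypothetical dataset descriptions, invented for illustration.
datasets = [
    "sea surface temperature daily",
    "ocean wind speed",
    "sea surface salinity",
]
print(tf_idf_scores(["sea", "temperature"], datasets))
```

    A score of this kind looks only at word overlap, which is consistent with the abstract's point that default ranking factors alone may not capture relevance for earth science datasets.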

  18. NBIC: Search Ballast Report Database

    Science.gov Websites

    developed an online database that can be queried through our website. Data are accessible for all coastal Lakes, have been incorporated into the NBIC database as of August 2004. Information on data availability

  19. Turning Search into Knowledge Management.

    ERIC Educational Resources Information Center

    Kaufman, David

    2002-01-01

    Discussion of knowledge management for electronic data focuses on creating a high quality similarity ranking algorithm. Topics include similarity ranking and unstructured data management; searching, categorization, and summarization of documents; query evaluation; considering sentences in addition to keywords; and vector models. (LRW)

  20. Distributed query plan generation using multiobjective genetic algorithm.

    PubMed

    Panicker, Shina; Kumar, T V Vijay

    2014-01-01

    A distributed query processing strategy, which is a key performance determinant in accessing distributed databases, aims to minimize the total query processing cost. One way to achieve this is by generating efficient distributed query plans that involve fewer sites for processing a query. In the case of distributed relational databases, the number of possible query plans increases exponentially with respect to the number of relations accessed by the query and the number of sites where these relations reside. Consequently, computing optimal distributed query plans becomes a complex problem. This distributed query plan generation (DQPG) problem has already been addressed using single objective genetic algorithm, where the objective is to minimize the total query processing cost comprising the local processing cost (LPC) and the site-to-site communication cost (CC). In this paper, this DQPG problem is formulated and solved as a biobjective optimization problem with the two objectives being minimize total LPC and minimize total CC. These objectives are simultaneously optimized using a multiobjective genetic algorithm NSGA-II. Experimental comparison of the proposed NSGA-II based DQPG algorithm with the single objective genetic algorithm shows that the former performs comparatively better and converges quickly towards optimal solutions for an observed crossover and mutation probability.
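
    The biobjective formulation above (minimize total LPC and minimize total CC) rests on Pareto dominance, which is the comparison NSGA-II uses when sorting candidate plans into fronts. The sketch below shows only that dominance test and the extraction of the first non-dominated front; it is not a full NSGA-II implementation, and the (LPC, CC) cost pairs are hypothetical.

```python
def dominates(a, b):
    """True if cost tuple a is no worse than b in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(costs):
    """Return the cost tuples that no other tuple dominates (both objectives minimized)."""
    return [c for c in costs if not any(dominates(other, c) for other in costs)]


# Hypothetical (LPC, CC) costs for five candidate query plans.
plan_costs = [(10, 50), (20, 30), (30, 30), (40, 10), (25, 60)]
print(pareto_front(plan_costs))
```

    Plans on the front represent different trade-offs between local processing and communication cost; a single-objective formulation would instead collapse the two costs into one total before comparing plans.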

  1. Model-based query language for analyzing clinical processes.

    PubMed

    Barzdins, Janis; Barzdins, Juris; Rencis, Edgars; Sostaks, Agris

    2013-01-01

    Nowadays large databases of clinical process data exist in hospitals. However, these data are rarely used in full scope. In order to perform queries on hospital processes, one must either choose from the predefined queries or develop queries using an MS Excel-type software system, which is not always a trivial task. In this paper we propose a new query language for analyzing clinical processes that is also easily understood by non-IT professionals. We develop this language based on a process modeling language which is also described in this paper. Prototypes of both languages have already been verified using real examples from hospitals.

  2. PAQ: Persistent Adaptive Query Middleware for Dynamic Environments

    NASA Astrophysics Data System (ADS)

    Rajamani, Vasanth; Julien, Christine; Payton, Jamie; Roman, Gruia-Catalin

    Pervasive computing applications often entail continuous monitoring tasks, issuing persistent queries that return continuously updated views of the operational environment. We present PAQ, a middleware that supports applications' needs by approximating a persistent query as a sequence of one-time queries. PAQ introduces an integration strategy abstraction that allows composition of one-time query responses into streams representing sophisticated spatio-temporal phenomena of interest. A distinguishing feature of our middleware is the realization that the suitability of a persistent query's result is a function of the application's tolerance for accuracy weighed against the associated overhead costs. In PAQ, programmers can specify an inquiry strategy that dictates how information is gathered. Since network dynamics impact the suitability of a particular inquiry strategy, PAQ associates an introspection strategy with a persistent query, that evaluates the quality of the query's results. The result of introspection can trigger application-defined adaptation strategies that alter the nature of the query. PAQ's simple API makes developing adaptive querying systems easily realizable. We present the key abstractions, describe their implementations, and demonstrate the middleware's usefulness through application examples and evaluation.
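
    The idea of approximating a persistent query as a sequence of one-time queries, with integration, introspection and adaptation steps, can be sketched roughly as below. This is not PAQ's API: every function and parameter name here is hypothetical, and the sensor readings are invented.

```python
def run_persistent_query(one_time_query, integrate, introspect, adapt, period, rounds):
    """Approximate a persistent query by issuing `rounds` one-time queries.

    integrate  : folds each response into the running view (integration strategy)
    introspect : rates the quality of the latest result (introspection strategy)
    adapt      : returns a new query period given the quality (adaptation strategy)
    """
    view = None
    history = []
    for _ in range(rounds):
        response = one_time_query()
        view = integrate(view, response)
        history.append(view)
        quality = introspect(view, response)
        period = adapt(period, quality)
        # A real system would wait `period` seconds before the next query.
    return history, period


# Hypothetical sensor readings returned by three successive one-time queries.
readings = iter([[3, 5], [4], [9, 1, 2]])

history, final_period = run_persistent_query(
    one_time_query=lambda: next(readings),
    integrate=lambda view, resp: max(resp) if view is None else max(view, max(resp)),
    introspect=lambda view, resp: len(resp),  # quality = number of responders
    adapt=lambda period, quality: period / 2 if quality < 2 else period,
    period=8.0,
    rounds=3,
)
print(history, final_period)
```

    Here the running maximum plays the role of the continuously updated view, and a sparse response shortens the query period, mirroring the trade-off between accuracy and overhead that the abstract describes.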

  3. Distributed Query Plan Generation Using Multiobjective Genetic Algorithm

    PubMed Central

    Panicker, Shina; Vijay Kumar, T. V.

    2014-01-01

    A distributed query processing strategy, which is a key performance determinant in accessing distributed databases, aims to minimize the total query processing cost. One way to achieve this is by generating efficient distributed query plans that involve fewer sites for processing a query. In the case of distributed relational databases, the number of possible query plans increases exponentially with respect to the number of relations accessed by the query and the number of sites where these relations reside. Consequently, computing optimal distributed query plans becomes a complex problem. This distributed query plan generation (DQPG) problem has already been addressed using single objective genetic algorithm, where the objective is to minimize the total query processing cost comprising the local processing cost (LPC) and the site-to-site communication cost (CC). In this paper, this DQPG problem is formulated and solved as a biobjective optimization problem with the two objectives being minimize total LPC and minimize total CC. These objectives are simultaneously optimized using a multiobjective genetic algorithm NSGA-II. Experimental comparison of the proposed NSGA-II based DQPG algorithm with the single objective genetic algorithm shows that the former performs comparatively better and converges quickly towards optimal solutions for an observed crossover and mutation probability. PMID:24963513

  4. VPipe: Virtual Pipelining for Scheduling of DAG Stream Query Plans

    NASA Astrophysics Data System (ADS)

    Wang, Song; Gupta, Chetan; Mehta, Abhay

    There are data streams all around us that can be harnessed for tremendous business and personal advantage. For an enterprise-level stream processing system such as CHAOS [1] (Continuous, Heterogeneous Analytic Over Streams), handling of complex query plans with resource constraints is challenging. While several scheduling strategies exist for stream processing, efficient scheduling of complex DAG query plans is still largely unsolved. In this paper, we propose a novel execution scheme for scheduling complex directed acyclic graph (DAG) query plans with meta-data enriched stream tuples. Our solution, called Virtual Pipelined Chain (or VPipe Chain for short), effectively extends the "Chain" pipelining scheduling approach to complex DAG query plans.

  5. Optimization of the Controlled Evaluation of Closed Relational Queries

    NASA Astrophysics Data System (ADS)

    Biskup, Joachim; Lochner, Jan-Hendrik; Sonntag, Sebastian

    For relational databases, controlled query evaluation is an effective inference control mechanism preserving confidentiality regarding a previously declared confidentiality policy. Implementations of controlled query evaluation usually lack efficiency due to costly theorem prover calls. Suitably constrained controlled query evaluation can be implemented efficiently, but is not flexible enough from the perspective of database users and security administrators. In this paper, we propose an optimized framework for controlled query evaluation in relational databases, being efficiently implementable on the one hand and relaxing the constraints of previous approaches on the other hand.

  6. AQBE — QBE Style Queries for Archetyped Data

    NASA Astrophysics Data System (ADS)

    Sachdeva, Shelly; Yaginuma, Daigo; Chu, Wanming; Bhalla, Subhash

    Large-scale adoption of electronic healthcare applications requires semantic interoperability. New proposals call for an advanced (multi-level) DBMS architecture for repository services for health records of patients. These also require query interfaces at multiple levels and at the level of semi-skilled users. In this regard, a high-level user interface for querying the new form of standardized Electronic Health Records system has been examined in this study. It proposes a step-by-step graphical query interface to allow semi-skilled users to write queries. Its aim is to decrease user effort and communication ambiguities, and increase user friendliness.

  7. Google Search Tips

    Science.gov Websites

    with Search To search for a document, type a few descriptive words in the search box, and press the Enter key or click the search button. A results page appears with a list of documents and web pages that are related to your search terms, with the most relevant search results appearing at the top of the

  8. Facilitating Cohort Discovery by Enhancing Ontology Exploration, Query Management and Query Sharing for Large Clinical Data Repositories

    PubMed Central

    Tao, Shiqiang; Cui, Licong; Wu, Xi; Zhang, Guo-Qiang

    2017-01-01

    To help researchers better access clinical data, we developed a prototype query engine called DataSphere for exploring large-scale integrated clinical data repositories. DataSphere expedites data importing using a NoSQL data management system and dynamically renders its user interface for concept-based querying tasks. DataSphere provides an interactive query-building interface together with query translation and optimization strategies, which enable users to build and execute queries effectively and efficiently. We successfully loaded a dataset of one million patients for University of Kentucky (UK) Healthcare into DataSphere with more than 300 million clinical data records. We evaluated DataSphere by comparing it with an instance of i2b2 deployed at UK Healthcare, demonstrating that DataSphere provides enhanced user experience for both query building and execution. PMID:29854239

  9. Facilitating Cohort Discovery by Enhancing Ontology