Science.gov

Sample records for search query terms

  1. Categorical and Specificity Differences between User-Supplied Tags and Search Query Terms for Images. An Analysis of "Flickr" Tags and Web Image Search Queries

    ERIC Educational Resources Information Center

    Chung, EunKyung; Yoon, JungWon

    2009-01-01

    Introduction: The purpose of this study is to compare characteristics and features of user supplied tags and search query terms for images on the "Flickr" Website in terms of categories of pictorial meanings and level of term specificity. Method: This study focuses on comparisons between tags and search queries using Shatford's categorization…

  2. Exploration of Web Users' Search Interests through Automatic Subject Categorization of Query Terms.

    ERIC Educational Resources Information Center

    Pu, Hsiao-tieh; Yang, Chyan; Chuang, Shui-Lung

    2001-01-01

    Proposes a mechanism that carefully integrates human and machine efforts to explore Web users' search interests. The approach consists of a four-step process: extraction of core terms; construction of subject taxonomy; automatic subject categorization of query terms; and observation of users' search interests. Research findings are proved valuable…

  3. MeSH Speller + askMEDLINE: auto-completes MeSH terms then searches MEDLINE/PubMed via free-text, natural language queries.

    PubMed

    Fontelo, Paul; Liu, Fang; Ackerman, Michael

    2005-01-01

    Medical terminology is challenging even for healthcare personnel. Spelling errors can make searching MEDLINE/PubMed ineffective. We developed a utility that provides MeSH term and Specialist Lexicon Vocabulary suggestions as it is typed on a search page. The correctly spelled term can be incorporated into a free-text, natural language search or used as a clinical queries search.

  4. Searching the Web: The Public and Their Queries.

    ERIC Educational Resources Information Center

    Spink, Amanda; Wolfram, Dietmar; Jansen, Major B. J.; Saracevic, Tefko

    2001-01-01

    Reports findings from a study of searching behavior by over 200,000 users of the Excite search engine. Analysis of over one million queries revealed most people use few search terms, few modified queries, view few Web pages, and rarely use advanced search features. Concludes that Web searching by the public differs significantly from searching of…

  5. Leveraging User Query Sessions to Improve Searching of Medical Literature

    PubMed Central

    Cheng, Shiwen; Hristidis, Vagelis; Weiner, Michael

    2013-01-01

    Published reports about searching medical literature do not refer to leveraging the query context, as expressed by previous queries in a session. We aimed to assess novel strategies for context-aware searching, hypothesizing that this would be better than baseline. Building upon methods using term frequency-inverse document frequency, we added extensions such as a function incorporating search results and terms of previous queries, with higher weights for more recent queries. Among 60 medical students generating queries against the TREC 9 benchmark dataset, we assessed recall and mean average precision. For difficult queries, we achieved improvement (27%) in average precision over baseline. Improvements in recall were also seen. Our methods outperformed baseline by 4% to 14% on average. Furthermore, the effectiveness of context-aware search was greater for longer query sessions, which are typically more challenging. In conclusion, leveraging the previous queries in a session improved overall search quality with this biomedical database. PMID:24551332
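    The session-based weighting described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' method: it assumes plain whitespace tokenization, a smoothed IDF, and a geometric decay (the hypothetical `decay` parameter) so that more recent queries in a session count more. All function names are invented for the example.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization is good enough for a sketch.
    return text.lower().split()


def idf_table(docs):
    # Smoothed inverse document frequency over a small in-memory collection.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def session_query_weights(session, decay=0.5):
    # The most recent query gets weight 1.0; each earlier query is
    # down-weighted by another factor of `decay` (an assumed scheme).
    weights = Counter()
    for age, query in enumerate(reversed(session)):
        for term in tokenize(query):
            weights[term] += decay ** age
    return weights


def score(doc, query_weights, idf):
    # TF-IDF score of one document against the weighted session terms.
    tf = Counter(tokenize(doc))
    return sum(w * tf[t] * idf.get(t, 0.0) for t, w in query_weights.items())


def rank(docs, session, decay=0.5):
    # Return document indices ordered from best to worst match.
    idf = idf_table(docs)
    weights = session_query_weights(session, decay)
    return sorted(range(len(docs)), key=lambda i: score(docs[i], weights, idf), reverse=True)
```

    With a session such as ["heart attack", "aspirin"], the last query alone would not separate two documents that both mention aspirin, but the earlier query's terms still contribute at reduced weight and break the tie.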

  6. EquiX-A Search and Query Language for XML.

    ERIC Educational Resources Information Center

    Cohen, Sara; Kanza, Yaron; Kogan, Yakov; Sagiv, Yehoshua; Nutt, Werner; Serebrenik, Alexander

    2002-01-01

    Describes EquiX, a search language for XML that combines querying with searching to query the data and the meta-data content of Web pages. Topics include search engines; a data model for XML documents; search query syntax; search query semantics; an algorithm for evaluating a query on a document; and indexing EquiX queries. (LRW)

  7. Improving Web Search for Difficult Queries

    ERIC Educational Resources Information Center

    Wang, Xuanhui

    2009-01-01

    Search engines have now become essential tools in all aspects of our life. Although a variety of information needs can be served very successfully, there are still a lot of queries that search engines can not answer very effectively and these queries always make users feel frustrated. Since it is quite often that users encounter such "difficult…

  8. How Do Children Reformulate Their Search Queries?

    ERIC Educational Resources Information Center

    Rutter, Sophie; Ford, Nigel; Clough, Paul

    2015-01-01

    Introduction: This paper investigates techniques used by children in year 4 (age eight to nine) of a UK primary school to reformulate their queries, and how they use information retrieval systems to support query reformulation. Method: An in-depth study analysing the interactions of twelve children carrying out search tasks in a primary school…

  9. A novel adaptive Cuckoo search for optimal query plan generation.

    PubMed

    Gomathi, Ramalingam; Sharmila, Dhandapani

    2014-01-01

    The emergence of multiple web pages day by day leads to the development of the semantic web technology. A World Wide Web Consortium (W3C) standard for storing semantic web data is the resource description framework (RDF). To reduce the execution time for querying large RDF graphs, evolving metaheuristic algorithms have become an alternative to the traditional query optimization methods. This paper focuses on the problem of query optimization of semantic web data. An efficient algorithm called adaptive Cuckoo search (ACS) for querying and generating optimal query plans for large RDF graphs is designed in this research. Experiments were conducted on different datasets with varying numbers of predicates. The experimental results show that the proposed approach provides significant results in terms of query execution time. The extent to which the algorithm is efficient is tested and the results are documented.

  10. Searching for Images: The Analysis of Users' Queries for Image Retrieval in American History.

    ERIC Educational Resources Information Center

    Choi, Youngok; Rasmussen, Edie M.

    2003-01-01

    Studied users' queries for visual information in American history to identify the image attributes important for retrieval and the characteristics of users' queries for digital images, based on queries from 38 faculty and graduate students. Results of pre- and post-test questionnaires and interviews suggest principal categories of search terms.…

  11. Monitoring a toxicological outbreak using Internet search query data.

    PubMed

    Yin, Shan; Ho, Mona

    2012-11-01

    A novel group of drugs of abuse colloquially known as "bath salts" had a dramatic rise in exposures recently noted in Europe and the United States. Internet search query data have been shown to be correlated with office visits for influenza-like illnesses. The purpose of this study was to determine whether internet search query data could have been used as a surveillance method for this outbreak. This was a retrospective database review of the National Poison Database System and internet search query data provided by Google Insights for Search (GIS) comparing exposures reported to "bath salts" with internet searches for "bath salts". 1072 cases of exposures to "bath salts" were reported to US poison centers from 7/1/10 to 2/28/11. GIS data for the search term "bath salts" had a correlation of 0.84 with exposures to bath salts reported to US poison centers over the study period. Poison center exposures and GIS data did not differ significantly in detecting a change from the baseline (p = 0.85). When comparing exposures by state to search volumes by state for "bath salts", the correlation was 0.79. Symptoms and treatments were typical of an exposure to a sympathomimetic drug. Internet search data correlated very well with exposures reported to US poison centers for a novel drug of abuse. In this particular outbreak, it is possible that using internet search data may have provided a means for public health officials to monitor the rise in usage on a national and regional basis.
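    The kind of correlation reported in this abstract can be computed as in the sketch below. It assumes a Pearson coefficient (the abstract does not name the statistic) and two equal-length series, for example weekly search volumes and weekly reported exposures. The function name is hypothetical and the sample numbers are made up purely to show the call.

```python
import math


def pearson_r(xs, ys):
    # Pearson correlation coefficient between two equal-length numeric series.
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Made-up illustrative numbers, not data from the study.
search_volume = [10, 12, 18, 30, 44, 61]
reported_cases = [3, 4, 5, 11, 15, 19]
r = pearson_r(search_volume, reported_cases)
```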

  12. Cumulative Query Method for Influenza Surveillance Using Search Engine Data

    PubMed Central

    Seo, Dong-Woo; Sohn, Chang Hwan; Shin, Soo-Yong; Lee, JaeHo; Yu, Maengsoo; Kim, Won Young; Lim, Kyoung Soo; Lee, Sang-Il

    2014-01-01

    Background: Internet search queries have become an important data source in syndromic surveillance systems. However, there is currently no syndromic surveillance system using Internet search query data in South Korea. Objectives: The objective of this study was to examine correlations between our cumulative query method and national influenza surveillance data. Methods: Our study was based on the local search engine, Daum (approximately 25% market share), and influenza-like illness (ILI) data from the Korea Centers for Disease Control and Prevention. A quota sampling survey was conducted with 200 participants to obtain popular queries. We divided the study period into two sets: Set 1 (the 2009/10 epidemiological year for development set 1 and 2010/11 for validation set 1) and Set 2 (2010/11 for development Set 2 and 2011/12 for validation Set 2). Pearson’s correlation coefficients were calculated between the Daum data and the ILI data for the development set. We selected the combined queries for which the correlation coefficients were .7 or higher and listed them in descending order. Then, we created a cumulative query method n representing the number of cumulative combined queries in descending order of the correlation coefficient. Results: In validation set 1, 13 cumulative query methods were applied, and 8 had higher correlation coefficients (min=.916, max=.943) than that of the highest single combined query. Further, 11 of 13 cumulative query methods had an r value of ≥.7, but only 4 of 13 combined queries had an r value of ≥.7. In validation set 2, 8 of 15 cumulative query methods showed higher correlation coefficients (min=.975, max=.987) than that of the highest single combined query. All 15 cumulative query methods had an r value of ≥.7, but only 6 of 15 combined queries had an r value of ≥.7. Conclusions: The cumulative query method showed relatively higher correlation with national influenza surveillance data than combined queries in the development and validation

  13. Cumulative query method for influenza surveillance using search engine data.

    PubMed

    Seo, Dong-Woo; Jo, Min-Woo; Sohn, Chang Hwan; Shin, Soo-Yong; Lee, JaeHo; Yu, Maengsoo; Kim, Won Young; Lim, Kyoung Soo; Lee, Sang-Il

    2014-12-16

    Internet search queries have become an important data source in syndromic surveillance systems. However, there is currently no syndromic surveillance system using Internet search query data in South Korea. The objective of this study was to examine correlations between our cumulative query method and national influenza surveillance data. Our study was based on the local search engine, Daum (approximately 25% market share), and influenza-like illness (ILI) data from the Korea Centers for Disease Control and Prevention. A quota sampling survey was conducted with 200 participants to obtain popular queries. We divided the study period into two sets: Set 1 (the 2009/10 epidemiological year for development set 1 and 2010/11 for validation set 1) and Set 2 (2010/11 for development Set 2 and 2011/12 for validation Set 2). Pearson's correlation coefficients were calculated between the Daum data and the ILI data for the development set. We selected the combined queries for which the correlation coefficients were .7 or higher and listed them in descending order. Then, we created a cumulative query method n representing the number of cumulative combined queries in descending order of the correlation coefficient. In validation set 1, 13 cumulative query methods were applied, and 8 had higher correlation coefficients (min=.916, max=.943) than that of the highest single combined query. Further, 11 of 13 cumulative query methods had an r value of ≥.7, but only 4 of 13 combined queries had an r value of ≥.7. In validation set 2, 8 of 15 cumulative query methods showed higher correlation coefficients (min=.975, max=.987) than that of the highest single combined query. All 15 cumulative query methods had an r value of ≥.7, but only 6 of 15 combined queries had an r value of ≥.7. The cumulative query method showed relatively higher correlation with national influenza surveillance data than combined queries in the development and validation set.
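    One possible reading of the selection procedure in this abstract is sketched below. It assumes that "cumulative query method n" means summing the weekly volumes of the n best-correlated combined queries; the abstract does not spell this out, so treat that as an interpretation. The function names and the dictionary layout (query name mapped to a list of weekly volumes) are hypothetical.

```python
import math


def pearson_r(xs, ys):
    # Pearson correlation; a constant series is treated as uncorrelated (0.0).
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_queries(dev_volumes, dev_ili, threshold=0.7):
    # Keep queries whose development-set correlation with ILI meets the
    # threshold, ordered from highest to lowest correlation.
    scored = [(pearson_r(series, dev_ili), name) for name, series in dev_volumes.items()]
    kept = sorted((item for item in scored if item[0] >= threshold), reverse=True)
    return [name for _, name in kept]


def cumulative_series(volumes, ranked_names, n):
    # Assumed definition: week-by-week sum of the top-n ranked queries.
    chosen = ranked_names[:n]
    if not chosen:
        raise ValueError("no queries selected")
    weeks = len(volumes[chosen[0]])
    return [sum(volumes[name][week] for name in chosen) for week in range(weeks)]
```

    Each cumulative series built from the validation-period volumes could then be correlated with the validation-period ILI data using `pearson_r`, giving one coefficient per value of n.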

  14. Matching health information seekers' queries to medical terms

    PubMed Central

    2012-01-01

    Background: The Internet is a major source of health information but most seekers are not familiar with medical vocabularies. Hence, their searches fail due to bad query formulation. Several methods have been proposed to improve information retrieval: query expansion, syntactic and semantic techniques or knowledge-based methods. However, it would be useful to clean those queries which are misspelled. In this paper, we propose a simple yet efficient method in order to correct misspellings of queries submitted by health information seekers to a medical online search tool. Methods: In addition to query normalizations and exact phonetic term matching, we tested two approximate string comparators: the similarity score function of Stoilos and the normalized Levenshtein edit distance. We propose here to combine them to increase the number of matched medical terms in French. We first took a sample of query logs to determine the thresholds and processing times. In the second run, at a greater scale we tested different combinations of query normalizations before or after misspelling correction with the retained thresholds in the first run. Results: According to the total number of suggestions (around 163, the number of the first sample of queries), at a threshold comparator score of 0.3, the normalized Levenshtein edit distance gave the highest F-Measure (88.15%) and at a threshold comparator score of 0.7, the Stoilos function gave the highest F-Measure (84.31%). By combining Levenshtein and Stoilos, the highest F-Measure (80.28%) is obtained with 0.2 and 0.7 thresholds respectively. However, queries are composed of several terms that may be combinations of medical terms. The process of query normalization and segmentation is thus required. The highest F-Measure (64.18%) is obtained when this process is performed before spelling-correction. Conclusions: Despite the widely known high performance of the normalized edit distance of Levenshtein, we show in this paper that its
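    For readers unfamiliar with the comparator named in this record, the normalized Levenshtein edit distance can be sketched as below. The normalization (dividing by the longer string's length) and the reading of the 0.3 threshold as a maximum distance are assumptions, and the `suggest` helper and its vocabulary are hypothetical; the Stoilos similarity function used in the study is not shown.

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance (insert, delete, substitute),
    # keeping only the previous row of the table.
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        cur = [i]
        for j, ch_b in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                     # deletion
                           cur[j - 1] + 1,                  # insertion
                           prev[j - 1] + (ch_a != ch_b)))   # substitution
        prev = cur
    return prev[-1]


def normalized_levenshtein(a, b):
    # Assumed normalization: divide by the length of the longer string,
    # giving 0.0 for identical strings and 1.0 for completely different ones.
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def suggest(query, vocabulary, max_distance=0.3):
    # Return vocabulary terms within the distance threshold, closest first.
    scored = sorted((normalized_levenshtein(query.lower(), term.lower()), term)
                    for term in vocabulary)
    return [term for distance, term in scored if distance <= max_distance]
```

    For example, `suggest("diabetis", ["diabetes", "dialysis", "asthma"])` keeps only the term that is a single substitution away from the misspelled query.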

  15. Searching for cancer information on the internet: analyzing natural language search queries.

    PubMed

    Bader, Judith L; Theofanos, Mary Frances

    2003-12-11

    Searching for health information is one of the most-common tasks performed by Internet users. Many users begin searching on popular search engines rather than on prominent health information sites. We know that many visitors to our (National Cancer Institute) Web site, cancer.gov, arrive via links in search engine results. To learn more about the specific needs of our general-public users, we wanted to understand what lay users really wanted to know about cancer, how they phrased their questions, and how much detail they used. The National Cancer Institute partnered with AskJeeves, Inc to develop a methodology to capture, sample, and analyze 3 months of cancer-related queries on the Ask.com Web site, a prominent United States consumer search engine, which receives over 35 million queries per week. Using a benchmark set of 500 terms and word roots supplied by the National Cancer Institute, AskJeeves identified a test sample of cancer queries for 1 week in August 2001. From these 500 terms only 37 appeared ≥ 5 times/day over the trial test week in 17208 queries. Using these 37 terms, 204165 instances of cancer queries were found in the Ask.com query logs for the actual test period of June-August 2001. Of these, 7500 individual user questions were randomly selected for detailed analysis and assigned to appropriate categories. The exact language of sample queries is presented. Considering multiples of the same questions, the sample of 7500 individual user queries represented 76077 queries (37% of the total 3-month pool). Overall 78.37% of sampled Cancer queries asked about 14 specific cancer types. Within each cancer type, queries were sorted into appropriate subcategories including at least the following: General Information, Symptoms, Diagnosis and Testing, Treatment, Statistics, Definition, and Cause/Risk/Link. The most-common specific cancer types mentioned in queries were Digestive/Gastrointestinal/Bowel (15.0%), Breast (11.7%), Skin (11.3%), and Genitourinary

  16. Searching for Cancer Information on the Internet: Analyzing Natural Language Search Queries

    PubMed Central

    Theofanos, Mary Frances

    2003-01-01

    Background: Searching for health information is one of the most-common tasks performed by Internet users. Many users begin searching on popular search engines rather than on prominent health information sites. We know that many visitors to our (National Cancer Institute) Web site, cancer.gov, arrive via links in search engine results. Objective: To learn more about the specific needs of our general-public users, we wanted to understand what lay users really wanted to know about cancer, how they phrased their questions, and how much detail they used. Methods: The National Cancer Institute partnered with AskJeeves, Inc to develop a methodology to capture, sample, and analyze 3 months of cancer-related queries on the Ask.com Web site, a prominent United States consumer search engine, which receives over 35 million queries per week. Using a benchmark set of 500 terms and word roots supplied by the National Cancer Institute, AskJeeves identified a test sample of cancer queries for 1 week in August 2001. From these 500 terms only 37 appeared ≥ 5 times/day over the trial test week in 17208 queries. Using these 37 terms, 204165 instances of cancer queries were found in the Ask.com query logs for the actual test period of June-August 2001. Of these, 7500 individual user questions were randomly selected for detailed analysis and assigned to appropriate categories. The exact language of sample queries is presented. Results: Considering multiples of the same questions, the sample of 7500 individual user queries represented 76077 queries (37% of the total 3-month pool). Overall 78.37% of sampled Cancer queries asked about 14 specific cancer types. Within each cancer type, queries were sorted into appropriate subcategories including at least the following: General Information, Symptoms, Diagnosis and Testing, Treatment, Statistics, Definition, and Cause/Risk/Link. The most-common specific cancer types mentioned in queries were Digestive/Gastrointestinal/Bowel (15.0%), Breast (11

  17. The challenge of negation in health care searches and queries.

    PubMed

    Harvey, Valerie J; Ruzich, Constance M; Baugh, Jeanne M; Johnston, Bruce; Grant, Arthur J

    2003-01-01

    This poster deals with exclusionary queries implemented using the database language SQL and the VA FileMan database system and the retrieval searches involving negated concepts in medical narratives. The poster describes and presents error patterns and designing database queries, underlying comprehension issues regarding negative statements and queries, strategies and software for avoiding false positives in searches, and makes practical recommendations on identifying potential sources of error and avoiding incorrect or misleading results.

  18. Project Lefty: More Bang for the Search Query

    ERIC Educational Resources Information Center

    Varnum, Ken

    2010-01-01

    This article describes Project Lefty, a search system that, at a minimum, adds a layer on top of traditional federated search tools that will make the wait for results more worthwhile for researchers. At best, Project Lefty improves search queries and relevance rankings for web-scale discovery tools to make the results themselves more relevant…

  19. Query Log Analysis of an Electronic Health Record Search Engine

    PubMed Central

    Yang, Lei; Mei, Qiaozhu; Zheng, Kai; Hanauer, David A.

    2011-01-01

    We analyzed a longitudinal collection of query logs of a full-text search engine designed to facilitate information retrieval in electronic health records (EHR). The collection, 202,905 queries and 35,928 user sessions recorded over a course of 4 years, represents the information-seeking behavior of 533 medical professionals, including frontline practitioners, coding personnel, patient safety officers, and biomedical researchers for patient data stored in EHR systems. In this paper, we present descriptive statistics of the queries, a categorization of information needs manifested through the queries, as well as temporal patterns of the users’ information-seeking behavior. The results suggest that information needs in the medical domain are substantially more sophisticated than those that general-purpose web search engines need to accommodate. Therefore, we envision that there exists a significant challenge, along with significant opportunities, to provide intelligent query recommendations to facilitate information retrieval in EHR. PMID:22195150

  20. Quantum search with multiple walk steps per oracle query

    NASA Astrophysics Data System (ADS)

    Wong, Thomas G.; Ambainis, Andris

    2015-08-01

    We identify a key difference between quantum search by discrete- and continuous-time quantum walks: a discrete-time walk typically performs one walk step per oracle query, whereas a continuous-time walk can effectively perform multiple walk steps per query while only counting query time. As a result, we show that continuous-time quantum walks can outperform their discrete-time counterparts, even though both achieve quadratic speedups over their corresponding classical random walks. To provide greater equity, we allow the discrete-time quantum walk to also take multiple walk steps per oracle query while only counting queries. Then it matches the continuous-time algorithm's runtime, but such that it is a cubic speedup over its corresponding classical random walk. This yields a greater-than-quadratic speedup for quantum search over its corresponding classical random walk.

  1. Using Hybrid Search and Query for E-discovery Identification

    NASA Astrophysics Data System (ADS)

    Grosvenor, Dave; Seaborne, Andy

    We investigated the use of a hybrid search and query for locating enterprise data relevant to a requesting party's legal case (e-discovery identification). We extended the query capabilities of SPARQL with search capabilities to provide integrated access to structured, semi-structured and unstructured data sources. Every data source in the enterprise is potentially within the scope of e-discovery identification, so we use some common enterprise structured data sources that provide product and organizational information to guide the search and restrict it to a manageable scale. We use hybrid search and query to conduct a rich high-level search, which identifies the key people and products to coarsely locate relevant data sources. Furthermore, the product and organizational data sources are also used to increase recall, which is a key requirement for e-discovery identification.

  2. Analyzing Medical Image Search Behavior: Semantics and Prediction of Query Results.

    PubMed

    De-Arteaga, Maria; Eggel, Ivan; Kahn, Charles E; Müller, Henning

    2015-10-01

    Log files of information retrieval systems that record user behavior have been used to improve the outcomes of retrieval systems, understand user behavior, and predict events. In this article, a log file of the ARRS GoldMiner search engine containing 222,005 consecutive queries is analyzed. Time stamps are available for each query, as well as masked IP addresses, which makes it possible to identify queries from the same person. This article describes the ways in which physicians (or Internet searchers interested in medical images) search and proposes potential improvements by suggesting query modifications. For example, many queries contain only a few terms and therefore are not specific; others contain spelling mistakes or non-medical terms that likely lead to poor or empty results. One of the goals of this report is to predict the number of results a query will have, since such a model allows search engines to automatically propose query modifications in order to avoid result lists that are empty or too large. This prediction is made based on characteristics of the query terms themselves. Prediction of empty results has an accuracy above 88%, and thus can be used to automatically modify the query to avoid empty result sets for a user. The semantic analysis and data of reformulations done by users in the past can aid the development of better search systems, particularly to improve results for novice users. Therefore, this paper gives important ideas to better understand how people search and how to use this knowledge to improve the performance of specialized medical search engines.

  3. Capturing the Meaning of Internet Search Queries by Taxonomy Mapping

    NASA Astrophysics Data System (ADS)

    Tikk, Domonkos; Kardkovács, Zsolt T.; Bánsághi, Zoltán

    Capturing the meaning of internet search queries can significantly improve the effectiveness of search retrieval. Users often have problems finding relevant answers to their queries, particularly when the posted query is ambiguous. The orientation of the user can be greatly facilitated if answers are grouped into topics of a fixed subject taxonomy. In this manner, the original problem can be transformed to the labelling of queries (and consequently, the answers) with the topic names. Thus the original problem is transformed into a classification set-up. This paper introduces our Ferrety algorithm that performs topic assignment, which also works when there is no directly available training data that describes the semantics of the subject taxonomy. The approach is presented via the example of ACM KDD Cup 2005 problem, where Ferrety was awarded for precision and creativity.

  4. Predicting Drug Recalls From Internet Search Engine Queries.

    PubMed

    Yom-Tov, Elad

    2017-01-01

    Batches of pharmaceuticals are sometimes recalled from the market when a safety issue or a defect is detected in specific production runs of a drug. Such problems are usually detected when patients or healthcare providers report abnormalities to medical authorities. Here, we test the hypothesis that defective production lots can be detected earlier by monitoring queries to Internet search engines. We extracted queries from the USA to the Bing search engine, which mentioned one of the 5195 pharmaceutical drugs during 2015 and all recall notifications issued by the Food and Drug Administration (FDA) during that year. By using attributes that quantify the change in query volume at the state level, we attempted to predict if a recall of a specific drug will be ordered by the FDA in a time horizon ranging from 1 to 40 days in the future. Our results show that future drug recalls can indeed be identified with an AUC of 0.791 and a lift at 5% of approximately 6 when predicting a recall occurring one day ahead. This performance degrades as prediction is made for longer periods ahead. The most indicative attributes for prediction are sudden spikes in query volume about a specific medicine in each state. Recalls of prescription drugs and those estimated to be of medium-risk are more likely to be identified using search query data. These findings suggest that aggregated Internet search engine data can be used to facilitate early warning of faulty batches of medicines.

  5. Correlation between National Influenza Surveillance Data and Search Queries from Mobile Devices and Desktops in South Korea.

    PubMed

    Shin, Soo-Yong; Kim, Taerim; Seo, Dong-Woo; Sohn, Chang Hwan; Kim, Sung-Hoon; Ryoo, Seung Mok; Lee, Yoon-Seon; Lee, Jae Ho; Kim, Won Young; Lim, Kyoung Soo

    2016-01-01

    Digital surveillance using internet search queries can improve both the sensitivity and timeliness of the detection of a health event, such as an influenza outbreak. While it has recently been estimated that the mobile search volume surpasses the desktop search volume and mobile search patterns differ from desktop search patterns, the previous digital surveillance systems did not distinguish mobile and desktop search queries. The purpose of this study was to compare the performance of mobile and desktop search queries in terms of digital influenza surveillance. The study period was from September 6, 2010 through August 30, 2014, which consisted of four epidemiological years. Influenza-like illness (ILI) and virologic surveillance data from the Korea Centers for Disease Control and Prevention were used. A total of 210 combined queries from our previous survey work were used for this study. Mobile and desktop weekly search data were extracted from Naver, which is the largest search engine in Korea. Spearman's correlation analysis was used to examine the correlation of the mobile and desktop data with ILI and virologic data in Korea. We also performed lag correlation analysis. We observed that the influenza surveillance performance of mobile search queries matched or exceeded that of desktop search queries over time. The mean correlation coefficients of mobile search queries and the number of queries with an r-value of ≥ 0.7 equaled or became greater than those of desktop searches over the four epidemiological years. A lag correlation analysis of up to two weeks showed similar trends. Our study shows that the performance of mobile search queries for influenza surveillance has equaled or even exceeded that of desktop search queries over time. In the future development of influenza surveillance using search queries, recognizing the changing trend in mobile search data could be necessary.

  6. Index Compression and Efficient Query Processing in Large Web Search Engines

    ERIC Educational Resources Information Center

    Ding, Shuai

    2013-01-01

    The inverted index is the main data structure used by all the major search engines. Search engines build an inverted index on their collection to speed up query processing. As the size of the web grows, the length of the inverted list structures, which can easily grow to hundreds of MBs or even GBs for common terms (roughly linear in the size of…

  7. Improving search over Electronic Health Records using UMLS-based query expansion through random walks.

    PubMed

    Martinez, David; Otegi, Arantxa; Soroa, Aitor; Agirre, Eneko

    2014-10-01

    Most of the information in Electronic Health Records (EHRs) is represented in free textual form. Practitioners searching EHRs need to phrase their queries carefully, as the record might use synonyms or other related words. In this paper we show that an automatic query expansion method based on the Unified Medical Language System (UMLS) Metathesaurus improves the results of a robust baseline when searching EHRs. The method uses a graph representation of the lexical units, concepts and relations in the UMLS Metathesaurus. It is based on random walks over the graph, which start on the query terms. Random walks are a well-studied discipline in both Web and Knowledge Base datasets. Our experiments over the TREC Medical Record track show improvements in both the 2011 and 2012 datasets over a strong baseline. Our analysis shows that the success of our method is due to the automatic expansion of the query with extra terms, even when they are not directly related in the UMLS Metathesaurus. The terms added in the expansion go beyond simple synonyms, and also add other kinds of topically related terms. Expansion of queries using related terms in the UMLS Metathesaurus beyond synonymy is an effective way to overcome the gap between query and document vocabularies when searching for patient cohorts. Copyright © 2014 Elsevier Inc. All rights reserved.
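    The idea of random walks that start on the query terms can be illustrated with a random walk with restart (personalized PageRank) over a small concept graph. This is a generic sketch, not the authors' implementation: the adjacency-dictionary graph, the `damping` value, the iteration count, and the function names are all assumptions, and a real system would build the graph from the UMLS Metathesaurus rather than a toy dictionary.

```python
def personalized_pagerank(graph, seeds, damping=0.85, iterations=50):
    # graph: dict mapping each node to a list of neighbour nodes; every
    # neighbour must itself be a key. seeds: the query terms, which receive
    # all of the restart probability.
    seeds = set(seeds)
    if not seeds or not seeds <= set(graph):
        raise ValueError("seeds must be a non-empty subset of the graph nodes")
    restart = {node: (1.0 / len(seeds) if node in seeds else 0.0) for node in graph}
    rank = dict(restart)
    for _ in range(iterations):
        nxt = {node: (1.0 - damping) * restart[node] for node in graph}
        for node, neighbours in graph.items():
            if neighbours:
                share = damping * rank[node] / len(neighbours)
                for other in neighbours:
                    nxt[other] += share
            else:
                # Dangling node: send its probability mass back to the seeds.
                for seed in seeds:
                    nxt[seed] += damping * rank[node] / len(seeds)
        rank = nxt
    return rank


def expand_query(graph, query_terms, k=2):
    # Return the k highest-ranked nodes that are not already query terms.
    rank = personalized_pagerank(graph, query_terms)
    candidates = [n for n in graph if n not in set(query_terms) and rank[n] > 0]
    return sorted(candidates, key=lambda n: rank[n], reverse=True)[:k]
```

    Because the walk keeps returning to the query terms, nodes that are two hops away can still receive a non-zero score, while concepts in an unconnected part of the graph receive none.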

  8. Searching for rare diseases in PubMed: a blind comparison of Orphanet expert query and query based on terminological knowledge.

    PubMed

    Griffon, N; Schuers, M; Dhombres, F; Merabti, T; Kerdelhué, G; Rollin, L; Darmoni, S J

    2016-08-02

    Despite international initiatives like Orphanet, it remains difficult to find up-to-date information about rare diseases. The aim of this study is to propose an exhaustive set of queries for PubMed based on terminological knowledge and to evaluate it versus the queries based on expertise provided by the most frequently used resource in Europe: Orphanet. Four rare disease terminologies (MeSH, OMIM, HPO and HRDO) were manually mapped to each other permitting the automatic creation of expanded terminological queries for rare diseases. For 30 rare diseases, 30 citations retrieved by Orphanet expert query and/or query based on terminological knowledge were assessed for relevance by two independent reviewers unaware of the query's origin. An adjudication procedure was used to resolve any discrepancy. Precision, relative recall and F-measure were all computed. For each Orphanet rare disease (n = 8982), there was a corresponding terminological query, in contrast with only 2284 queries provided by Orphanet. Only 553 citations were evaluated due to queries with 0 or only a few hits. There were no significant differences between the Orpha query and terminological query in terms of precision, respectively 0.61 vs 0.52 (p = 0.13). Nevertheless, terminological queries retrieved more citations more often than Orpha queries (0.57 vs. 0.33; p = 0.01). Interestingly, Orpha queries seemed to retrieve older citations than terminological queries (p < 0.0001). The terminological queries proposed in this study are now available for all rare diseases. They may be a useful tool for both precision- and recall-oriented literature searches.
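
    The three measures named in this abstract can be written down in a few lines. The sketch below assumes the usual pooled definition of relative recall (relevant citations found by one query divided by the relevant citations found by either query); the abstract does not spell out its definition, and the citation ids are made up.

```python
def precision(retrieved, relevant):
    """Fraction of retrieved citations that were judged relevant."""
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0

def relative_recall(retrieved, relevant_pool):
    """Recall relative to the pool of relevant citations found by any of the compared queries."""
    return len(retrieved & relevant_pool) / len(relevant_pool) if relevant_pool else 0.0

def f_measure(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0

# Hypothetical citation ids returned by two queries for one disease.
expert_query = {1, 2, 3, 4}
terminological_query = {3, 4, 5, 6, 7, 8}
# Hypothetical relevance judgements over the pooled citations.
relevant = {1, 3, 4, 5, 6}

for name, retrieved in [("expert", expert_query), ("terminological", terminological_query)]:
    p = precision(retrieved, relevant)
    r = relative_recall(retrieved, relevant)
    print(name, round(p, 2), round(r, 2), round(f_measure(p, r), 2))
```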

  9. Query-Adaptive Reciprocal Hash Tables for Nearest Neighbor Search.

    PubMed

    Liu, Xianglong; Deng, Cheng; Lang, Bo; Tao, Dacheng; Li, Xuelong

    2016-02-01

    Recent years have witnessed the success of binary hashing techniques in approximate nearest neighbor search. In practice, multiple hash tables are usually built using hashing to cover more desired results in the hit buckets of each table. However, little work has studied a unified approach to constructing multiple informative hash tables using any type of hashing algorithm. Meanwhile, multiple table search also lacks a generic query-adaptive and fine-grained ranking scheme that can alleviate the binary quantization loss suffered in the standard hashing techniques. To solve the above problems, in this paper, we first regard the table construction as a selection problem over a set of candidate hash functions. With the graph representation of the function set, we propose an efficient solution that sequentially applies normalized dominant set to find the most informative and independent hash functions for each table. To further reduce the redundancy between tables, we explore the reciprocal hash tables in a boosting manner, where the hash function graph is updated with high weights emphasized on the misclassified neighbor pairs of previous hash tables. To refine the ranking of the retrieved buckets within a certain Hamming radius from the query, we propose a query-adaptive bitwise weighting scheme to enable fine-grained bucket ranking in each hash table, exploiting the discriminative power of its hash functions and their complement for nearest neighbor search. Moreover, we integrate such a scheme into the multiple table search using a fast, yet reciprocal table lookup algorithm within the adaptive weighted Hamming radius. In this paper, both the construction method and the query-adaptive search method are general and compatible with different types of hashing algorithms using different feature spaces and/or parameter settings. Our extensive experiments on several large-scale benchmarks demonstrate that the proposed techniques can significantly outperform both
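
    The notion of ranking buckets inside a Hamming radius with per-bit weights can be illustrated with a small sketch. It is not the authors' algorithm: the 4-bit codes, the weights and the brute-force scan over buckets are assumptions chosen to keep the example short, and a real system would enumerate nearby codes instead of scanning the table.

```python
def hamming(a, b):
    """Number of differing bits between two integer hash codes."""
    return bin(a ^ b).count("1")

def weighted_hamming(a, b, weights):
    """Sum the weights of the bit positions where the two codes differ."""
    diff = a ^ b
    return sum(w for i, w in enumerate(weights) if (diff >> i) & 1)

def candidates_within_radius(table, query_code, radius):
    """Collect (code, item) pairs from every bucket within `radius` bits of the query."""
    found = []
    for code, items in table.items():
        if hamming(code, query_code) <= radius:
            found.extend((code, item) for item in items)
    return found

# Hypothetical hash table: 4-bit bucket code -> stored items.
table = {0b0000: ["a"], 0b1000: ["c"], 0b0001: ["b"], 0b1111: ["d"]}
query = 0b0000
# Hypothetical query-specific weights; index 0 is the lowest bit.
weights = [0.2, 1.0, 1.0, 0.9]

hits = candidates_within_radius(table, query, radius=1)
ranked = sorted(hits, key=lambda h: weighted_hamming(h[0], query, weights))
print([item for _, item in ranked])
```

    Items "b" and "c" are tied at plain Hamming distance 1, and the bit weights break the tie, which is the fine-grained ranking effect the abstract refers to.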

  10. Query-Dependent Banding (QDB) for Faster RNA Similarity Searches

    PubMed Central

    Nawrocki, Eric P; Eddy, Sean R

    2007-01-01

    When searching sequence databases for RNAs, it is desirable to score both primary sequence and RNA secondary structure similarity. Covariance models (CMs) are probabilistic models well-suited for RNA similarity search applications. However, the computational complexity of CM dynamic programming alignment algorithms has limited their practical application. Here we describe an acceleration method called query-dependent banding (QDB), which uses the probabilistic query CM to precalculate regions of the dynamic programming lattice that have negligible probability, independently of the target database. We have implemented QDB in the freely available Infernal software package. QDB reduces the average case time complexity of CM alignment from LN^2.4 to LN^1.3 for a query RNA of N residues and a target database of L residues, resulting in a 4-fold speedup for typical RNA queries. Combined with other improvements to Infernal, including informative mixture Dirichlet priors on model parameters, benchmarks also show increased sensitivity and specificity resulting from improved parameterization. PMID:17397253

  11. Search Term Reports

    EPA Pesticide Factsheets

    Learn what search terms brought users to choose your page in their search results, and what terms they entered in the EPA search box after visiting your page. Use this information to improve links and content on the page.

  12. RadSearch: a RIS/PACS integrated query tool

    NASA Astrophysics Data System (ADS)

    Tsao, Sinchai; Documet, Jorge; Moin, Paymann; Wang, Kevin; Liu, Brent J.

    2008-03-01

    Radiology Information Systems (RIS) contain a wealth of information that can be used for research, education, and practice management. However, the sheer amount of information available makes querying specific data difficult and time consuming. Previous work has shown that a clinical RIS database and its RIS text reports can be extracted, duplicated and indexed for searches while complying with HIPAA and IRB requirements. This project's intent is to provide a software tool, the RadSearch Toolkit, to allow intelligent indexing and parsing of RIS reports for easy yet powerful searches. In addition, the project aims to seamlessly query and retrieve associated images from the Picture Archiving and Communication System (PACS) in situations where an integrated RIS/PACS is in place - even subselecting individual series, such as in an MRI study. RadSearch's application of simple text parsing techniques to index text-based radiology reports will allow the search engine to quickly return relevant results. This powerful combination will be useful in both private practice and academic settings; administrators can easily obtain complex practice management information such as referral patterns; researchers can conduct retrospective studies with specific, multiple criteria; teaching institutions can quickly and effectively create thorough teaching files.

  13. Web Search Queries Can Predict Stock Market Volumes

    PubMed Central

    Bordino, Ilaria; Battiston, Stefano; Caldarelli, Guido; Cristelli, Matthieu; Ukkonen, Antti; Weber, Ingmar

    2012-01-01

    We live in a computerized and networked society where many of our actions leave a digital trace and affect other people’s actions. This has led to the emergence of a new data-driven research field: mathematical methods of computer science, statistical physics and sociometry provide insights on a wide range of disciplines ranging from social science to human mobility. A recent important discovery is that search engine traffic (i.e., the number of requests submitted by users to search engines on the www) can be used to track and, in some cases, to anticipate the dynamics of social phenomena. Successful examples include unemployment levels, car and home sales, and epidemics spreading. Few recent works applied this approach to stock prices and market sentiment. However, it remains unclear if trends in financial markets can be anticipated by the collective wisdom of on-line users on the web. Here we show that daily trading volumes of stocks traded in NASDAQ-100 are correlated with daily volumes of queries related to the same stocks. In particular, query volumes anticipate in many cases peaks of trading by one day or more. Our analysis is carried out on a unique dataset of queries, submitted to an important web search engine, which also enables us to investigate user behavior. We show that the query volume dynamics emerges from the collective but seemingly uncoordinated activity of many users. These findings contribute to the debate on the identification of early warnings of financial systemic risk, based on the activity of users of the www. PMID:22829871

  14. Development and empirical user-centered evaluation of semantically-based query recommendation for an electronic health record search engine.

    PubMed

    Hanauer, David A; Wu, Danny T Y; Yang, Lei; Mei, Qiaozhu; Murkowski-Steffy, Katherine B; Vydiswaran, V G Vinod; Zheng, Kai

    2017-03-01

    The utility of biomedical information retrieval environments can be severely limited when users lack expertise in constructing effective search queries. To address this issue, we developed a computer-based query recommendation algorithm that suggests semantically interchangeable terms based on an initial user-entered query. In this study, we assessed the value of this approach, which has broad applicability in biomedical information retrieval, by demonstrating its application as part of a search engine that facilitates retrieval of information from electronic health records (EHRs). The query recommendation algorithm utilizes MetaMap to identify medical concepts from search queries and indexed EHR documents. Synonym variants from UMLS are used to expand the concepts along with a synonym set curated from historical EHR search logs. The empirical study involved 33 clinicians and staff who evaluated the system through a set of simulated EHR search tasks. User acceptance was assessed using the widely used technology acceptance model. The search engine's performance was rated consistently higher with the query recommendation feature turned on vs. off. The relevance of computer-recommended search terms was also rated high, and in most cases the participants had not thought of these terms on their own. The questions on perceived usefulness and perceived ease of use received overwhelmingly positive responses. A vast majority of the participants wanted the query recommendation feature to be available to assist in their day-to-day EHR search tasks. Challenges persist for users to construct effective search queries when retrieving information from biomedical documents including those from EHRs. This study demonstrates that semantically-based query recommendation is a viable solution to addressing this challenge. Published by Elsevier Inc.

  15. A Study on Pubmed Search Tag Usage Pattern: Association Rule Mining of a Full-day Pubmed Query Log

    PubMed Central

    2013-01-01

    Background The practice of evidence-based medicine requires efficient biomedical literature search such as PubMed/MEDLINE. Retrieval performance relies highly on the efficient use of search field tags. The purpose of this study was to analyze PubMed log data in order to understand the usage pattern of search tags by the end user in PubMed/MEDLINE search. Methods A PubMed query log file was obtained from the National Library of Medicine containing anonymous user identification, timestamp, and query text. Inconsistent records were removed from the dataset and the search tags were extracted from the query texts. A total of 2,917,159 queries were selected for this study issued by a total of 613,061 users. The analysis of frequent co-occurrences and usage patterns of the search tags was conducted using an association mining algorithm. Results The percentage of search tag usage was low (11.38% of the total queries) and only 2.95% of queries contained two or more tags. Three out of four users used no search tag and about two-thirds of them issued fewer than four queries. Among the queries containing at least one tagged search term, the average number of search tags was almost half of the number of total search terms. Navigational search tags are more frequently used than informational search tags. While no strong association was observed between informational and navigational tags, six (out of 19) informational tags and six (out of 29) navigational tags showed strong associations in PubMed searches. Conclusions The low percentage of search tag usage implies that PubMed/MEDLINE users do not utilize the features of PubMed/MEDLINE widely or they are not aware of such features or solely depend on the high recall focused query translation by PubMed’s Automatic Term Mapping. The users need further education and interactive search application for effective use of the search tags in order to fulfill their biomedical information needs from PubMed/MEDLINE. PMID:23302604
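
    The co-occurrence analysis described in the Methods can be sketched as pairwise support and confidence over the tags extracted from each query. The abstract does not name the mining algorithm or its thresholds, so the counting below, the 0.3 support threshold, the function names and the five sample queries are all assumptions for illustration.

```python
import re
from collections import Counter
from itertools import combinations

TAG = re.compile(r"\[([a-z]+)\]")  # matches field tags such as [au] or [dp]

def tag_sets(queries):
    """Extract the set of search field tags used in each query."""
    return [frozenset(TAG.findall(q.lower())) for q in queries]

def pair_rules(transactions, min_support=0.3):
    """Return {(a, b): (support, confidence of a -> b)} for frequent tag pairs."""
    n = len(transactions)
    single = Counter(t for tags in transactions for t in tags)
    pairs = Counter(p for tags in transactions for p in combinations(sorted(tags), 2))
    rules = {}
    for (a, b), count in pairs.items():
        support = count / n
        if support >= min_support:
            rules[(a, b)] = (support, count / single[a])
            rules[(b, a)] = (support, count / single[b])
    return rules

# Hypothetical query log; the tags follow PubMed's bracket syntax.
queries = [
    "smith j[au] asthma[ti]",
    "smith j[au] 2010[dp]",
    "lancet[ta] 2010[dp]",
    "jones[au] 2011[dp]",
    "diabetes treatment",
]
transactions = tag_sets(queries)
tagged_share = sum(1 for tags in transactions if tags) / len(transactions)
rules = pair_rules(transactions)
print(tagged_share, rules)
```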

  16. Quantum Computers Can Search Arbitrarily Large Databases by a Single Query

    NASA Astrophysics Data System (ADS)

    Grover, Lov K.

    1997-12-01

    This paper shows that a quantum mechanical algorithm that can query information relating to multiple items of the database can search a database for a unique item satisfying a given condition, in a single query [a query is defined as any question to the database to which the database has to return a (yes/no) answer]. A classical algorithm will be limited to the information theoretic bound of at least log_2 N queries, which it would achieve by using a binary search.
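
    The classical side of this comparison can be made concrete: a search restricted to yes/no questions needs about log_2 N of them in the worst case, and binary search reaches that count. The sketch below simulates the database's answers by comparing against a known target; the function name and the sizes are assumptions, and nothing here models the quantum algorithm.

```python
import math

def find_with_yes_no_queries(n, target):
    """Locate `target` in range(n) using only yes/no questions; return (index, queries)."""
    lo, hi = 0, n - 1
    queries = 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1  # one yes/no question: "is the item at an index <= mid?"
        if target <= mid:
            hi = mid
        else:
            lo = mid + 1
    return lo, queries

n = 1024
worst = max(find_with_yes_no_queries(n, t)[1] for t in range(n))
print(worst, math.ceil(math.log2(n)))
```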

  17. From health search to healthcare: explorations of intention and utilization via query logs and user surveys.

    PubMed

    White, Ryen W; Horvitz, Eric

    2014-01-01

    To better understand the relationship between online health-seeking behaviors and in-world healthcare utilization (HU) by studies of online search and access activities before and after queries that pursue medical professionals and facilities. We analyzed data collected from logs of online searches gathered from consenting users of a browser toolbar from Microsoft (N=9740). We employed a complementary survey (N=489) to seek a deeper understanding of information-gathering, reflection, and action on the pursuit of professional healthcare. We provide insights about HU through the survey, breaking out its findings by different respondent marginalizations as appropriate. Observations made from search logs may be explained by trends observed in our survey responses, even though the user populations differ. The results provide insights about how users decide if and when to utilize healthcare resources, and how online health information seeking transitions to in-world HU. The findings from both the survey and the logs reveal behavioral patterns and suggest a strong relationship between search behavior and HU. Although the diversity of our survey respondents is limited and we cannot be certain that users visited medical facilities, we demonstrate that it may be possible to infer HU from long-term search behavior by the apparent influence that health concerns and professional advice have on search activity. Our findings highlight different phases of online activities around queries pursuing professional healthcare facilities and services. We also show that it may be possible to infer HU from logs without tracking people's physical location, based on the effect of HU on pre- and post-HU search behavior. This allows search providers and others to develop more robust models of interests and preferences by modeling utilization rather than simply the intention to utilize that is expressed in search queries.

  18. From health search to healthcare: explorations of intention and utilization via query logs and user surveys

    PubMed Central

    White, Ryen W; Horvitz, Eric

    2014-01-01

    Objective To better understand the relationship between online health-seeking behaviors and in-world healthcare utilization (HU) by studies of online search and access activities before and after queries that pursue medical professionals and facilities. Materials and methods We analyzed data collected from logs of online searches gathered from consenting users of a browser toolbar from Microsoft (N=9740). We employed a complementary survey (N=489) to seek a deeper understanding of information-gathering, reflection, and action on the pursuit of professional healthcare. Results We provide insights about HU through the survey, breaking out its findings by different respondent marginalizations as appropriate. Observations made from search logs may be explained by trends observed in our survey responses, even though the user populations differ. Discussion The results provide insights about how users decide if and when to utilize healthcare resources, and how online health information seeking transitions to in-world HU. The findings from both the survey and the logs reveal behavioral patterns and suggest a strong relationship between search behavior and HU. Although the diversity of our survey respondents is limited and we cannot be certain that users visited medical facilities, we demonstrate that it may be possible to infer HU from long-term search behavior by the apparent influence that health concerns and professional advice have on search activity. Conclusions Our findings highlight different phases of online activities around queries pursuing professional healthcare facilities and services. We also show that it may be possible to infer HU from logs without tracking people's physical location, based on the effect of HU on pre- and post-HU search behavior. This allows search providers and others to develop more robust models of interests and preferences by modeling utilization rather than simply the intention to utilize that is expressed in search queries. PMID

  19. [On the seasonality of dermatoses: a retrospective analysis of search engine query data depending on the season].

    PubMed

    Köhler, M J; Springer, S; Kaatz, M

    2014-09-01

    The volume of search engine queries about disease-relevant items reflects public interest and correlates with disease prevalence as proven by the example of flu (influenza). Other influences include media attention or holidays. The present work investigates if the seasonality of prevalence or symptom severity of dermatoses correlates with search engine query data. The relative weekly volume of dermatologically relevant search terms was assessed by the online tool Google Trends for the years 2009-2013. For each item, the degree of seasonality was calculated via frequency analysis and a geometric approach. Many dermatoses show a marked seasonality, reflected by search engine query volumes. Unexpected seasonal variations of these queries suggest a previously unknown variability of the respective disease prevalence. Furthermore, using the example of allergic rhinitis, a close correlation of search engine query data with actual pollen count can be demonstrated. In many cases, search engine query data are appropriate to estimate seasonal variability in prevalence of common dermatoses. This finding may be useful for real-time analysis and formation of hypotheses concerning pathogenetic or symptom aggravating mechanisms and may thus contribute to improvement of diagnostics and prevention of skin diseases.

  20. Use of Internet Search Queries to Enhance Surveillance of Foodborne Illness.

    PubMed

    Bahk, Gyung Jin; Kim, Yong Soo; Park, Myoung Su

    2015-11-01

    As a supplement to or extension of methods used to determine trends in foodborne illness over time, we propose the use of Internet search metrics. We compared Internet query data for foodborne illness syndrome-related search terms from the most popular 5 Korean search engines using Health Insurance Review and Assessment Service inpatient stay data for 26 International Classification of Diseases, Tenth Revision, codes for foodborne illness in South Korea during 2010-2012. We used time-series analysis with Seasonal Autoregressive Integrated Moving Average (SARIMA) models. Internet search queries for "food poisoning" correlated most strongly with foodborne illness data (r=0.70, p<0.001); furthermore, "food poisoning" queries correlated most strongly with the total number of inpatient stays related to foodborne illness during the next month (β=0.069, SE 0.017, p<0.001). This approach, using the SARIMA model, could be used to effectively measure trends over time to enhance surveillance of foodborne illness in South Korea.
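
    The comparison of query volume with case counts in the following month can be illustrated with a plain lagged Pearson correlation. This is a simpler stand-in, not the SARIMA modelling used in the study, and the monthly figures below are invented so that cases trail queries by exactly one month.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def lagged_correlation(queries, cases, lag):
    """Correlate query volume in month t with case counts in month t + lag."""
    if lag == 0:
        return pearson(queries, cases)
    return pearson(queries[:-lag], cases[lag:])

# Hypothetical monthly series (not data from the study).
queries = [10, 12, 30, 44, 20, 10, 8, 14]
cases = [4, 5, 6, 15, 22, 10, 5, 4]

same_month = lagged_correlation(queries, cases, 0)
next_month = lagged_correlation(queries, cases, 1)
print(round(same_month, 3), round(next_month, 3))
```

    A higher correlation at lag 1 than at lag 0 is the pattern that would suggest query volume leads reported illness by about a month.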

  1. A study of medical and health queries to web search engines.

    PubMed

    Spink, Amanda; Yang, Yin; Jansen, Jim; Nykanen, Pirrko; Lorence, Daniel P; Ozmutlu, Seda; Ozmutlu, H Cenk

    2004-03-01

    This paper reports findings from an analysis of medical or health queries to different web search engines. We report results: (i). comparing samples of 10000 web queries taken randomly from 1.2 million query logs from the AlltheWeb.com and Excite.com commercial web search engines in 2001 for medical or health queries, (ii). comparing the 2001 findings from Excite and AlltheWeb.com users with results from a previous analysis of medical and health related queries from the Excite Web search engine for 1997 and 1999, and (iii). medical or health advice-seeking queries beginning with the word 'should'. Findings suggest: (i). a small percentage of web queries are medical or health related, (ii). the top five categories of medical or health queries were: general health, weight issues, reproductive health and puberty, pregnancy/obstetrics, and human relationships, and (iii). over time, the medical and health queries may have declined as a proportion of all web queries, as the use of specialized medical/health websites and e-commerce-related queries has increased. Findings provide insights into medical and health-related web querying and suggest some implications for the use of the general web search engines when seeking medical/health information.

  2. Estimating Influenza Outbreaks Using Both Search Engine Query Data and Social Media Data in South Korea

    PubMed Central

    Woo, Hyekyung; Shim, Eunyoung; Lee, Jong-Koo; Lee, Chang-Gun; Kim, Seong Hwan

    2016-01-01

    Background As suggested as early as in 2006, logs of queries submitted to search engines seeking information could be a source for detection of emerging influenza epidemics if changes in the volume of search queries are monitored (infodemiology). However, selecting queries that are most likely to be associated with influenza epidemics is a particular challenge when it comes to generating better predictions. Objective In this study, we describe a methodological extension for detecting influenza outbreaks using search query data; we provide a new approach for query selection through the exploration of contextual information gleaned from social media data. Additionally, we evaluate whether it is possible to use these queries for monitoring and predicting influenza epidemics in South Korea. Methods Our study was based on freely available weekly influenza incidence data and query data originating from the search engine on the Korean website Daum between April 3, 2011 and April 5, 2014. To select queries related to influenza epidemics, several approaches were applied: (1) exploring influenza-related words in social media data, (2) identifying the chief concerns related to influenza, and (3) using Web query recommendations. Optimal feature selection by least absolute shrinkage and selection operator (Lasso) and support vector machine for regression (SVR) were used to construct a model predicting influenza epidemics. Results In total, 146 queries related to influenza were generated through our initial query selection approach. A considerable proportion of optimal features for final models were derived from queries with reference to the social media data. The SVR model performed well: the prediction values were highly correlated with the recent observed influenza-like illness (r=.956; P<.001) and virological incidence rate (r=.963; P<.001). Conclusions These results demonstrate the feasibility of using search queries to enhance influenza surveillance in South Korea. In

  3. Estimating Influenza Outbreaks Using Both Search Engine Query Data and Social Media Data in South Korea.

    PubMed

    Woo, Hyekyung; Cho, Youngtae; Shim, Eunyoung; Lee, Jong-Koo; Lee, Chang-Gun; Kim, Seong Hwan

    2016-07-04

    As suggested as early as in 2006, logs of queries submitted to search engines seeking information could be a source for detection of emerging influenza epidemics if changes in the volume of search queries are monitored (infodemiology). However, selecting queries that are most likely to be associated with influenza epidemics is a particular challenge when it comes to generating better predictions. In this study, we describe a methodological extension for detecting influenza outbreaks using search query data; we provide a new approach for query selection through the exploration of contextual information gleaned from social media data. Additionally, we evaluate whether it is possible to use these queries for monitoring and predicting influenza epidemics in South Korea. Our study was based on freely available weekly influenza incidence data and query data originating from the search engine on the Korean website Daum between April 3, 2011 and April 5, 2014. To select queries related to influenza epidemics, several approaches were applied: (1) exploring influenza-related words in social media data, (2) identifying the chief concerns related to influenza, and (3) using Web query recommendations. Optimal feature selection by least absolute shrinkage and selection operator (Lasso) and support vector machine for regression (SVR) were used to construct a model predicting influenza epidemics. In total, 146 queries related to influenza were generated through our initial query selection approach. A considerable proportion of optimal features for final models were derived from queries with reference to the social media data. The SVR model performed well: the prediction values were highly correlated with the recent observed influenza-like illness (r=.956; P<.001) and virological incidence rate (r=.963; P<.001). These results demonstrate the feasibility of using search queries to enhance influenza surveillance in South Korea. In addition, an approach for query selection using

  4. Analysis of queries sent to PubMed at the point of care: observation of search behaviour in a medical teaching hospital.

    PubMed

    Hoogendam, Arjen; Stalenhoef, Anton F H; Robbé, Pieter F de Vries; Overbeke, A John P M

    2008-09-24

    The use of PubMed to answer daily medical care questions is limited because it is challenging to retrieve a small set of relevant articles and time is restricted. Knowing what aspects of queries are likely to retrieve relevant articles can increase the effectiveness of PubMed searches. The objectives of our study were to identify queries that are likely to retrieve relevant articles by relating PubMed search techniques and tools to the number of articles retrieved and the selection of articles for further reading. This was a prospective observational study of queries regarding patient-related problems sent to PubMed by residents and internists in internal medicine working in an Academic Medical Centre. We analyzed queries, search results, query tools (Mesh, Limits, wildcards, operators), selection of abstract and full-text for further reading, using a portal that mimics PubMed. PubMed was used to solve 1121 patient-related problems, resulting in 3205 distinct queries. Abstracts were viewed in 999 (31%) of these queries, and in 126 (39%) of 321 queries using query tools. The average term count per query was 2.5. Abstracts were selected in more than 40% of queries using four or five terms, increasing to 63% if the use of four or five terms yielded 2-161 articles. Queries sent to PubMed by physicians at our hospital during daily medical care contain fewer than three terms. Queries using four to five terms, retrieving less than 161 article titles, are most likely to result in abstract viewing. PubMed search tools are used infrequently by our population and are less effective than the use of four or five terms. Methods to facilitate the formulation of precise queries, using more relevant terms, should be the focus of education and research.

  5. Analysis of queries sent to PubMed at the point of care: Observation of search behaviour in a medical teaching hospital

    PubMed Central

    Hoogendam, Arjen; Stalenhoef, Anton FH; Robbé, Pieter F de Vries; Overbeke, A John PM

    2008-01-01

    Background The use of PubMed to answer daily medical care questions is limited because it is challenging to retrieve a small set of relevant articles and time is restricted. Knowing what aspects of queries are likely to retrieve relevant articles can increase the effectiveness of PubMed searches. The objectives of our study were to identify queries that are likely to retrieve relevant articles by relating PubMed search techniques and tools to the number of articles retrieved and the selection of articles for further reading. Methods This was a prospective observational study of queries regarding patient-related problems sent to PubMed by residents and internists in internal medicine working in an Academic Medical Centre. We analyzed queries, search results, query tools (Mesh, Limits, wildcards, operators), selection of abstract and full-text for further reading, using a portal that mimics PubMed. Results PubMed was used to solve 1121 patient-related problems, resulting in 3205 distinct queries. Abstracts were viewed in 999 (31%) of these queries, and in 126 (39%) of 321 queries using query tools. The average term count per query was 2.5. Abstracts were selected in more than 40% of queries using four or five terms, increasing to 63% if the use of four or five terms yielded 2–161 articles. Conclusion Queries sent to PubMed by physicians at our hospital during daily medical care contain fewer than three terms. Queries using four to five terms, retrieving less than 161 article titles, are most likely to result in abstract viewing. PubMed search tools are used infrequently by our population and are less effective than the use of four or five terms. Methods to facilitate the formulation of precise queries, using more relevant terms, should be the focus of education and research. PMID:18816391

  6. Query Classification and Study of University Students' Search Trends

    ERIC Educational Resources Information Center

    Maabreh, Majdi A.; Al-Kabi, Mohammed N.; Alsmadi, Izzat M.

    2012-01-01

    Purpose: This study is an attempt to develop an automatic identification method for Arabic web queries and divide them into several query types using data mining. In addition, it seeks to evaluate the impact of the academic environment on using the internet. Design/methodology/approach: The web log files were collected from one of the higher…

  7. Revisiting the Rise of Electronic Nicotine Delivery Systems Using Search Query Surveillance

    PubMed Central

    Ayers, John W.; Althouse, Benjamin M.; Allem, Jon-Patrick; Leas, Eric C.; Dredze, Mark; Williams, Rebecca

    2016-01-01

    Introduction Public perceptions of electronic nicotine delivery systems (ENDS) remain poorly understood because surveys are too costly to regularly implement and when implemented there are large delays between data collection and dissemination. Search query surveillance has bridged some of these gaps. Herein, ENDS’ popularity in the U.S. is reassessed using Google searches. Methods ENDS searches originating in the U.S. from January 2009 through January 2015 were disaggregated by terms focused on e-cigarette (e.g., e-cig) versus vaping (e.g., vapers), their geolocation (e.g., state), the aggregate tobacco control measures corresponding to their geolocation (e.g., clean indoor air laws), and by terms that indicated the searcher’s potential interest (e.g., buy e-cigs likely indicates shopping); all analyzed in 2015. Results ENDS searches are increasing across the entire U.S., with 8,498,180 searches during 2014. At the same time, searches shifted from e-cigarette- to vaping-focused terms, especially in coastal states and states with more anti-smoking norms. For example, nationally, e-cigarette searches declined 9% (95% CI=1%, 16%) during 2014 compared with 2013, whereas vaping searches increased 136% (95% CI=97%, 186%), surpassing e-cigarette searches. More ENDS searches were related to shopping (e.g., vape shop) than health concerns (e.g., vaping risks) or cessation (e.g., quit smoking with e-cigs), with shopping searches nearly doubling during 2014. Conclusions ENDS popularity is rapidly growing and evolving, and monitoring searches has provided these timely insights. These findings may inform survey questionnaire development for follow-up investigation and immediately guide policy debates about how the public perceives ENDS’ health risks or cessation benefits. PMID:26876772

  8. Revisiting the Rise of Electronic Nicotine Delivery Systems Using Search Query Surveillance.

    PubMed

    Ayers, John W; Althouse, Benjamin M; Allem, Jon-Patrick; Leas, Eric C; Dredze, Mark; Williams, Rebecca S

    2016-06-01

    Public perceptions of electronic nicotine delivery systems (ENDS) remain poorly understood because surveys are too costly to regularly implement and, when implemented, there are long delays between data collection and dissemination. Search query surveillance has bridged some of these gaps. Herein, ENDS' popularity in the U.S. is reassessed using Google searches. ENDS searches originating in the U.S. from January 2009 through January 2015 were disaggregated by terms focused on e-cigarette (e.g., e-cig) versus vaping (e.g., vapers); their geolocation (e.g., state); the aggregate tobacco control measures corresponding to their geolocation (e.g., clean indoor air laws); and by terms that indicated the searcher's potential interest (e.g., buy e-cigs likely indicates shopping); all analyzed in 2015. ENDS searches are rapidly increasing in the U.S., with 8,498,000 searches during 2014 alone. Increasingly, searches are shifting from e-cigarette- to vaping-focused terms, especially in coastal states and states where anti-smoking norms are stronger. For example, nationally, e-cigarette searches declined 9% (95% CI=1%, 16%) during 2014 compared with 2013, whereas vaping searches increased 136% (95% CI=97%, 186%), even surpassing e-cigarette searches. Additionally, the percentage of ENDS searches related to shopping (e.g., vape shop) nearly doubled in 2014, whereas searches related to health concerns (e.g., vaping risks) or cessation (e.g., quit smoking with e-cigs) were rare and declined in 2014. ENDS popularity is rapidly growing and evolving. These findings could inform survey questionnaire development for follow-up investigation and immediately guide policy debates about how the public perceives the health risks or cessation benefits of ENDS. Copyright © 2016 American Journal of Preventive Medicine. Published by Elsevier Inc. All rights reserved.

  9. Cognitive issues in searching images with visual queries

    NASA Astrophysics Data System (ADS)

    Yu, ByungGu; Evens, Martha W.

    1999-01-01

    In this paper, we propose our image indexing technique and visual query processing technique. Our mental images are different from the actual retinal images and many things, such as personal interests, personal experiences, perceptual context, the characteristics of spatial objects, and so on, affect our spatial perception. These private differences are propagated into our mental images and so our visual queries become different from the real images that we want to find. This is a hard problem and few people have tried to work on it. In this paper, we survey the human mental imagery system, the human spatial perception, and discuss several kinds of visual queries. Also, we propose our own approach to visual query interpretation and processing.

  10. Google Search Queries About Neurosurgical Topics: Are They a Suitable Guide for Neurosurgeons?

    PubMed

    Lawson McLean, Anna C; Lawson McLean, Aaron; Kalff, Rolf; Walter, Jan

    2016-06-01

    Google is the most popular search engine, with about 100 billion searches per month. Google Trends is an integrated tool that allows users to obtain Google's search popularity statistics from the last decade. Our aim was to evaluate whether Google Trends is a useful tool to assess the public's interest in specific neurosurgical topics. We evaluated Google Trends statistics for the neurosurgical search topic areas "hydrocephalus," "spinal stenosis," "concussion," "vestibular schwannoma," and "cerebral arteriovenous malformation." We compared these with bibliometric data from PubMed and epidemiologic data from the German Federal Monitoring Agency. In addition, we assessed Google users' search behavior for the search terms "glioblastoma" and "meningioma." Over the last 10 years, there has been an increasing interest in the topic "concussion" from Internet users in general and scientists. "Spinal stenosis," "concussion," and "vestibular schwannoma" are topics that are of special interest in high-income countries (eg, Germany), whereas "hydrocephalus" is a popular topic in low- and middle-income countries. The Google-defined top searches within these topic areas revealed more detail about people's interests (eg, "normal pressure hydrocephalus" or "football concussion" ranked among the most popular search queries within the corresponding topics). There was a similar volume of queries for "glioblastoma" and "meningioma." Google Trends is a useful source to elicit information about general trends in people's health interests and the role of different diseases across the world. The Internet presence of neurosurgical units and surgeons can be guided by online users' interests to achieve high-quality, professionally endorsed patient education. Copyright © 2016 Elsevier Inc. All rights reserved.

  11. Image search reranking with query-dependent click-based relevance feedback.

    PubMed

    Zhang, Yongdong; Yang, Xiaopeng; Mei, Tao

    2014-10-01

    Our goal is to boost text-based image search results via image reranking. There are diverse modalities (features) of images that we can leverage for reranking; however, the effects of different modalities are query-dependent. The primary challenge we face is how to fuse multiple modalities adaptively for different queries, which has often been overlooked in previous reranking research. Moreover, multimodality fusion without an understanding of the query is risky, and may lead to incorrect judgment in reranking. Therefore, to obtain the best fusion weights for the query, in this paper, we leverage click-through data, which can be viewed as an "implicit" user feedback and an effective means of understanding the query. A novel reranking algorithm, called click-based relevance feedback, is proposed. This algorithm emphasizes the successful use of click-through data for identifying user search intention, while leveraging a multiple kernel learning algorithm to adaptively learn the query-dependent fusion weights for multiple modalities. We conduct experiments on a real-world data set collected from a commercial search engine with click-through data. Encouraging experimental results demonstrate that our proposed reranking approach can significantly improve the NDCG@10 of the initial search results by 11.62%, and can outperform several existing approaches for most kinds of queries, such as tail, middle, and top queries.
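
    The NDCG@10 metric cited in this abstract can be illustrated with a short sketch. The abstract does not say which gain function the authors used, so the exponential gain (2^rel - 1) below is an assumption, and the relevance labels are made-up values rather than data from the paper.

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k graded relevance labels.

    Assumes the common exponential gain (2^rel - 1) and a log2 rank discount.
    """
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    """DCG normalised by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Hypothetical relevance labels for one query, before and after reranking.
initial = [1, 0, 2, 0, 1]
reranked = [2, 1, 1, 0, 0]
print(ndcg_at_k(initial), ndcg_at_k(reranked))
```

    A reranker that moves the most relevant images to the top raises NDCG@10 toward 1.0, which is the kind of improvement the abstract reports.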

  12. System for Performing Single Query Searches of Heterogeneous and Dispersed Databases

    NASA Technical Reports Server (NTRS)

    Maluf, David A. (Inventor); Gurram, Mohana M. (Inventor); Knight, Christopher D. (Inventor); Okimura, Takeshi (Inventor); Tran, Vu Hoang (Inventor); Trinh, Anh Ngoc (Inventor)

    2017-01-01

    The present invention is a distributed computer system of heterogeneous databases joined in an information grid and configured with an Application Programming Interface hardware which includes a search engine component for performing user-structured queries on multiple heterogeneous databases in real time. This invention reduces overhead associated with the impedance mismatch that commonly occurs in heterogeneous database queries.

  13. Linguistic Aspects of Web Queries.

    ERIC Educational Resources Information Center

    Jansen, Bernard J.; Spink, Amanda; Pfaff, Anthony

    2000-01-01

    Discussion of terms and how they are used in queries in information retrieval focuses on a transaction log analysis of queries posed on an Internet search service that isolated basic query structure syntactic patterns. Describes a linguistic model that classified Web queries and suggests implications for information retrieval system design.…

  14. Towards computational improvement of DNA database indexing and short DNA query searching.

    PubMed

    Stojanov, Done; Koceski, Sašo; Mileva, Aleksandra; Koceska, Nataša; Bande, Cveta Martinovska

    2014-09-03

    In order to facilitate and speed up the search of massive DNA databases, the database is indexed at the beginning, employing a mapping function. By searching through the indexed data structure, exact query hits can be identified. If the database is searched against an annotated DNA query, such as a known promoter consensus sequence, then the starting locations and the number of potential genes can be determined. This is particularly relevant if unannotated DNA sequences have to be functionally annotated. However, indexing a massive DNA database and searching an indexed data structure with millions of entries is a time-demanding process. In this paper, we propose a fast DNA database indexing and searching approach, identifying all query hits in the database, without having to examine all entries in the indexed data structure, limiting the maximum length of a query that can be searched against the database. By applying the proposed indexing equation, the whole human genome could be indexed in 10 hours on a personal computer, under the assumption that there is enough RAM to store the indexed data structure. Analysing the methodology proposed by Reneker, we observed that hits at starting positions [Formula: see text] are not reported, if the database is searched against a query shorter than [Formula: see text] nucleotides, such that [Formula: see text] is the length of the DNA database words being mapped and [Formula: see text] is the length of the query. A solution of this drawback is also presented.
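
    The index-then-search idea described above can be sketched with a plain dictionary of fixed-length words. This is only an illustration: it uses an ordinary hash of k-length substrings rather than the mapping function or indexing equation proposed in the paper, and the names `build_index` and `find_hits` are hypothetical.

```python
from collections import defaultdict

def build_index(database: str, k: int) -> dict:
    """Map every k-length word in the database to its starting positions."""
    index = defaultdict(list)
    for start in range(len(database) - k + 1):
        index[database[start:start + k]].append(start)
    return index

def find_hits(database: str, index: dict, query: str, k: int) -> list:
    """Return start positions of exact query hits using the word index.

    Only the positions listed for the query's first word are checked,
    so the rest of the index is never examined.
    """
    if len(query) < k:
        # Queries shorter than the word length need separate handling;
        # this sketch simply refuses them.
        raise ValueError("query shorter than the indexed word length")
    return [pos for pos in index.get(query[:k], [])
            if database.startswith(query, pos)]

# Toy sequence, not real genomic data.
db = "ACGTACGTTAGC"
idx = build_index(db, 4)
print(find_hits(db, idx, "ACGT", 4), find_hits(db, idx, "ACGTT", 4))
```

    Refusing short queries sidesteps, rather than solves, the missed-hit problem the abstract raises for queries shorter than the indexed word length.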

  15. Privacy-Aware Relevant Data Access with Semantically Enriched Search Queries for Untrusted Cloud Storage Services.

    PubMed

    Pervez, Zeeshan; Ahmad, Mahmood; Khattak, Asad Masood; Lee, Sungyoung; Chung, Tae Choong

    2016-01-01

    Privacy-aware search of outsourced data ensures relevant data access in the untrusted domain of a public cloud service provider. A subscriber of a public cloud storage service can determine the presence or absence of a particular keyword by submitting a search query in the form of a trapdoor. However, these trapdoor-based search queries are limited in functionality and cannot be used to identify secure outsourced data which contains semantically equivalent information. In addition, trapdoor-based methodologies are confined to pre-defined trapdoors and prevent subscribers from searching outsourced data with arbitrarily defined search criteria. To solve the problem of relevant data access, we have proposed an index-based privacy-aware search methodology that ensures semantic retrieval of data from an untrusted domain. This method ensures oblivious execution of a search query and leverages authorized subscribers to model conjunctive search queries without relying on predefined trapdoors. A security analysis of our proposed methodology shows that, in a conspired attack, unauthorized subscribers and untrusted cloud service providers cannot deduce any information that can lead to the potential loss of data privacy. A computational time analysis on commodity hardware demonstrates that our proposed methodology requires moderate computational resources to model a privacy-aware search query and for its oblivious evaluation on a cloud service provider.

  16. Privacy-Aware Relevant Data Access with Semantically Enriched Search Queries for Untrusted Cloud Storage Services

    PubMed Central

    Pervez, Zeeshan; Ahmad, Mahmood; Khattak, Asad Masood; Lee, Sungyoung; Chung, Tae Choong

    2016-01-01

    Privacy-aware search of outsourced data ensures relevant data access in the untrusted domain of a public cloud service provider. A subscriber of a public cloud storage service can determine the presence or absence of a particular keyword by submitting a search query in the form of a trapdoor. However, these trapdoor-based search queries are limited in functionality and cannot be used to identify secure outsourced data which contains semantically equivalent information. In addition, trapdoor-based methodologies are confined to pre-defined trapdoors and prevent subscribers from searching outsourced data with arbitrarily defined search criteria. To solve the problem of relevant data access, we have proposed an index-based privacy-aware search methodology that ensures semantic retrieval of data from an untrusted domain. This method ensures oblivious execution of a search query and leverages authorized subscribers to model conjunctive search queries without relying on predefined trapdoors. A security analysis of our proposed methodology shows that, in a conspired attack, unauthorized subscribers and untrusted cloud service providers cannot deduce any information that can lead to the potential loss of data privacy. A computational time analysis on commodity hardware demonstrates that our proposed methodology requires moderate computational resources to model a privacy-aware search query and for its oblivious evaluation on a cloud service provider. PMID:27571421

  17. Seasonal trends in hypertension in Poland: evidence from Google search engine query data.

    PubMed

    Płatek, Anna E; Sierdziński, Janusz; Krzowski, Bartosz; Szymański, Filip M

    2018-01-03

    Various conditions including arterial hypertension exhibit seasonal trends in their occurrence and magnitude. Those trends correspond to an interest exhibited in the number of Internet searches for the specific conditions per month. The aim of the study was to show seasonal trends in the hypertension prevalence in Poland related to the data from Google Trends tool. Internet search engine query data were retrieved from Google Trends from January 2008 to November 2017. Data were calculated as monthly normalized search volume from the 9-year period. Data were presented for specific geographic regions, including Poland, USA, Australia and worldwide for the following search terms: "arterial hypertension (pol. nadciśnienie tętnicze)", "hypertension (pol. nadciśnienie)" and "hypertension medical condition". Seasonal effects were calculated using regression models and presented graphically. In Poland the search volume is the highest between November and May, while patients exhibit the smallest interest in hypertension during summer holidays (p < 0.05). Seasonal variations are comparable in the USA, representing a Northern hemisphere country, while in Australia (Southern hemisphere) they exhibit a contrary trend. In conclusion, hypertension is more likely to occur during winter months, which correlates with increased interest in searching for the phrase 'hypertension' in Google.

  18. Manually Classifying User Search Queries on an Academic Library Web Site

    ERIC Educational Resources Information Center

    Chapman, Suzanne; Desai, Shevon; Hagedorn, Kat; Varnum, Ken; Mishra, Sonali; Piacentine, Julie

    2013-01-01

    The University of Michigan Library wanted to learn more about the kinds of searches its users were conducting through the "one search" search box on the Library Web site. Library staff conducted two investigations. A preliminary investigation in 2011 involved the manual review of the 100 most frequently occurring queries conducted…

  19. Searching Databases without Query-Building Aids: Implications for Dyslexic Users

    ERIC Educational Resources Information Center

    Berget, Gerd; Sandnes, Frode Eika

    2015-01-01

    Introduction: Few studies document the information searching behaviour of users with cognitive impairments. This paper therefore addresses the effect of dyslexia on information searching in a database with no tolerance for spelling errors and no query-building aids. The purpose was to identify effective search interface design guidelines that…

  20. Adverse Reactions Associated With Cannabis Consumption as Evident From Search Engine Queries.

    PubMed

    Yom-Tov, Elad; Lev-Ran, Shaul

    2017-10-26

    Cannabis is one of the most widely used psychoactive substances worldwide, but adverse drug reactions (ADRs) associated with its use are difficult to study because of its prohibited status in many countries. Internet search engine queries have been used to investigate ADRs in pharmaceutical drugs. In this proof-of-concept study, we tested whether these queries can be used to detect the adverse reactions of cannabis use. We analyzed anonymized queries from US-based users of Bing, a widely used search engine, made over a period of 6 months and compared the results with the prevalence of cannabis use as reported in the US National Survey on Drug Use in the Household (NSDUH) and with ADRs reported in the Food and Drug Administration's Adverse Drug Reporting System. Predicted prevalence of cannabis use was estimated from the fraction of people making queries about cannabis, marijuana, and 121 additional synonyms. Predicted ADRs were estimated from queries containing layperson descriptions to 195 ICD-10 symptoms list. Our results indicated that the predicted prevalence of cannabis use at the US census regional level reaches an R² of .71 against NSDUH data. Queries for ADRs made by people who also searched for cannabis reveal many of the known adverse effects of cannabis (eg, cough and psychotic symptoms), as well as plausible unknown reactions (eg, pyrexia). These results indicate that search engine queries can serve as an important tool for the study of adverse reactions of illicit drugs, which are difficult to study in other settings.

  1. Search Help

    EPA Pesticide Factsheets

    Guidance and search help resource listing examples of common queries that can be used in the Google Search Appliance search request, including examples of special characters, or query term separators that Google Search Appliance recognizes.

  2. Semantic Features for Classifying Referring Search Terms

    SciTech Connect

    May, Chandler J.; Henry, Michael J.; McGrath, Liam R.; Bell, Eric B.; Marshall, Eric J.; Gregory, Michelle L.

    2012-05-11

    When an internet user clicks on a result in a search engine, a request is submitted to the destination web server that includes a referrer field containing the search terms given by the user. Using this information, website owners can analyze the search terms leading to their websites to better understand their visitors' needs. This work explores some of the features that can be used for classification-based analysis of such referring search terms. We present initial results for the example task of classifying HTTP requests' countries of origin. A system that can accurately predict the country of origin from query text may be a valuable complement to IP lookup methods which are susceptible to the obfuscation of dereferrers or proxies. We suggest that the addition of semantic features improves classifier performance in this example application. We begin by looking at related work and presenting our approach. After describing initial experiments and results, we discuss paths forward for this work.

  3. Linking Annual Prescription Volume of Antidepressants to Corresponding Web Search Query Data: A Possible Proxy for Medical Prescription Behavior?

    PubMed

    Gahr, Maximilian; Uzelac, Zeljko; Zeiss, René; Connemann, Bernhard J; Lang, Dirk; Schönfeldt-Lecuona, Carlos

    2015-12-01

    Persons using the Internet to retrieve medical information generate large amounts of health-related data, which are increasingly used in modern health sciences. We analyzed the relation between annual prescription volumes (APVs) of several antidepressants with marketing approval in Germany and corresponding web search query data generated in Google to test whether web search query volume may be a proxy for medical prescription practice. We obtained APVs of several antidepressants related to corresponding prescriptions at the expense of the statutory health insurance in Germany from 2004 to 2013. Web search query data generated in Germany and related to defined search terms (active substance or brand name) were obtained with Google Trends. We calculated correlations (Pearson's r) between the APVs of each substance and the respective annual "search share" values; coefficients of determination (R²) were computed to determine the amount of variability shared by the 2 variables. Significant and strong correlations between substance-specific APVs and corresponding annual query volumes were found for each substance during the observational interval: agomelatine (r = 0.968, R² = 0.932, P = 0.01), bupropion (r = 0.962, R² = 0.925, P = 0.01), citalopram (r = 0.970, R² = 0.941, P = 0.01), escitalopram (r = 0.824, R² = 0.682, P = 0.01), fluoxetine (r = 0.885, R² = 0.783, P = 0.01), paroxetine (r = 0.801, R² = 0.641, P = 0.01), and sertraline (r = 0.880, R² = 0.689, P = 0.01). Although the data used did not allow an analysis with a higher temporal resolution (quarters, months), our results suggest that web search query volume may be a proxy for corresponding prescription behavior. However, further studies analyzing other pharmacologic agents and prescription data that facilitate an increased temporal resolution are needed to confirm this hypothesis.
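
    The correlation step in this abstract can be illustrated with a small sketch that computes Pearson's r and the coefficient of determination for two annual series. The function name `pearson_r` and the sample numbers are hypothetical and are not taken from the study.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)

# Made-up annual values: prescription volume vs. search share.
prescriptions = [10.0, 12.5, 15.1, 18.0, 21.2]
search_share = [0.8, 1.1, 1.2, 1.6, 1.9]
r = pearson_r(prescriptions, search_share)
print(round(r, 3), round(r ** 2, 3))  # r and coefficient of determination
```

    Squaring r gives the share of variance the two series have in common; for instance, squaring the reported r = 0.962 for bupropion gives about 0.925, which matches the value listed beside it.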

  4. A study on PubMed search tag usage pattern: association rule mining of a full-day PubMed query log.

    PubMed

    Mosa, Abu Saleh Mohammad; Yoo, Illhoi

    2013-01-09

    The practice of evidence-based medicine requires efficient biomedical literature search such as PubMed/MEDLINE. Retrieval performance relies highly on the efficient use of search field tags. The purpose of this study was to analyze PubMed log data in order to understand the usage pattern of search tags by the end user in PubMed/MEDLINE search. A PubMed query log file was obtained from the National Library of Medicine containing anonymous user identification, timestamp, and query text. Inconsistent records were removed from the dataset and the search tags were extracted from the query texts. A total of 2,917,159 queries were selected for this study issued by a total of 613,061 users. The analysis of frequent co-occurrences and usage patterns of the search tags was conducted using an association mining algorithm. The percentage of search tag usage was low (11.38% of the total queries) and only 2.95% of queries contained two or more tags. Three out of four users used no search tag and about two-thirds of them issued fewer than four queries. Among the queries containing at least one tagged search term, the average number of search tags was almost half of the number of total search terms. Navigational search tags are more frequently used than informational search tags. While no strong association was observed between informational and navigational tags, six (out of 19) informational tags and six (out of 29) navigational tags showed strong associations in PubMed searches. The low percentage of search tag usage implies that PubMed/MEDLINE users do not utilize the features of PubMed/MEDLINE widely or they are not aware of such features or solely depend on the high recall focused query translation by the PubMed's Automatic Term Mapping. The users need further education and interactive search application for effective use of the search tags in order to fulfill their biomedical information needs from PubMed/MEDLINE.
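
    Association mining of tag co-occurrence rests on two basic measures, support and confidence. The abstract does not name the algorithm used, so the sketch below only computes those two measures directly on a toy log; the queries are invented, and the tag names stand in for PubMed field tags such as [au], [ti] and [dp].

```python
def support(queries, tags):
    """Fraction of queries whose tag set contains every tag in `tags`."""
    wanted = frozenset(tags)
    return sum(1 for q in queries if wanted <= q) / len(queries)

def confidence(queries, antecedent, consequent):
    """Estimated P(consequent tags present | antecedent tags present)."""
    base = support(queries, antecedent)
    if base == 0:
        return 0.0
    return support(queries, set(antecedent) | set(consequent)) / base

# Toy query log: each entry is the set of search tags used in one query.
log = [
    frozenset({"au", "dp"}),
    frozenset({"au"}),
    frozenset({"ti"}),
    frozenset({"au", "dp"}),
    frozenset(),            # a query with no search tag at all
]
print(support(log, {"au", "dp"}), confidence(log, {"au"}, {"dp"}))
```

    A rule such as "au implies dp" is usually called strong when both measures exceed chosen thresholds; a full miner such as Apriori would enumerate candidate tag sets instead of testing one by hand.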

  5. Design of an On-Line Query Language for Full Text Patent Search.

    ERIC Educational Resources Information Center

    Glantz, Richard S.

    The design of an English-like query language and an interactive computer environment for searching the full text of the U.S. patent collection are discussed. Special attention is paid to achieving a transparent user interface, to providing extremely broad search capabilities (including nested substitution classes, Kleene star events, and domain…

  6. Cross-Searching Subject Gateways: The Query Routing and Forward Knowledge Approach.

    ERIC Educational Resources Information Center

    Kirriemuir, John; Brickley, Dan; Welsh, Susan; Knight, Jon; Hamilton, Martin

    1998-01-01

    Describes characteristics of subject gateways on the Web (i.e., facilities allowing easier access to network-based resources in a defined subject area) and compares them to search engines (e.g., AltaVista). The application of WHOIS++, centroids, query routing, and forward knowledge to searching several gateways simultaneously is outlined, and…

  7. News trends and web search query of HIV/AIDS in Hong Kong.

    PubMed

    Chiu, Alice P Y; Lin, Qianying; He, Daihai

    2017-01-01

    The HIV epidemic in Hong Kong has worsened in recent years, with major contributions from high-risk subgroup of men who have sex with men (MSM). Internet use is prevalent among the majority of the local population, where they sought health information online. This study examines the impacts of HIV/AIDS and MSM news coverage on web search query in Hong Kong. Relevant news coverage about HIV/AIDS and MSM from January 1st, 2004 to December 31st, 2014 was obtained from the WiseNews database. News trends were created by computing the number of relevant articles by type, topic, place of origin and sub-populations. We then obtained relevant search volumes from Google and analysed causality between news trends and Google Trends using Granger Causality test and orthogonal impulse function. We found that editorial news has an impact on "HIV" Google searches, with the search term popularity peaking at an average of two weeks after the news is published. Similarly, editorial news has an impact on the frequency of "AIDS" searches two weeks after. MSM-related news trends have a more fluctuating impact on "MSM" Google searches, although the time lag varies anywhere from one week later to ten weeks later. This infodemiological study shows that there is a positive impact of news trends on the online search behavior of HIV/AIDS or MSM-related issues for up to ten weeks after. Health promotional professionals could make use of this brief time window to tailor the timing of HIV awareness campaigns and public health interventions to maximise its reach and effectiveness.

  8. News trends and web search query of HIV/AIDS in Hong Kong

    PubMed Central

    Chiu, Alice P. Y.; Lin, Qianying

    2017-01-01

    Background The HIV epidemic in Hong Kong has worsened in recent years, with major contributions from high-risk subgroup of men who have sex with men (MSM). Internet use is prevalent among the majority of the local population, where they sought health information online. This study examines the impacts of HIV/AIDS and MSM news coverage on web search query in Hong Kong. Methods Relevant news coverage about HIV/AIDS and MSM from January 1st, 2004 to December 31st, 2014 was obtained from the WiseNews database. News trends were created by computing the number of relevant articles by type, topic, place of origin and sub-populations. We then obtained relevant search volumes from Google and analysed causality between news trends and Google Trends using Granger Causality test and orthogonal impulse function. Results We found that editorial news has an impact on “HIV” Google searches, with the search term popularity peaking at an average of two weeks after the news is published. Similarly, editorial news has an impact on the frequency of “AIDS” searches two weeks after. MSM-related news trends have a more fluctuating impact on “MSM” Google searches, although the time lag varies anywhere from one week later to ten weeks later. Conclusions This infodemiological study shows that there is a positive impact of news trends on the online search behavior of HIV/AIDS or MSM-related issues for up to ten weeks after. Health promotional professionals could make use of this brief time window to tailor the timing of HIV awareness campaigns and public health interventions to maximise its reach and effectiveness. PMID:28922376

  9. Federated Space-Time Query for Earth Science Data Using OpenSearch Conventions

    NASA Technical Reports Server (NTRS)

    Lynnes, Chris; Beaumont, Bruce; Duerr, Ruth; Hua, Hook

    2009-01-01

    This slide presentation reviews a Space-time query system that has been developed to assist the user in finding Earth science data that fulfills the researcher's needs. It reviews the reasons why finding Earth science data can be so difficult, and explains the workings of the Space-Time Query with OpenSearch and how this system can assist researchers in finding the required data. It also reviews the developments with client server systems.

  10. Internet search query analysis can be used to demonstrate the rapidly increasing public awareness of palliative care in the USA.

    PubMed

    McLean, Sarah; Lennon, Paul; Glare, Paul

    2017-01-27

    A lack of public awareness of palliative care (PC) has been identified as one of the main barriers to appropriate PC access. Internet search query analysis is a novel methodology, which has been effectively used in surveillance of infectious diseases, and can be used to monitor public awareness of health-related topics. We aimed to demonstrate the utility of internet search query analysis to evaluate changes in public awareness of PC in the USA between 2005 and 2015. Google Trends provides a referenced score for the popularity of a search term, for defined regions over defined time periods. The popularity of the search term 'palliative care' was measured monthly between 1/1/2005 and 31/12/2015 in the USA and in the UK. Results were analysed using independent t-tests and joinpoint analysis. The mean monthly popularity of the search term increased between 2008-2009 (p<0.001), 2011-2012 (p<0.001), 2013-2014 (p=0.004) and 2014-2015 (p=0.002) in the USA. Joinpoint analysis was used to evaluate the monthly percentage change (MPC) in the popularity of the search term. In the USA, the MPC increase was 0.6%/month (p<0.05); in the UK the MPC of 0.05% was non-significant. Although internet search query surveillance is a novel methodology, it is freely accessible and has significant potential to monitor health-seeking behaviour among the public. PC is rapidly growing in the USA, and the rapidly increasing public awareness of PC as demonstrated in this study, in comparison with the UK, where PC is relatively well established is encouraging in increasingly ensuring appropriate PC access for all. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/.
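
    The monthly percentage change (MPC) reported above is conventionally derived from the slope of a log-linear trend. The sketch below fits a single segment by ordinary least squares and converts the slope to a percentage; it does not search for joinpoints, so it is a simplification rather than the authors' procedure, and the series is synthetic.

```python
import math

def monthly_percent_change(scores):
    """MPC from a least-squares fit of ln(score) against month index.

    Single-segment simplification: no joinpoints are estimated.
    """
    if len(scores) < 2 or any(s <= 0 for s in scores):
        raise ValueError("need at least two strictly positive scores")
    n = len(scores)
    logs = [math.log(s) for s in scores]
    mean_t = (n - 1) / 2
    mean_y = sum(logs) / n
    slope = (sum((t - mean_t) * (y - mean_y) for t, y in enumerate(logs))
             / sum((t - mean_t) ** 2 for t in range(n)))
    return (math.exp(slope) - 1) * 100

# Synthetic popularity scores growing 0.6% per month (not study data).
series = [50 * 1.006 ** month for month in range(24)]
print(round(monthly_percent_change(series), 3))
```

    A joinpoint analysis additionally tests whether the trend is better described by several such segments with different slopes.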

  11. Federated Space-Time Query for Earth Science Data Using OpenSearch Conventions

    NASA Astrophysics Data System (ADS)

    Lynnes, C.; Beaumont, B.; Duerr, R. E.; Hua, H.

    2009-12-01

    The past decade has seen a burgeoning of remote sensing and Earth science data providers, as evidenced in the growth of the Earth Science Information Partner (ESIP) federation. At the same time, the need to combine diverse data sets to enable understanding of the Earth as a system has also grown. While the expansion of data providers is in general a boon to such studies, the diversity presents a challenge to finding useful data for a given study. Locating all the data files with aerosol information for a particular volcanic eruption, for example, may involve learning and using several different search tools to execute the requisite space-time queries. To address this issue, the ESIP federation is developing a federated space-time query framework, based on the OpenSearch convention (www.opensearch.org), with Geo and Time extensions. In this framework, data providers publish OpenSearch Description Documents that describe in a machine-readable form how to execute queries against the provider. The novelty of OpenSearch is that the space-time query interface becomes both machine callable and easy enough to integrate into the web browser's search box. This flexibility, together with a simple REST (HTTP-get) interface, should allow a variety of data providers to participate in the federated search framework, from large institutional data centers to individual scientists. The simple interface enables trivial querying of multiple data sources and participation in recursive-like federated searches--all using the same common OpenSearch interface. This simplicity also makes the construction of clients easy, as do existing OpenSearch client libraries in a variety of languages. Moreover, a number of clients and aggregation services already exist and OpenSearch is already supported by a number of web browsers such as Firefox and Internet Explorer.
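
    A client for such a framework mainly has to fill in the URL template found in a provider's OpenSearch Description Document. The sketch below assumes a template that uses the standard `{searchTerms}` parameter together with `{geo:box}`, `{time:start}` and `{time:end}` from the Geo and Time extensions; the endpoint `https://example.org/search` and the helper `fill_template` are hypothetical.

```python
import re
from urllib.parse import quote

def fill_template(template: str, values: dict) -> str:
    """Substitute OpenSearch template parameters such as {geo:box?}.

    Parameters ending in '?' are optional and become empty when no value
    is supplied; a missing required parameter raises KeyError.
    """
    def substitute(match):
        name, optional = match.group(1), match.group(2)
        if name in values:
            return quote(str(values[name]), safe=",:")
        if optional:
            return ""
        raise KeyError(f"missing required parameter: {name}")
    return re.sub(r"\{([^{}?]+)(\?)?\}", substitute, template)

# Hypothetical template of the kind a provider might publish.
template = ("https://example.org/search?q={searchTerms}"
            "&bbox={geo:box?}&start={time:start?}&end={time:end?}")
url = fill_template(template, {
    "searchTerms": "aerosol",
    "geo:box": "-10,30,10,50",       # west,south,east,north
    "time:start": "2009-01-01",
})
print(url)
```

    Because every provider exposes the same template syntax, one client routine like this can query several sources in turn, which is what makes the federated search straightforward.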

  12. Adverse Reactions Associated With Cannabis Consumption as Evident From Search Engine Queries

    PubMed Central

    Lev-Ran, Shaul

    2017-01-01

    Background Cannabis is one of the most widely used psychoactive substances worldwide, but adverse drug reactions (ADRs) associated with its use are difficult to study because of its prohibited status in many countries. Objective Internet search engine queries have been used to investigate ADRs in pharmaceutical drugs. In this proof-of-concept study, we tested whether these queries can be used to detect the adverse reactions of cannabis use. Methods We analyzed anonymized queries from US-based users of Bing, a widely used search engine, made over a period of 6 months and compared the results with the prevalence of cannabis use as reported in the US National Survey on Drug Use and Health (NSDUH) and with ADRs reported in the Food and Drug Administration’s Adverse Drug Reporting System. Predicted prevalence of cannabis use was estimated from the fraction of people making queries about cannabis, marijuana, and 121 additional synonyms. Predicted ADRs were estimated from queries containing layperson descriptions of 195 symptoms on the ICD-10 list. Results Our results indicated that the predicted prevalence of cannabis use at the US census regional level reaches an R² of .71 against NSDUH data. Queries for ADRs made by people who also searched for cannabis reveal many of the known adverse effects of cannabis (eg, cough and psychotic symptoms), as well as plausible unknown reactions (eg, pyrexia). Conclusions These results indicate that search engine queries can serve as an important tool for the study of adverse reactions of illicit drugs, which are difficult to study in other settings. PMID:29074469

  13. Do economic equality and generalized trust inhibit academic dishonesty? Evidence from state-level search-engine queries.

    PubMed

    Neville, Lukas

    2012-04-01

    What effect does economic inequality have on academic integrity? Using data from search-engine queries made between 2003 and 2011 on Google and state-level measures of income inequality and generalized trust, I found that academically dishonest searches (queries seeking term-paper mills and help with cheating) were more likely to come from states with higher income inequality and lower levels of generalized trust. These relations persisted even when controlling for contextual variables, such as average income and the number of colleges per capita. The relation between income inequality and academic dishonesty was fully mediated by generalized trust. When there is higher economic inequality, people are less likely to view one another as trustworthy. This lower generalized trust, in turn, is associated with a greater prevalence of academic dishonesty. These results might explain previous findings on the effectiveness of honor codes.

  14. How to improve your PubMed/MEDLINE searches: 2. display settings, complex search queries and topic searching.

    PubMed

    Fatehi, Farhad; Gray, Leonard C; Wootton, Richard

    2014-01-01

    The way that PubMed results are displayed can be changed using the Display Settings drop-down menu on the results screen. There are three groups of options: Format, Items per page and Sort by, which allow a good deal of control. The results from several searches can be temporarily stored on the Clipboard. Records of interest can be selected on the results page using check boxes and can then be combined, for example to form a reference list. Related Citations is a valuable feature of PubMed that can provide a set of similar articles when you have identified a record of interest among the results. You can easily search for RCTs or reviews using the appropriate filters or field tags. If you are interested in clinical articles, rather than basic science or health service research, then the Clinical Queries tool on the PubMed home page can be used to retrieve them.

  15. MEDLINE clinical queries are robust when searching in recent publishing years.

    PubMed

    Wilczynski, Nancy L; McKibbon, K Ann; Walter, Stephen D; Garg, Amit X; Haynes, R Brian

    2013-01-01

    To determine if the PubMed and Ovid MEDLINE clinical queries (which were developed in the publishing year 2000, for the purpose categories therapy, diagnosis, prognosis, etiology, and clinical prediction guides) perform as well when searching in current publishing years. A gold standard database of recently published research literature was created using the McMaster health knowledge refinery (http://hiru.mcmaster.ca/hiru/HIRU_McMaster_HKR.aspx) and its continuously updated database, McMaster PLUS (http://hiru.mcmaster.ca/hiru/HIRU_McMaster_PLUS_projects.aspx). This database contains articles from over 120 clinical journals that are tagged for meeting or not meeting criteria for scientific merit and clinical relevance. The clinical queries sensitive ('broad') and specific ('narrow') search filters were tested in this gold standard database, and sensitivity and specificity were calculated and compared with those originally reported for the clinical queries. In all cases, the sensitivity of the highly sensitive search filters and the specificity of the highly specific search filters did not differ substantively when comparing results derived in 2000 with those derived in a more current database. In addition, in all cases, the specificities for the highly sensitive search filters and the sensitivities for the highly specific search filters remained above 50% when testing them in the current database. These results are reassuring for modern-day searchers. The clinical queries that were derived in the year 2000 perform equally well a decade later. The PubMed and Ovid MEDLINE clinical queries have been revalidated and remain a useful public resource for searching the world's medical literature for research that is most relevant to clinical care.
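
    The two measures compared in this abstract follow the standard definitions: sensitivity is the share of relevant articles that a filter retrieves, and specificity is the share of non-relevant articles that it excludes. The sketch below computes both from a confusion table; the counts are hypothetical and are not figures from the study.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of relevant articles that the search filter retrieves."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of non-relevant articles that the search filter excludes."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts for one filter run against a hand-tagged gold standard.
true_pos, false_neg = 90, 10     # relevant articles: retrieved / missed
true_neg, false_pos = 600, 300   # non-relevant articles: excluded / retrieved

print(f"sensitivity = {sensitivity(true_pos, false_neg):.2f}")
print(f"specificity = {specificity(true_neg, false_pos):.2f}")
```

    With these made-up counts the filter behaves like a sensitive ('broad') filter: it misses few relevant articles but lets through a substantial number of non-relevant ones.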

  16. Effectively processing medical term queries on the UMLS Metathesaurus by layered dynamic programming.

    PubMed

    Ren, Kaiyu; Lai, Albert M; Mukhopadhyay, Aveek; Machiraju, Raghu; Huang, Kun; Xiang, Yang

    2014-01-01

    Mapping medical terms to standardized UMLS concepts is a basic step for leveraging biomedical texts in data management and analysis. However, available methods and tools have major limitations in handling queries over the UMLS Metathesaurus that contain inaccurate query terms, which frequently appear in real world applications. To provide a practical solution for this task, we propose a layered dynamic programming mapping (LDPMap) approach, which can efficiently handle these queries. LDPMap uses indexing and two layers of dynamic programming techniques to efficiently map a biomedical term to a UMLS concept. Our empirical study shows that LDPMap achieves much faster query speeds than LCS. In comparison to the UMLS Metathesaurus Browser and MetaMap, LDPMap is much more effective in querying the UMLS Metathesaurus for inaccurately spelled medical terms, long medical terms, and medical terms with special characters. These results demonstrate that LDPMap is an efficient and effective method for mapping medical terms to the UMLS Metathesaurus.

  17. Effectively processing medical term queries on the UMLS Metathesaurus by layered dynamic programming

    PubMed Central

    2014-01-01

    Background Mapping medical terms to standardized UMLS concepts is a basic step for leveraging biomedical texts in data management and analysis. However, available methods and tools have major limitations in handling queries over the UMLS Metathesaurus that contain inaccurate query terms, which frequently appear in real world applications. Methods To provide a practical solution for this task, we propose a layered dynamic programming mapping (LDPMap) approach, which can efficiently handle these queries. LDPMap uses indexing and two layers of dynamic programming techniques to efficiently map a biomedical term to a UMLS concept. Results Our empirical study shows that LDPMap achieves much faster query speeds than LCS. In comparison to the UMLS Metathesaurus Browser and MetaMap, LDPMap is much more effective in querying the UMLS Metathesaurus for inaccurately spelled medical terms, long medical terms, and medical terms with special characters. Conclusions These results demonstrate that LDPMap is an efficient and effective method for mapping medical terms to the UMLS Metathesaurus. PMID:25079259

  18. What is the prevalence of health-related searches on the World Wide Web? Qualitative and quantitative analysis of search engine queries on the internet.

    PubMed

    Eysenbach, G; Kohler, Ch

    2003-01-01

    While health information is often said to be the most sought after information on the web, empirical data on the actual frequency of health-related searches on the web are missing. In the present study we aimed to determine the prevalence of health-related searches on the web by analyzing search terms entered by people into popular search engines. We also made some preliminary attempts in qualitatively describing and classifying these searches. Occasional difficulties in determining what constitutes a "health-related" search led us to propose and validate a simple method to automatically classify a search string as "health-related". This method is based on determining the proportion of pages on the web containing the search string and the word "health", as a proportion of the total number of pages with the search string alone. Using human codings as gold standard we plotted a ROC curve and determined empirically that if this "co-occurrence rate" is larger than 35%, the search string can be said to be health-related (sensitivity: 85.2%, specificity 80.4%). The results of our "human" codings of search queries determined that about 4.5% of all searches are "health-related". We estimate that globally a minimum of 6.75 million health-related searches are being conducted on the web every day, which is roughly the same number of searches that were conducted on the NLM Medlars system in the whole of 1996.
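
    The classification rule described in this abstract can be sketched in a few lines. The 35% threshold comes from the abstract; the function names and the page counts are hypothetical, and in practice the counts would have to come from a search engine's hit estimates.

```python
def co_occurrence_rate(pages_with_string_and_health, pages_with_string):
    """Share of pages containing the search string that also contain 'health'."""
    if pages_with_string == 0:
        return 0.0  # no pages at all: treat the rate as zero
    return pages_with_string_and_health / pages_with_string


def is_health_related(pages_with_string_and_health, pages_with_string,
                      threshold=0.35):
    """Apply the rule from the abstract: health-related if the rate exceeds 35%."""
    rate = co_occurrence_rate(pages_with_string_and_health, pages_with_string)
    return rate > threshold


# Hypothetical page counts for two search strings.
print(is_health_related(42_000, 100_000))   # rate 0.42
print(is_health_related(8_000, 100_000))    # rate 0.08
```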

  19. Query-seeded iterative sequence similarity searching improves selectivity 5–20-fold

    PubMed Central

    Li, Weizhong; Lopez, Rodrigo

    2017-01-01

    Abstract Iterative similarity search programs, like psiblast, jackhmmer, and psisearch, are much more sensitive than pairwise similarity search methods like blast and ssearch because they build a position specific scoring model (a PSSM or HMM) that captures the pattern of sequence conservation characteristic to a protein family. But models are subject to contamination; once an unrelated sequence has been added to the model, homologs of the unrelated sequence will also produce high scores, and the model can diverge from the original protein family. Examination of alignment errors during psiblast PSSM contamination suggested a simple strategy for dramatically reducing PSSM contamination. psiblast PSSMs are built from the query-based multiple sequence alignment (MSA) implied by the pairwise alignments between the query model (PSSM, HMM) and the subject sequences in the library. When the original query sequence residues are inserted into gapped positions in the aligned subject sequence, the resulting PSSM rarely produces alignment over-extensions or alignments to unrelated sequences. This simple step, which tends to anchor the PSSM to the original query sequence and slightly increase target percent identity, can reduce the frequency of false-positive alignments more than 20-fold compared with psiblast and jackhmmer, with little loss in search sensitivity. PMID:27923999

  20. Query-seeded iterative sequence similarity searching improves selectivity 5-20-fold.

    PubMed

    Pearson, William R; Li, Weizhong; Lopez, Rodrigo

    2017-04-20

    Iterative similarity search programs, like psiblast, jackhmmer, and psisearch, are much more sensitive than pairwise similarity search methods like blast and ssearch because they build a position specific scoring model (a PSSM or HMM) that captures the pattern of sequence conservation characteristic to a protein family. But models are subject to contamination; once an unrelated sequence has been added to the model, homologs of the unrelated sequence will also produce high scores, and the model can diverge from the original protein family. Examination of alignment errors during psiblast PSSM contamination suggested a simple strategy for dramatically reducing PSSM contamination. psiblast PSSMs are built from the query-based multiple sequence alignment (MSA) implied by the pairwise alignments between the query model (PSSM, HMM) and the subject sequences in the library. When the original query sequence residues are inserted into gapped positions in the aligned subject sequence, the resulting PSSM rarely produces alignment over-extensions or alignments to unrelated sequences. This simple step, which tends to anchor the PSSM to the original query sequence and slightly increase target percent identity, can reduce the frequency of false-positive alignments more than 20-fold compared with psiblast and jackhmmer, with little loss in search sensitivity. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  1. Querying Event Sequences by Exact Match or Similarity Search: Design and Empirical Evaluation

    PubMed Central

    Wongsuphasawat, Krist; Plaisant, Catherine; Taieb-Maimon, Meirav; Shneiderman, Ben

    2012-01-01

    Specifying event sequence queries is challenging even for skilled computer professionals familiar with SQL. Most graphical user interfaces for database search use an exact match approach, which is often effective, but near misses may also be of interest. We describe a new similarity search interface, in which users specify a query by simply placing events on a blank timeline and retrieve a similarity-ranked list of results. Behind this user interface is a new similarity measure for event sequences which the users can customize by four decision criteria, enabling them to adjust the impact of missing, extra, or swapped events or the impact of time shifts. We describe a use case with Electronic Health Records based on our ongoing collaboration with hospital physicians. A controlled experiment with 18 participants compared exact match and similarity search interfaces. We report on the advantages and disadvantages of each interface and suggest a hybrid interface combining the best of both. PMID:22379286

  2. Infodemiology of status epilepticus: A systematic validation of the Google Trends-based search queries.

    PubMed

    Bragazzi, Nicola Luigi; Bacigaluppi, Susanna; Robba, Chiara; Nardone, Raffaele; Trinka, Eugen; Brigo, Francesco

    2016-02-01

    People increasingly use Google looking for health-related information. We previously demonstrated that in English-speaking countries most people use this search engine to obtain information on status epilepticus (SE) definition, types/subtypes, and treatment. Now, we aimed at providing a quantitative analysis of SE-related web queries. This analysis represents an advancement, with respect to what was already previously discussed, in that the Google Trends (GT) algorithm has been further refined and correlational analyses have been carried out to validate the GT-based query volumes. Google Trends-based SE-related query volumes were well correlated with information concerning causes and pharmacological and nonpharmacological treatments. Google Trends can provide both researchers and clinicians with data on realities and contexts that are generally overlooked and underexplored by classic epidemiology. In this way, GT can foster new epidemiological studies in the field and can complement traditional epidemiological tools. Copyright © 2015 Elsevier Inc. All rights reserved.

  3. Privacy-Preserving Location-Based Query Using Location Indexes and Parallel Searching in Distributed Networks

    PubMed Central

    Liu, Lei; Zhao, Jing

    2014-01-01

    An efficient location-based query algorithm that protects user privacy in distributed networks is presented. The algorithm uses the users' location indexes and multiple parallel threads to quickly search for and select candidate anonymous sets that contain more users and whose location information is more uniformly distributed, which accelerates the temporal-spatial anonymization operations. It also allows users to configure custom privacy-preserving location query requests. Simulation results show that the proposed algorithm can serve location queries for more users simultaneously, improve the performance of the anonymous server, and satisfy the users' anonymous location requests. PMID:24790579

  4. How popular is waterpipe tobacco smoking? Findings from internet search queries.

    PubMed

    Salloum, Ramzi G; Osman, Amira; Maziak, Wasim; Thrasher, James F

    2015-09-01

    Waterpipe tobacco smoking (WTS), a traditional tobacco consumption practice in the Middle East, is gaining popularity worldwide. Estimates of population-level interest in WTS over time are not documented. We assessed the popularity of WTS using World Wide Web search query results across four English-speaking countries. We analysed trends in Google search queries related to WTS, comparing these trends with those for electronic cigarettes between 2004 and 2013 in Australia, Canada, the UK and the USA. Weekly search volumes were reported as percentages relative to the week with the highest volume of searches. Web-based searches for WTS have increased steadily since 2004 in all four countries. Search volume for WTS was higher than for e-cigarettes in three of the four nations, with the highest volume in the USA. Online searches were primarily targeted at WTS products for home use, followed by searches for WTS cafés/lounges. Online demand for information on WTS-related products and venues is large and increasing. Given the rise in WTS popularity, increasing evidence of exposure-related harms, and relatively lax government regulation, WTS is a serious public health concern and could reach epidemic levels in Western societies. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
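
    The scaling described in this abstract, in which each week is reported as a percentage of the busiest week, can be sketched as follows. The weekly counts are hypothetical; the study worked with already-normalized Google data rather than raw counts.

```python
def relative_volumes(weekly_counts):
    """Express each weekly count as a percentage of the peak week."""
    peak = max(weekly_counts)
    if peak == 0:
        return [0.0 for _ in weekly_counts]  # no searches in any week
    return [100.0 * count / peak for count in weekly_counts]


# Hypothetical raw weekly search counts for one query term.
weekly_counts = [10, 20, 40, 30]
print(relative_volumes(weekly_counts))
```

    Because each series is scaled to its own peak, values from two different series are only comparable in this sketch if both are normalized against a common peak.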

  5. Snapscreen: TV-stream frame search with projectively distorted and noisy query

    NASA Astrophysics Data System (ADS)

    Skoryukina, Natalya; Chernov, Timofey; Bulatov, Konstantin; Nikolaev, Dmitry P.; Arlazarov, Vladimir

    2017-03-01

    In this work we describe an approach to real-time image search in large databases that is robust to a variety of query distortions such as lighting alterations, projective distortions or digital noise. The approach is based on the extraction of keypoints and their descriptors, random hierarchical clustering trees for preliminary search and RANSAC for refining search and result scoring. The algorithm is implemented in the Snapscreen system, which allows determining a TV-channel and a TV-show from a picture acquired with a mobile device. The implementation is enhanced using preceding localization of the screen region. Results for the real-world data with different modifications of the system are presented.

  6. Advances in nowcasting influenza-like illness rates using search query logs

    NASA Astrophysics Data System (ADS)

    Lampos, Vasileios; Miller, Andrew C.; Crossan, Steve; Stefansen, Christian

    2015-08-01

    User-generated content can assist epidemiological surveillance in the early detection and prevalence estimation of infectious diseases, such as influenza. Google Flu Trends embodies the first public platform for transforming search queries to indications about the current state of flu in various places all over the world. However, the original model significantly mispredicted influenza-like illness rates in the US during the 2012-13 flu season. In this work, we build on the previous modeling attempt, proposing substantial improvements. Firstly, we investigate the performance of a widely used linear regularized regression solver, known as the Elastic Net. Then, we expand on this model by incorporating the queries selected by the Elastic Net into a nonlinear regression framework, based on a composite Gaussian Process. Finally, we augment the query-only predictions with an autoregressive model, injecting prior knowledge about the disease. We assess predictive performance using five consecutive flu seasons spanning from 2008 to 2013 and qualitatively explain certain shortcomings of the previous approach. Our results indicate that a nonlinear query modeling approach delivers the lowest cumulative nowcasting error, and also suggest that query information significantly improves autoregressive inferences, obtaining state-of-the-art performance.

  7. Advances in nowcasting influenza-like illness rates using search query logs

    PubMed Central

    Lampos, Vasileios; Miller, Andrew C.; Crossan, Steve; Stefansen, Christian

    2015-01-01

    User-generated content can assist epidemiological surveillance in the early detection and prevalence estimation of infectious diseases, such as influenza. Google Flu Trends embodies the first public platform for transforming search queries to indications about the current state of flu in various places all over the world. However, the original model significantly mispredicted influenza-like illness rates in the US during the 2012–13 flu season. In this work, we build on the previous modeling attempt, proposing substantial improvements. Firstly, we investigate the performance of a widely used linear regularized regression solver, known as the Elastic Net. Then, we expand on this model by incorporating the queries selected by the Elastic Net into a nonlinear regression framework, based on a composite Gaussian Process. Finally, we augment the query-only predictions with an autoregressive model, injecting prior knowledge about the disease. We assess predictive performance using five consecutive flu seasons spanning from 2008 to 2013 and qualitatively explain certain shortcomings of the previous approach. Our results indicate that a nonlinear query modeling approach delivers the lowest cumulative nowcasting error, and also suggest that query information significantly improves autoregressive inferences, obtaining state-of-the-art performance. PMID:26234783

  8. Advances in nowcasting influenza-like illness rates using search query logs.

    PubMed

    Lampos, Vasileios; Miller, Andrew C; Crossan, Steve; Stefansen, Christian

    2015-08-03

    User-generated content can assist epidemiological surveillance in the early detection and prevalence estimation of infectious diseases, such as influenza. Google Flu Trends embodies the first public platform for transforming search queries to indications about the current state of flu in various places all over the world. However, the original model significantly mispredicted influenza-like illness rates in the US during the 2012-13 flu season. In this work, we build on the previous modeling attempt, proposing substantial improvements. Firstly, we investigate the performance of a widely used linear regularized regression solver, known as the Elastic Net. Then, we expand on this model by incorporating the queries selected by the Elastic Net into a nonlinear regression framework, based on a composite Gaussian Process. Finally, we augment the query-only predictions with an autoregressive model, injecting prior knowledge about the disease. We assess predictive performance using five consecutive flu seasons spanning from 2008 to 2013 and qualitatively explain certain shortcomings of the previous approach. Our results indicate that a nonlinear query modeling approach delivers the lowest cumulative nowcasting error, and also suggest that query information significantly improves autoregressive inferences, obtaining state-of-the-art performance.

  9. SeqWare Query Engine: storing and searching sequence data in the cloud

    PubMed Central

    2010-01-01

    interface to simplify development of analytical tools. The range of data types supported, the ease of querying and integrating with existing tools, and the robust scalability of the underlying cloud-based technologies make SeqWare Query Engine a natural fit for storing and searching ever-growing genome sequence datasets. PMID:21210981

  10. SeqWare Query Engine: storing and searching sequence data in the cloud.

    PubMed

    O'Connor, Brian D; Merriman, Barry; Nelson, Stanley F

    2010-12-21

    analytical tools. The range of data types supported, the ease of querying and integrating with existing tools, and the robust scalability of the underlying cloud-based technologies make SeqWare Query Engine a natural fit for storing and searching ever-growing genome sequence datasets.

  11. Utility of Web search query data in testing theoretical assumptions about mephedrone.

    PubMed

    Kapitány-Fövény, Máté; Demetrovics, Zsolt

    2017-05-01

    With growing access to the Internet, people who use drugs and traffickers started to obtain information about novel psychoactive substances (NPS) via online platforms. This paper aims to analyze whether a decreasing Web interest in formerly banned substances (cocaine, heroin, and MDMA) and the legislative status of mephedrone predict Web interest in this NPS. Google Trends was used to measure changes of Web interest in cocaine, heroin, MDMA, and mephedrone. Google search results for mephedrone within the same time frame were analyzed and categorized. Web interest in classic drugs was found to be more persistent. Regarding geographical distribution, location of Web searches for heroin and cocaine was less centralized. Illicit status of mephedrone was a negative predictor of its Web search query rates. The connection between mephedrone-related Web search rates and legislative status of this substance was significantly mediated by ecstasy-related Web search queries, the number of documentaries, and forum/blog entries about mephedrone. The results might provide support for the hypothesis that mephedrone's popularity was highly correlated with its legal status and that it functioned as a potential substitute for MDMA. Google Trends was found to be a useful tool for testing theoretical assumptions about NPS. Copyright © 2017 John Wiley & Sons, Ltd.

  12. Characterizing the time-perspective of nations with search engine query data.

    PubMed

    Noguchi, Takao; Stewart, Neil; Olivola, Christopher Y; Moat, Helen Susannah; Preis, Tobias

    2014-01-01

    Vast quantities of data on human behavior are being created by our everyday internet usage. Building upon a recent study by Preis, Moat, Stanley, and Bishop (2012), we used search engine query data to construct measures of the time-perspective of nations, and tested these measures against per-capita gross domestic product (GDP). The results indicate that nations with higher per-capita GDP are more focused on the future and less on the past, and that when these nations do focus on the past, it is more likely to be the distant past. These results demonstrate the viability of using nation-level data to build psychological constructs.

  13. Using web search query data to monitor dengue epidemics: a new model for neglected tropical disease surveillance.

    PubMed

    Chan, Emily H; Sahai, Vikram; Conrad, Corrie; Brownstein, John S

    2011-05-01

    A variety of obstacles including bureaucracy and lack of resources have interfered with timely detection and reporting of dengue cases in many endemic countries. Surveillance efforts have turned to modern data sources, such as Internet search queries, which have been shown to be effective for monitoring influenza-like illnesses. However, few have evaluated the utility of web search query data for other diseases, especially those of high morbidity and mortality or where a vaccine may not exist. In this study, we aimed to assess whether web search queries are a viable data source for the early detection and monitoring of dengue epidemics. Bolivia, Brazil, India, Indonesia and Singapore were chosen for analysis based on available data and adequate search volume. For each country, a univariate linear model was then built by fitting a time series of the fraction of Google search query volume for specific dengue-related queries from that country against a time series of official dengue case counts for a time-frame within 2003-2010. The specific combination of queries used was chosen to maximize model fit. Spurious spikes in the data were also removed prior to model fitting. The final models, fit using a training subset of the data, were cross-validated against both the overall dataset and a holdout subset of the data. All models were found to fit the data quite well, with validation correlations ranging from 0.82 to 0.99. Web search query data were found to be capable of tracking dengue activity in Bolivia, Brazil, India, Indonesia and Singapore. Whereas traditional dengue data from official sources are often not available until after some substantial delay, web search query data are available in near real-time. These data represent a valuable complement to assist with traditional dengue surveillance.
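
    The modeling step described in this abstract (fit a univariate linear model on a training subset, then correlate its predictions with held-out case counts) can be sketched with ordinary least squares. This is an illustration only: the data values are invented, the direction of the fit (cases predicted from search fraction) is an assumption, and the query selection and spike removal steps are omitted.

```python
from math import sqrt


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / sqrt(var_a * var_b)


# Hypothetical monthly data: dengue-query fraction and official case counts.
search_fraction = [0.10, 0.15, 0.30, 0.45, 0.25, 0.12, 0.20, 0.40]
case_counts = [110, 160, 290, 470, 240, 130, 210, 390]

# Fit on the first six months, validate on the last two plus the full series.
slope, intercept = fit_line(search_fraction[:6], case_counts[:6])
predicted = [slope * f + intercept for f in search_fraction]
print(f"validation correlation (full series): {pearson_r(predicted, case_counts):.2f}")
```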

  14. Using Web Search Query Data to Monitor Dengue Epidemics: A New Model for Neglected Tropical Disease Surveillance

    PubMed Central

    Chan, Emily H.; Sahai, Vikram; Conrad, Corrie; Brownstein, John S.

    2011-01-01

    Background A variety of obstacles including bureaucracy and lack of resources have interfered with timely detection and reporting of dengue cases in many endemic countries. Surveillance efforts have turned to modern data sources, such as Internet search queries, which have been shown to be effective for monitoring influenza-like illnesses. However, few have evaluated the utility of web search query data for other diseases, especially those of high morbidity and mortality or where a vaccine may not exist. In this study, we aimed to assess whether web search queries are a viable data source for the early detection and monitoring of dengue epidemics. Methodology/Principal Findings Bolivia, Brazil, India, Indonesia and Singapore were chosen for analysis based on available data and adequate search volume. For each country, a univariate linear model was then built by fitting a time series of the fraction of Google search query volume for specific dengue-related queries from that country against a time series of official dengue case counts for a time-frame within 2003–2010. The specific combination of queries used was chosen to maximize model fit. Spurious spikes in the data were also removed prior to model fitting. The final models, fit using a training subset of the data, were cross-validated against both the overall dataset and a holdout subset of the data. All models were found to fit the data quite well, with validation correlations ranging from 0.82 to 0.99. Conclusions/Significance Web search query data were found to be capable of tracking dengue activity in Bolivia, Brazil, India, Indonesia and Singapore. Whereas traditional dengue data from official sources are often not available until after some substantial delay, web search query data are available in near real-time. These data represent a valuable complement to assist with traditional dengue surveillance. PMID:21647308

  15. An end user evaluation of query formulation and results review tools in three medical meta-search engines.

    PubMed

    Leroy, Gondy; Xu, Jennifer; Chung, Wingyan; Eggers, Shauna; Chen, Hsinchun

    2007-01-01

    Retrieving sufficient relevant information online is difficult for many people because they use too few keywords to search and search engines do not provide many support tools. To further complicate the search, users often ignore support tools when available. Our goal is to evaluate in a realistic setting when users use support tools and how they perceive these tools. We compared three medical search engines with support tools that require more or less effort from users to form a query and evaluate results. We carried out an end user study with 23 users who were asked to find information, i.e., subtopics and supporting abstracts, for a given theme. We used a balanced within-subjects design and report on the effectiveness, efficiency and usability of the support tools from the end user perspective. We found significant differences in efficiency but did not find significant differences in effectiveness between the three search engines. Dynamic user support tools requiring less effort led to higher efficiency. Fewer searches were needed and more documents were found per search when both query reformulation and result review tools dynamically adjust to the user query. The query reformulation tool that provided a long list of keywords, dynamically adjusted to the user query, was used most often and led to more subtopics. As hypothesized, the dynamic result review tools were used more often and led to more subtopics than static ones. These results were corroborated by the usability questionnaires, which showed that support tools that dynamically optimize output were preferred.

  16. SAM: String-based sequence search algorithm for mitochondrial DNA database queries.

    PubMed

    Röck, Alexander; Irwin, Jodi; Dür, Arne; Parsons, Thomas; Parson, Walther

    2011-03-01

    The analysis of the haploid mitochondrial (mt) genome has numerous applications in forensic and population genetics, as well as in disease studies. Although mtDNA haplotypes are usually determined by sequencing, they are rarely reported as a nucleotide string. Traditionally they are presented in a difference-coded position-based format relative to the corrected version of the first sequenced mtDNA. This convention requires recommendations for standardized sequence alignment that is known to vary between scientific disciplines, even between laboratories. As a consequence, database searches that are vital for the interpretation of mtDNA data can suffer from biased results when query and database haplotypes are annotated differently. In the forensic context that would usually lead to underestimation of the absolute and relative frequencies. To address this issue we introduce SAM, a string-based search algorithm that converts query and database sequences to position-free nucleotide strings and thus eliminates the possibility that identical sequences will be missed in a database query. The mere application of a BLAST algorithm would not be a sufficient remedy as it uses a heuristic approach and does not address properties specific to mtDNA, such as phylogenetically stable but also rapidly evolving insertion and deletion events. The software presented here provides additional flexibility to incorporate phylogenetic data, site-specific mutation rates, and other biologically relevant information that would refine the interpretation of mitochondrial DNA data. The manuscript is accompanied by freeware and example data sets that can be used to evaluate the new software (http://stringvalidation.org). Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.
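
    The annotation problem this abstract describes can be shown with a toy sketch. It is not the SAM implementation: the reference sequence is invented, the difference notation (`73G` for a substitution, `249del` for a deletion, `315.1C` for an insertion after position 315) is a simplified assumption, and `apply_differences` is a hypothetical helper. The point is only that two different annotations of the same insertion become identical once expanded to a position-free string.

```python
import re

# Simplified difference notation (an assumption for this sketch):
#   "<pos><base>"        substitution at 1-based position pos
#   "<pos>del"           deletion of the base at pos
#   "<pos>.<n><base>"    n-th inserted base after position pos
DIFF_PATTERN = re.compile(r"(\d+)(?:\.(\d+))?([ACGT]|del)")


def apply_differences(reference, differences):
    """Expand difference-coded annotations into a plain nucleotide string."""
    slots = list(reference)   # slots[i] is the base at position i + 1
    insertions = {}           # position -> [(insertion index, base), ...]
    for diff in differences:
        match = DIFF_PATTERN.fullmatch(diff)
        if match is None:
            raise ValueError(f"unrecognized difference: {diff!r}")
        pos = int(match.group(1))
        ins_index, change = match.group(2), match.group(3)
        if ins_index is not None:
            if change == "del":
                raise ValueError(f"an insertion cannot be a deletion: {diff!r}")
            insertions.setdefault(pos, []).append((int(ins_index), change))
        elif change == "del":
            slots[pos - 1] = ""
        else:
            slots[pos - 1] = change
    out = []
    for pos, base in enumerate(slots, start=1):
        out.append(base)
        # Inserted bases follow the reference position they are attached to.
        out.extend(b for _, b in sorted(insertions.get(pos, [])))
    return "".join(out)


# Toy reference with a run of C's: an extra C can be annotated at either
# end of the run, depending on the alignment convention in use.
reference = "ACCCT"
lab_a = apply_differences(reference, ["4.1C"])
lab_b = apply_differences(reference, ["2.1C"])
print(lab_a, lab_b, lab_a == lab_b)
```

    A search that compares the annotations `4.1C` and `2.1C` directly would treat these as different haplotypes, while a comparison of the expanded strings would not.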

  17. SAM: String-based sequence search algorithm for mitochondrial DNA database queries

    PubMed Central

    Röck, Alexander; Irwin, Jodi; Dür, Arne; Parsons, Thomas; Parson, Walther

    2011-01-01

    The analysis of the haploid mitochondrial (mt) genome has numerous applications in forensic and population genetics, as well as in disease studies. Although mtDNA haplotypes are usually determined by sequencing, they are rarely reported as a nucleotide string. Traditionally they are presented in a difference-coded position-based format relative to the corrected version of the first sequenced mtDNA. This convention requires recommendations for standardized sequence alignment that is known to vary between scientific disciplines, even between laboratories. As a consequence, database searches that are vital for the interpretation of mtDNA data can suffer from biased results when query and database haplotypes are annotated differently. In the forensic context that would usually lead to underestimation of the absolute and relative frequencies. To address this issue we introduce SAM, a string-based search algorithm that converts query and database sequences to position-free nucleotide strings and thus eliminates the possibility that identical sequences will be missed in a database query. The mere application of a BLAST algorithm would not be a sufficient remedy as it uses a heuristic approach and does not address properties specific to mtDNA, such as phylogenetically stable but also rapidly evolving insertion and deletion events. The software presented here provides additional flexibility to incorporate phylogenetic data, site-specific mutation rates, and other biologically relevant information that would refine the interpretation of mitochondrial DNA data. The manuscript is accompanied by freeware and example data sets that can be used to evaluate the new software (http://stringvalidation.org). PMID:21056022

  18. Fixed-point quantum search with an optimal number of queries.

    PubMed

    Yoder, Theodore J; Low, Guang Hao; Chuang, Isaac L

    2014-11-21

    Grover's quantum search and its generalization, quantum amplitude amplification, provide a quadratic advantage over classical algorithms for a diverse set of tasks but are tricky to use without knowing beforehand what fraction λ of the initial state is comprised of the target states. In contrast, fixed-point search algorithms need only a reliable lower bound on this fraction but, as a consequence, lose the very quadratic advantage that makes Grover's algorithm so appealing. Here we provide the first version of amplitude amplification that achieves fixed-point behavior without sacrificing the quantum speedup. Our result incorporates an adjustable bound on the failure probability and, for a given number of oracle queries, guarantees that this bound is satisfied over the broadest possible range of λ.
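
    Why the fraction λ must be known in advance for standard Grover search can be seen from the textbook success probability after k iterations, sin²((2k+1)θ) with sin²θ = λ. The sketch below evaluates that formula only; it is not the fixed-point algorithm from this paper, and the function name is hypothetical.

```python
import math


def grover_success(lam, k):
    """Success probability of standard Grover search after k iterations,
    when a fraction lam of the initial state lies on target states."""
    theta = math.asin(math.sqrt(lam))
    return math.sin((2 * k + 1) * theta) ** 2


# With lam = 0.25, one iteration is optimal; iterating further overshoots.
for k in range(4):
    print(k, round(grover_success(0.25, k), 3))
```

    Running too many iterations rotates the state past the target, so the success probability falls again; this is the behavior that fixed-point variants are designed to avoid.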

  19. Scoping review on search queries and social media for disease surveillance: a chronology of innovation.

    PubMed

    Bernardo, Theresa Marie; Rajic, Andrijana; Young, Ian; Robiadek, Katie; Pham, Mai T; Funk, Julie A

    2013-07-18

    The threat of a global pandemic posed by outbreaks of influenza H5N1 (1997) and Severe Acute Respiratory Syndrome (SARS, 2002), both diseases of zoonotic origin, provoked interest in improving early warning systems and reinforced the need for combining data from different sources. It led to the use of search query data from search engines such as Google and Yahoo! as an indicator of when and where influenza was occurring. This methodology has subsequently been extended to other diseases and has led to experimentation with new types of social media for disease surveillance. The objective of this scoping review was to formally assess the current state of knowledge regarding the use of search queries and social media for disease surveillance in order to inform future work on early detection and more effective mitigation of the effects of foodborne illness. Structured scoping review methods were used to identify, characterize, and evaluate all published primary research, expert review, and commentary articles regarding the use of social media in surveillance of infectious diseases from 2002-2011. Thirty-two primary research articles and 19 reviews and case studies were identified as relevant. Most relevant citations were peer-reviewed journal articles (29/32, 91%) published in 2010-11 (28/32, 88%) and reported use of a Google program for surveillance of influenza. Only four primary research articles investigated social media in the context of foodborne disease or gastroenteritis. Most authors (21/32 articles, 66%) reported that social media-based surveillance had comparable performance when compared to an existing surveillance program. The most commonly reported strengths of social media surveillance programs included their effectiveness (21/32, 66%) and rapid detection of disease (21/32, 66%). The most commonly reported weaknesses were the potential for false positive (16/32, 50%) and false negative (11/32, 34%) results. Most authors (24/32, 75%) recommended that

  20. Scoping Review on Search Queries and Social Media for Disease Surveillance: A Chronology of Innovation

    PubMed Central

    Rajic, Andrijana; Young, Ian; Robiadek, Katie; Pham, Mai T; Funk, Julie A

    2013-01-01

    Background The threat of a global pandemic posed by outbreaks of influenza H5N1 (1997) and Severe Acute Respiratory Syndrome (SARS, 2002), both diseases of zoonotic origin, provoked interest in improving early warning systems and reinforced the need for combining data from different sources. It led to the use of search query data from search engines such as Google and Yahoo! as an indicator of when and where influenza was occurring. This methodology has subsequently been extended to other diseases and has led to experimentation with new types of social media for disease surveillance. Objective The objective of this scoping review was to formally assess the current state of knowledge regarding the use of search queries and social media for disease surveillance in order to inform future work on early detection and more effective mitigation of the effects of foodborne illness. Methods Structured scoping review methods were used to identify, characterize, and evaluate all published primary research, expert review, and commentary articles regarding the use of social media in surveillance of infectious diseases from 2002-2011. Results Thirty-two primary research articles and 19 reviews and case studies were identified as relevant. Most relevant citations were peer-reviewed journal articles (29/32, 91%) published in 2010-11 (28/32, 88%) and reported use of a Google program for surveillance of influenza. Only four primary research articles investigated social media in the context of foodborne disease or gastroenteritis. Most authors (21/32 articles, 66%) reported that social media-based surveillance had comparable performance when compared to an existing surveillance program. The most commonly reported strengths of social media surveillance programs included their effectiveness (21/32, 66%) and rapid detection of disease (21/32, 66%). The most commonly reported weaknesses were the potential for false positive (16/32, 50%) and false negative (11/32, 34%) results. Most

  1. An Analysis of Internet Search Engines: Assessment of Over 200 Search Queries.

    ERIC Educational Resources Information Center

    Tomaiuolo, Nicholas G.; Packer, Joan G.

    1996-01-01

    Describes a study of the retrieval results of World Wide Web search engines. Research quantified accurate matches versus matches of arguable quality for 200 subjects relevant to undergraduate curricula. Both "evaluative" engines (Magellan, Point Communications) and "nonevaluative" engines (Lycos, InfoSeek, AltaVista) were…

  2. Forecasting influenza in Hong Kong with Google search queries and statistical model fusion

    PubMed Central

    Ramirez Ramirez, L. Leticia; Nezafati, Kusha; Zhang, Qingpeng; Tsui, Kwok-Leung

    2017-01-01

    Background The objective of this study is to investigate the predictive utility of online social media and web search queries, particularly Google search data, to forecast new cases of influenza-like-illness (ILI) in general outpatient clinics (GOPC) in Hong Kong. To mitigate the impact of sensitivity to self-excitement (i.e., fickle media interest) and other artifacts of online social media data, in our approach we fuse multiple offline and online data sources. Methods Four individual models: generalized linear model (GLM), least absolute shrinkage and selection operator (LASSO), autoregressive integrated moving average (ARIMA), and deep learning (DL) with Feedforward Neural Networks (FNN) are employed to forecast ILI-GOPC both one week and two weeks in advance. The covariates include Google search queries, meteorological data, and previously recorded offline ILI. To our knowledge, this is the first study that introduces deep learning methodology into surveillance of infectious diseases and investigates its predictive utility. Furthermore, to exploit the strength of each individual forecasting model, we use statistical model fusion, using Bayesian model averaging (BMA), which allows a systematic integration of multiple forecast scenarios. For each model, an adaptive approach is used to capture the recent relationship between ILI and covariates. Results DL with FNN appears to deliver the most competitive predictive performance among the four considered individual models. Combining all four models in a comprehensive BMA framework further improves predictive evaluation metrics such as root mean squared error (RMSE) and mean absolute predictive error (MAPE). Nevertheless, DL with FNN remains the preferred method for predicting locations of influenza peaks. Conclusions The proposed approach can be viewed as a feasible alternative to forecast ILI in Hong Kong or other countries where ILI has no constant seasonal trend and influenza data resources are limited. The

  3. Forecasting influenza in Hong Kong with Google search queries and statistical model fusion.

    PubMed

    Xu, Qinneng; Gel, Yulia R; Ramirez Ramirez, L Leticia; Nezafati, Kusha; Zhang, Qingpeng; Tsui, Kwok-Leung

    2017-01-01

    The objective of this study is to investigate the predictive utility of online social media and web search queries, particularly Google search data, to forecast new cases of influenza-like-illness (ILI) in general outpatient clinics (GOPC) in Hong Kong. To mitigate the impact of sensitivity to self-excitement (i.e., fickle media interest) and other artifacts of online social media data, in our approach we fuse multiple offline and online data sources. Four individual models: generalized linear model (GLM), least absolute shrinkage and selection operator (LASSO), autoregressive integrated moving average (ARIMA), and deep learning (DL) with Feedforward Neural Networks (FNN) are employed to forecast ILI-GOPC both one week and two weeks in advance. The covariates include Google search queries, meteorological data, and previously recorded offline ILI. To our knowledge, this is the first study that introduces deep learning methodology into surveillance of infectious diseases and investigates its predictive utility. Furthermore, to exploit the strength of each individual forecasting model, we use statistical model fusion, using Bayesian model averaging (BMA), which allows a systematic integration of multiple forecast scenarios. For each model, an adaptive approach is used to capture the recent relationship between ILI and covariates. DL with FNN appears to deliver the most competitive predictive performance among the four considered individual models. Combining all four models in a comprehensive BMA framework further improves predictive evaluation metrics such as root mean squared error (RMSE) and mean absolute predictive error (MAPE). Nevertheless, DL with FNN remains the preferred method for predicting locations of influenza peaks. The proposed approach can be viewed as a feasible alternative to forecast ILI in Hong Kong or other countries where ILI has no constant seasonal trend and influenza data resources are limited. The proposed methodology is easily tractable

  4. Refining search terms for nanotechnology

    NASA Astrophysics Data System (ADS)

    Porter, Alan L.; Youtie, Jan; Shapira, Philip; Schoeneck, David J.

    2008-05-01

    The ability to delineate the boundaries of an emerging technology is central to obtaining an understanding of the technology's research paths and commercialization prospects. Nowhere is this more relevant than in the case of nanotechnology (hereafter identified as "nano") given its current rapid growth and multidisciplinary nature. (Under the rubric of nanotechnology, we also include nanoscience and nanoengineering.) Past efforts have utilized several strategies, including simple term search for the prefix nano, complex lexical and citation-based approaches, and bootstrapping techniques. This research introduces a modularized Boolean approach to defining nanotechnology which has been applied to several research and patenting databases. We explain our approach to downloading and cleaning data, and report initial results. Comparisons of this approach with other nanotechnology search formulations are presented. Implications for search strategy development and profiling of the nanotechnology field are discussed.

  5. Research on Web Search Behavior: How Online Query Data Inform Social Psychology.

    PubMed

    Lai, Kaisheng; Lee, Yan Xin; Chen, Hao; Yu, Rongjun

    2017-10-01

    The widespread use of web searches in daily life has allowed researchers to study people's online social and psychological behavior. Using web search data has advantages in terms of data objectivity, ecological validity, temporal resolution, and unique application value. This review integrates existing studies on web search data that have explored topics including sexual behavior, suicidal behavior, mental health, social prejudice, social inequality, public responses to policies, and other psychosocial issues. These studies are categorized as descriptive, correlational, inferential, predictive, and policy evaluation research. The integration of theory-based hypothesis testing in future web search research will result in even stronger contributions to social psychology.

  6. Automatic Concept-Based Query Expansion Using Term Relational Pathways Built from a Collection-Specific Association Thesaurus

    ERIC Educational Resources Information Center

    Lyall-Wilson, Jennifer Rae

    2013-01-01

    The dissertation research explores an approach to automatic concept-based query expansion to improve search engine performance. It uses a network-based approach for identifying the concept represented by the user's query and is founded on the idea that a collection-specific association thesaurus can be used to create a reasonable representation of…

  7. Examining the Relationship Between Past Orientation and US Suicide Rates: An Analysis Using Big Data-Driven Google Search Queries

    PubMed Central

    Lee, Donghyun; Lee, Hojun

    2016-01-01

    Background Internet search query data reflect the attitudes of the users, using which we can measure the past orientation to commit suicide. Examinations of past orientation often highlight certain predispositions of attitude, many of which can be suicide risk factors. Objective To investigate the relationship between past orientation and suicide rate by examining Google search queries. Methods We measured the past orientation using Google search query data by comparing the search volumes of the past year and those of the future year, across the 50 US states and the District of Columbia during the period from 2004 to 2012. We constructed a panel dataset with independent variables as control variables; we then undertook an analysis using multiple ordinary least squares regression and methods that leverage the Akaike information criterion and the Bayesian information criterion. Results It was found that past orientation had a positive relationship with the suicide rate (P≤.001) and that it improves the goodness-of-fit of the model regarding the suicide rate. Unemployment rate (P≤.001 in Models 3 and 4), Gini coefficient (P≤.001), and population growth rate (P≤.001) had a positive relationship with the suicide rate, whereas the gross state product (P≤.001) showed a negative relationship with the suicide rate. Conclusions We empirically identified the positive relationship between the suicide rate and past orientation, which was measured by big data-driven Google search query. PMID:26868917

  8. Examining the Relationship Between Past Orientation and US Suicide Rates: An Analysis Using Big Data-Driven Google Search Queries.

    PubMed

    Lee, Donghyun; Lee, Hojun; Choi, Munkee

    2016-02-11

    Internet search query data reflect the attitudes of the users, using which we can measure the past orientation to commit suicide. Examinations of past orientation often highlight certain predispositions of attitude, many of which can be suicide risk factors. The objective was to investigate the relationship between past orientation and suicide rate by examining Google search queries. We measured the past orientation using Google search query data by comparing the search volumes of the past year and those of the future year, across the 50 US states and the District of Columbia during the period from 2004 to 2012. We constructed a panel dataset with independent variables as control variables; we then undertook an analysis using multiple ordinary least squares regression and methods that leverage the Akaike information criterion and the Bayesian information criterion. It was found that past orientation had a positive relationship with the suicide rate (P ≤ .001) and that it improves the goodness-of-fit of the model regarding the suicide rate. Unemployment rate (P ≤ .001 in Models 3 and 4), Gini coefficient (P ≤ .001), and population growth rate (P ≤ .001) had a positive relationship with the suicide rate, whereas the gross state product (P ≤ .001) showed a negative relationship with the suicide rate. We empirically identified the positive relationship between the suicide rate and past orientation, which was measured by big data-driven Google search query.

  9. Bayesian ontology querying for accurate and noise-tolerant semantic searches

    PubMed Central

    Bauer, Sebastian; Robinson, Peter N.

    2012-01-01

    Motivation: Ontologies provide a structured representation of the concepts of a domain of knowledge as well as the relations between them. Attribute ontologies are used to describe the characteristics of the items of a domain, such as the functions of proteins or the signs and symptoms of disease, which opens the possibility of searching a database of items for the best match to a list of observed or desired attributes. However, naive search methods do not perform well on realistic data because of noise in the data, imprecision in typical queries and because individual items may not display all attributes of the category they belong to. Results: We present a method for combining ontological analysis with Bayesian networks to deal with noise, imprecision and attribute frequencies and demonstrate an application of our method as a differential diagnostic support system for human genetics. Availability: We provide an implementation for the algorithm and the benchmark at http://compbio.charite.de/boqa/. Contact: Sebastian.Bauer@charite.de or Peter.Robinson@charite.de Supplementary Information: Supplementary Material for this article is available at Bioinformatics online. PMID:22843981

  10. Tracking Dabbing Using Search Query Surveillance: A Case Study in the United States.

    PubMed

    Zhang, Zhu; Zheng, Xiaolong; Zeng, Daniel Dajun; Leischow, Scott J

    2016-09-16

    Dabbing is an emerging method of marijuana ingestion. However, little is known about dabbing owing to limited surveillance data on dabbing. The aim of the study was to analyze Google search data to assess the scope and breadth of information seeking on dabbing. Google Trends data about dabbing and related topics (eg, electronic nicotine delivery system [ENDS], also known as e-cigarettes) in the United States between January 2004 and December 2015 were collected by using relevant search terms such as "dab rig." The correlation between dabbing (including topics: dab and hash oil) and ENDS (including topics: vaping and e-cigarette) searches, the regional distribution of dabbing searches, and the impact of cannabis legalization policies on geographical location in 2015 were analyzed. Searches regarding dabbing increased in the United States over time, with 1,526,280 estimated searches during 2015. Searches for dab and vaping have very similar temporal patterns, where the Pearson correlation coefficient (PCC) is .992 (P<.001). Similar phenomena were also obtained in searches for hash oil and e-cigarette, in which the corresponding PCC is .931 (P<.001). Dabbing information was searched more in some western states than other regions. The average dabbing searches were significantly higher in the states with medical and recreational marijuana legalization than in the states with only medical marijuana legalization (P=.02) or the states without medical and recreational marijuana legalization (P=.01). Public interest in dabbing is increasing in the United States. There are close associations between dabbing and ENDS searches. The findings suggest greater popularity of dabs in the states that legalized medical and recreational marijuana use. This study proposes a novel and timely way of cannabis surveillance, and these findings can help enhance the understanding of the popularity of dabbing and provide insights for future research and informed policy making on dabbing.

  11. Tracking Dabbing Using Search Query Surveillance: A Case Study in the United States

    PubMed Central

    Zhang, Zhu; Zeng, Daniel Dajun; Leischow, Scott J

    2016-01-01

    Background Dabbing is an emerging method of marijuana ingestion. However, little is known about dabbing owing to limited surveillance data on dabbing. Objective The aim of the study was to analyze Google search data to assess the scope and breadth of information seeking on dabbing. Methods Google Trends data about dabbing and related topics (eg, electronic nicotine delivery system [ENDS], also known as e-cigarettes) in the United States between January 2004 and December 2015 were collected by using relevant search terms such as “dab rig.” The correlation between dabbing (including topics: dab and hash oil) and ENDS (including topics: vaping and e-cigarette) searches, the regional distribution of dabbing searches, and the impact of cannabis legalization policies on geographical location in 2015 were analyzed. Results Searches regarding dabbing increased in the United States over time, with 1,526,280 estimated searches during 2015. Searches for dab and vaping have very similar temporal patterns, where the Pearson correlation coefficient (PCC) is .992 (P<.001). Similar phenomena were also obtained in searches for hash oil and e-cigarette, in which the corresponding PCC is .931 (P<.001). Dabbing information was searched more in some western states than other regions. The average dabbing searches were significantly higher in the states with medical and recreational marijuana legalization than in the states with only medical marijuana legalization (P=.02) or the states without medical and recreational marijuana legalization (P=.01). Conclusions Public interest in dabbing is increasing in the United States. There are close associations between dabbing and ENDS searches. The findings suggest greater popularity of dabs in the states that legalized medical and recreational marijuana use. This study proposes a novel and timely way of cannabis surveillance, and these findings can help enhance the understanding of the popularity of dabbing and provide insights for future

  12. Ontobee: A linked ontology data server to support ontology term dereferencing, linkage, query and integration

    PubMed Central

    Ong, Edison; Xiang, Zuoshuang; Zhao, Bin; Liu, Yue; Lin, Yu; Zheng, Jie; Mungall, Chris; Courtot, Mélanie; Ruttenberg, Alan; He, Yongqun

    2017-01-01

    Linked Data (LD) aims to achieve interconnected data by representing entities using Uniform Resource Identifiers (URIs), and sharing information using the Resource Description Framework (RDF) and HTTP. Ontologies, which logically represent entities and relations in specific domains, are the basis of LD. Ontobee (http://www.ontobee.org/) is a linked ontology data server that stores ontology information using RDF triple store technology and supports query, visualization and linkage of ontology terms. Ontobee is also the default linked data server for publishing and browsing biomedical ontologies in the Open Biological Ontology (OBO) Foundry (http://obofoundry.org) library. Ontobee currently hosts more than 180 ontologies (including 131 OBO Foundry Library ontologies) with over four million terms. Ontobee provides a user-friendly web interface for querying and visualizing the details and hierarchy of a specific ontology term. Using the eXtensible Stylesheet Language Transformation (XSLT) technology, Ontobee is able to dereference a single ontology term URI, and then output RDF/eXtensible Markup Language (XML) for computer processing or display the HTML information on a web browser for human users. Statistics and detailed information are generated and displayed for each ontology listed in Ontobee. In addition, a SPARQL web interface is provided for custom advanced SPARQL queries of one or multiple ontologies. PMID:27733503

  13. Monitoring hand, foot and mouth disease by combining search engine query data and meteorological factors.

    PubMed

    Huang, Da-Cang; Wang, Jin-Feng

    2018-01-15

    Hand, foot and mouth disease (HFMD) has been recognized as a significant public health threat and poses a tremendous challenge to disease control departments. To date, the relationship between meteorological factors and HFMD has been documented, and public interest in the disease has been proven to be trackable from the Internet. However, no study has explored the combination of these two factors in the monitoring of HFMD. Therefore, the main aim of this study was to develop an effective monitoring model of HFMD in Guangzhou, China by utilizing historical HFMD cases, Internet-based search engine query data and meteorological factors. To this end, a case study was conducted in Guangzhou, using a network-based generalized additive model (GAM) including all factors related to HFMD. Three other models were also constructed using some of the variables for comparison. The results suggested that the model showed the best estimating ability when considering all of the related factors. Copyright © 2017 Elsevier B.V. All rights reserved.

  14. Effective metadata discovery for dynamic filtering of queries to a radiology image search engine.

    PubMed

    Kahn, Charles E

    2008-09-01

    We sought to demonstrate the effectiveness of techniques to index radiology images using metadata discovered in their free-text figure captions. The ARRS GoldMiner image library incorporated 94,256 figures from 11,712 articles published in peer-reviewed online radiology journals. Algorithms were developed to discover metadata--age, sex, and imaging modality--from the figures' free-text captions. Age was recorded in years, and was classified as infant (less than 2 years), child (2 to 17 years), or adult (18+ years). Each figure was assigned to one of eight imaging modalities. A random sample of 1,000 images was examined to measure accuracy of the metadata. The patient's age was identified in 58,994 cases (63%), and the patient's sex was identified in 58,427 cases (62%). An imaging modality was assigned to 80,402 (85%) of the figures. Based on the 1,000 sampled cases, recall values for age, sex, and imaging modality were 97.2%, 99.7%, and 86.4%, respectively. Precision values for age, sex, and imaging modality were 100%, 100%, and 97.2%, respectively. Automated techniques can accurately discover age, sex, and imaging modality metadata from captions of figures published in radiology journals. The metadata can be used to dynamically filter queries for an image search engine.
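
    The age classes and the precision and recall figures reported in this abstract can be made concrete with a small sketch. The regular expression, captions, and helper names are hypothetical and far simpler than whatever the GoldMiner algorithms actually do; only the class boundaries (infant under 2 years, child 2 to 17, adult 18 and over) come from the abstract.

```python
import re

# Hypothetical pattern for phrases such as "45-year-old" or "6 months old".
AGE_PATTERN = re.compile(
    r"(\d+)[- ](year|month|week|day)s?[- ]old", re.IGNORECASE
)
UNITS_PER_YEAR = {"year": 1, "month": 12, "week": 52, "day": 365}


def age_class(caption):
    """Return 'infant', 'child', or 'adult', or None if no age is found."""
    match = AGE_PATTERN.search(caption)
    if match is None:
        return None
    years = int(match.group(1)) / UNITS_PER_YEAR[match.group(2).lower()]
    if years < 2:
        return "infant"
    if years < 18:
        return "child"
    return "adult"


def precision_recall(predicted, truth):
    """Precision: correct labels among those assigned.
    Recall: correct labels among items that truly have a label."""
    correct = sum(1 for p, t in zip(predicted, truth) if p is not None and p == t)
    assigned = sum(1 for p in predicted if p is not None)
    relevant = sum(1 for t in truth if t is not None)
    return correct / assigned, correct / relevant


# Invented captions with hand-assigned reference labels.
captions = [
    "Axial CT in a 45-year-old man with abdominal pain.",
    "Chest radiograph of a 6-month-old girl.",
    "MR image of a 10-year-old boy.",
    "Sagittal ultrasound in a teenage patient.",
]
truth = ["adult", "infant", "child", "child"]

predicted = [age_class(c) for c in captions]
precision, recall = precision_recall(predicted, truth)
print(predicted, precision, recall)
```

    The last caption shows how a pattern-based extractor can be fully precise yet miss cases: "teenage" carries age information that the pattern does not capture, which lowers recall without affecting precision.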

  15. Forecasting influenza outbreak dynamics in Melbourne from Internet search query surveillance data.

    PubMed

    Moss, Robert; Zarebski, Alexander; Dawson, Peter; McCaw, James M

    2016-07-01

    Accurate forecasting of seasonal influenza epidemics is of great concern to healthcare providers in temperate climates, as these epidemics vary substantially in their size, timing and duration from year to year, making it a challenge to deliver timely and proportionate responses. Previous studies have shown that Bayesian estimation techniques can accurately predict when an influenza epidemic will peak many weeks in advance, using existing surveillance data, but these methods must be tailored both to the target population and to the surveillance system. Our aim was to evaluate whether forecasts of similar accuracy could be obtained for metropolitan Melbourne (Australia). We used the bootstrap particle filter and a mechanistic infection model to generate epidemic forecasts for metropolitan Melbourne (Australia) from weekly Internet search query surveillance data reported by Google Flu Trends for 2006-14. Optimal observation models were selected from hundreds of candidates using a novel approach that treats forecasts akin to receiver operating characteristic (ROC) curves. We show that the timing of the epidemic peak can be accurately predicted 4-6 weeks in advance, but that the magnitude of the epidemic peak and the overall burden are much harder to predict. We then discuss how the infection and observation models and the filtering process may be refined to improve forecast robustness, thereby improving the utility of these methods for healthcare decision support. © 2016 The Authors. Influenza and Other Respiratory Viruses Published by John Wiley & Sons Ltd.

  16. Gene Network Homology in Prokaryotes Using a Similarity Search Approach: Queries of Quorum Sensing Signal Transduction

    PubMed Central

    Quan, David N.; Bentley, William E.

    2012-01-01

    Bacterial cell-cell communication is mediated by small signaling molecules known as autoinducers. Importantly, autoinducer-2 (AI-2) is synthesized via the enzyme LuxS in over 80 species, some of which mediate their pathogenicity by recognizing and transducing this signal in a cell density dependent manner. AI-2 mediated phenotypes are not well understood however, as the means for signal transduction appears varied among species, while AI-2 synthesis processes appear conserved. Approaches to reveal the recognition pathways of AI-2 will shed light on pathogenicity as we believe recognition of the signal is likely as important, if not more, than the signal synthesis. LMNAST (Local Modular Network Alignment Similarity Tool) uses a local similarity search heuristic to study gene order, generating homology hits for the genomic arrangement of a query gene sequence. We develop and apply this tool for the E. coli lac and LuxS regulated (Lsr) systems. Lsr is of great interest as it mediates AI-2 uptake and processing. Both test searches generated results that were subsequently analyzed through a number of different lenses, each with its own level of granularity, from a binary phylogenetic representation down to trackback plots that preserve genomic organizational information. Through a survey of these results, we demonstrate the identification of orthologs, paralogs, hitchhiking genes, gene loss, gene rearrangement within an operon context, and also horizontal gene transfer (HGT). We found a variety of operon structures that are consistent with our hypothesis that the signal can be perceived and transduced by homologous protein complexes, while their regulation may be key to defining subsequent phenotypic behavior. PMID:22916001

  17. Detecting internet activity for erectile dysfunction using search engine query data in the Republic of Ireland.

    PubMed

    Davis, Niall F; Smyth, Lisa G; Flood, Hugh D

    2012-12-01

    What's known on the subject? and What does the study add? Despite the increasing prevalence of erectile dysfunction (ED), there is reluctance among symptomatic patients to present to healthcare providers for appropriate advice and treatment. A number of Internet campaigns have been launched by the Irish healthcare media since 2007 aiming to provide easily accessible advice on ED. Novel online technologies appear to provide a useful tool for educating the general public on the symptoms of ED because there has been a significant increase in overall Internet search activity for this term since 2007. • To assess Internet search trends for erectile dysfunction (ED) subsequent to public awareness campaigns being launched within the Republic of Ireland • To assess whether the advent of such campaigns correlates with increased Internet search activity for ED. • Google insights for search was utilized to examine Internet search trends for the term 'erectile dysfunction' across all categories between January 2005 and December 2011. • Search activity was limited to users from the Republic of Ireland within this timeframe. • Additionally, the number of Irish Internet media campaigns and Irish web pages providing information on ED was assessed between January 2005 and December 2011. • Statistical analysis of the data was performed using analysis of variance and Student's t-tests for pairwise comparisons. • There has been a significant increase in mean search activity for ED on an annual basis since 2007 (P < 0.001). • The number of Irish web pages associated with information on ED has also increased significantly on an annual basis since 2007 (P < 0.001). • There have been seven different Irish Internet media campaigns on ED since 2007 compared to two from 2005 to 2007 (P < 0.001). • There was no significant change in mean search activity for ED from 2005 to 2007 • The advent of recent Internet media campaigns and increasing number of Irish web pages is

  18. Predicting the hand, foot, and mouth disease incidence using search engine query data and climate variables: an ecological study in Guangdong, China.

    PubMed

    Du, Zhicheng; Xu, Lin; Zhang, Wangjian; Zhang, Dingmei; Yu, Shicheng; Hao, Yuantao

    2017-10-06

    Hand, foot, and mouth disease (HFMD) has caused a substantial burden in China, especially in Guangdong Province. Based on the enhanced surveillance system, we aimed to explore whether the addition of temperature and search engine query data improves the risk prediction of HFMD. Ecological study. Information on the confirmed cases of HFMD, climate parameters and search engine query logs was collected. A total of 1.36 million HFMD cases were identified from the surveillance system during 2011-2014. Analyses were conducted at aggregate level and no confidential information was involved. A seasonal autoregressive integrated moving average (ARIMA) model with external variables (ARIMAX) was used to predict the HFMD incidence from 2011 to 2014, taking into account temperature and search engine query data (Baidu Index, BDI). Statistics of goodness-of-fit and precision of prediction were used to compare models (1) based on surveillance data only, and with the addition of (2) temperature, (3) BDI, and (4) both temperature and BDI. A high correlation between HFMD incidence and BDI (r=0.794, p<0.001) or temperature (r=0.657, p<0.001) was observed using both time series plot and correlation matrix. A linear effect of BDI (without lag) and non-linear effect of temperature (1 week lag) on HFMD incidence were found in a distributed lag non-linear model. Compared with the model based on surveillance data only, the ARIMAX model including BDI reached the best goodness-of-fit with an Akaike information criterion (AIC) value of -345.332, whereas the model including both BDI and temperature had the most accurate prediction in terms of the mean absolute percentage error (MAPE) of 101.745%. An ARIMAX model incorporating search engine query data significantly improved the prediction of HFMD. Further studies are warranted to examine whether including search engine query data also improves the prediction of other infectious diseases in other settings. © Article author(s) (or their
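
    For readers unfamiliar with the accuracy measure named above, the following is a minimal sketch of the usual MAPE definition (mean of |actual - predicted| / |actual|, expressed as a percentage). The function name `mape` and the example numbers are illustrative assumptions and are not taken from the study.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes the standard definition; undefined when any actual value is zero.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)
```

    Lower values indicate more accurate predictions, which is the sense in which the abstract compares the four candidate models.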

  19. Predicting the hand, foot, and mouth disease incidence using search engine query data and climate variables: an ecological study in Guangdong, China

    PubMed Central

    Du, Zhicheng; Xu, Lin; Zhang, Wangjian; Zhang, Dingmei; Yu, Shicheng; Hao, Yuantao

    2017-01-01

    Objectives: Hand, foot, and mouth disease (HFMD) has caused a substantial burden in China, especially in Guangdong Province. Based on the enhanced surveillance system, we aimed to explore whether the addition of temperature and search engine query data improves the risk prediction of HFMD. Design: Ecological study. Setting and participants: Information on the confirmed cases of HFMD, climate parameters and search engine query logs was collected. A total of 1.36 million HFMD cases were identified from the surveillance system during 2011–2014. Analyses were conducted at aggregate level and no confidential information was involved. Outcome measures: A seasonal autoregressive integrated moving average (ARIMA) model with external variables (ARIMAX) was used to predict the HFMD incidence from 2011 to 2014, taking into account temperature and search engine query data (Baidu Index, BDI). Statistics of goodness-of-fit and precision of prediction were used to compare models (1) based on surveillance data only, and with the addition of (2) temperature, (3) BDI, and (4) both temperature and BDI. Results: A high correlation between HFMD incidence and BDI (r=0.794, p<0.001) or temperature (r=0.657, p<0.001) was observed using both time series plot and correlation matrix. A linear effect of BDI (without lag) and non-linear effect of temperature (1 week lag) on HFMD incidence were found in a distributed lag non-linear model. Compared with the model based on surveillance data only, the ARIMAX model including BDI reached the best goodness-of-fit with an Akaike information criterion (AIC) value of −345.332, whereas the model including both BDI and temperature had the most accurate prediction in terms of the mean absolute percentage error (MAPE) of 101.745%. Conclusions: An ARIMAX model incorporating search engine query data significantly improved the prediction of HFMD. Further studies are warranted to examine whether including search engine query data also improves the prediction of

  20. Influence of legislations and news on Indian internet search query patterns of e-cigarettes.

    PubMed

    Thavarajah, Rooban; Mohandoss, Anusa Arunachalam; Ranganathan, Kannan; Kondalsamy-Chennakesavan, Srinivas

    2017-01-01

    There is a paucity of data on the use of electronic nicotine delivery systems (ENDS) in India. In addition, the Indian internet search pattern for ENDS has not been studied. We aimed to address this lacuna. Moreover, the influence of the tobacco legislations and news pieces on such search volume is not known. Given the fact that ENDS could cause oral lesions, these data are pertinent to dentists. Using a time series analysis, we examined the effect of tobacco-related legislations and news pieces on total search volume (TSV) from September 1, 2012, to August 31, 2016. TSV data were seasonally adjusted and analyzed using time series modeling. The TSV clocked during the month of legislations and news pieces were analyzed for their influence on search pattern of ENDS. The overall mean ± standard deviation (range) TSV was 22273.75 ± 6784.01 (12310-40510) during the study with seasonal variations. Individually, the best model for TSV-legislation and news pieces was autoregressive integrated moving average model, and when influence of legislations and news events were combined, it was the Winter's additive model. In the legislation alone model, the pre-event, event and post-event month TSV was not a better indicator of the effect, barring for post-event month of 2nd legislation, which involved pictorial warnings on packages in the study period. Similarly, a news piece on Pan-India ban on ENDS influenced the model in the news piece model. When combined, no "events" emerged significant. These findings suggest that search for information on ENDS is increasing and that these tobacco control policies and news items, targeting tobacco usage reduction, have only a short-term effect on the rate of searching for information on ENDS.

  1. Influence of legislations and news on Indian internet search query patterns of e-cigarettes

    PubMed Central

    Thavarajah, Rooban; Mohandoss, Anusa Arunachalam; Ranganathan, Kannan; Kondalsamy-Chennakesavan, Srinivas

    2017-01-01

    Background: There is a paucity of data on the use of electronic nicotine delivery systems (ENDS) in India. In addition, the Indian internet search pattern for ENDS has not been studied. We aimed to address this lacuna. Moreover, the influence of the tobacco legislations and news pieces on such search volume is not known. Given the fact that ENDS could cause oral lesions, these data are pertinent to dentists. Methods: Using a time series analysis, we examined the effect of tobacco-related legislations and news pieces on total search volume (TSV) from September 1, 2012, to August 31, 2016. TSV data were seasonally adjusted and analyzed using time series modeling. The TSV clocked during the month of legislations and news pieces were analyzed for their influence on search pattern of ENDS. Results: The overall mean ± standard deviation (range) TSV was 22273.75 ± 6784.01 (12310–40510) during the study with seasonal variations. Individually, the best model for TSV-legislation and news pieces was autoregressive integrated moving average model, and when influence of legislations and news events were combined, it was the Winter's additive model. In the legislation alone model, the pre-event, event and post-event month TSV was not a better indicator of the effect, barring for post-event month of 2nd legislation, which involved pictorial warnings on packages in the study period. Similarly, a news piece on Pan-India ban on ENDS influenced the model in the news piece model. When combined, no “events” emerged significant. Conclusions: These findings suggest that search for information on ENDS is increasing and that these tobacco control policies and news items, targeting tobacco usage reduction, have only a short-term effect on the rate of searching for information on ENDS. PMID:28932027

  2. Using internet search queries for infectious disease surveillance: screening diseases for suitability.

    PubMed

    Milinovich, Gabriel J; Avril, Simon M R; Clements, Archie C A; Brownstein, John S; Tong, Shilu; Hu, Wenbiao

    2014-12-31

    Internet-based surveillance systems provide a novel approach to monitoring infectious diseases. Surveillance systems built on internet data are economically, logistically and epidemiologically appealing and have shown significant promise. The potential for these systems has increased with increased internet availability and shifts in health-related information seeking behaviour. This approach to monitoring infectious diseases has, however, only been applied to single or small groups of select diseases. This study aims to systematically investigate the potential for developing surveillance and early warning systems using internet search data, for a wide range of infectious diseases. Official notifications for 64 infectious diseases in Australia were downloaded and correlated with frequencies for 164 internet search terms for the period 2009-13 using Spearman's rank correlations. Time series cross correlations were performed to assess the potential for search terms to be used in construction of early warning systems. Notifications for 17 infectious diseases (26.6%) were found to be significantly correlated with a selected search term. The use of internet metrics as a means of surveillance has not previously been described for 12 (70.6%) of these diseases. The majority of diseases identified were vaccine-preventable, vector-borne or sexually transmissible; cross correlations, however, indicated that vector-borne and vaccine preventable diseases are best suited for development of early warning systems. The findings of this study suggest that internet-based surveillance systems have broader applicability to monitoring infectious diseases than has previously been recognised. Furthermore, internet-based surveillance systems have a potential role in forecasting emerging infectious disease events, especially for vaccine-preventable and vector-borne diseases.
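
    The correlation step described above (Spearman's rank correlation between notification counts and search-term frequencies, plus lagged cross correlations) can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the study's code: the helper names and the convention that a positive `lag` compares searches in week t with notifications in week t + lag are assumptions made for the example.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


def lagged_spearman(search, notifications, lag):
    """Correlate searches at week t with notifications at week t + lag (lag >= 0)."""
    if lag < 0:
        raise ValueError("lag must be non-negative in this sketch")
    if lag == 0:
        return spearman(search, notifications)
    return spearman(search[:-lag], notifications[lag:])
```

    A search term whose correlation peaks at a positive lag is one whose search volume tends to move before notifications do, which is the property an early warning system would look for.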

  3. A Systematic Assessment of Google Search Queries and Readability of Online Gynecologic Oncology Patient Education Materials.

    PubMed

    Martin, Alexandra; Stewart, J Ryan; Gaskins, Jeremy; Medlin, Erin

    2018-01-20

    The Internet is a major source of health information for gynecologic cancer patients. In this study, we systematically explore common Google search terms related to gynecologic cancer and calculate readability of top resulting websites. We used Google AdWords Keyword Planner to generate a list of commonly searched keywords related to gynecologic oncology, which were sorted into five groups (cervical cancer, ovarian cancer, uterine cancer, vulvar cancer, vaginal cancer) using five patient education websites from sgo.org. Each keyword was Google searched to create a list of top websites. The Python programming language (version 3.5.1) was used to describe frequencies of keywords, top-level domains (TLDs), domains, and readability of top websites using four validated formulae. Of the estimated 1,846,950 monthly searches resulting in 62,227 websites, the most common was cancer.org. The most common TLD was *.com. Most websites were above the eighth-grade reading level recommended by the American Medical Association (AMA) and the National Institutes of Health (NIH). The SMOG Index was the most reliable formula. The mean grade level readability for all sites using SMOG was 9.4 ± 2.3, with 23.9% of sites falling at or below the eighth-grade reading level. The first ten results for each Google keyword were easiest to read with results beyond the first page of Google being consistently more difficult. Keywords related to gynecologic malignancies are Google-searched frequently. Most websites are difficult to read without a high school education. This knowledge may help gynecologic oncology providers adequately meet the needs of their patients.
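
    As an illustration of the SMOG readability measure mentioned above, here is a minimal sketch of the commonly published formula, grade = 1.0430 * sqrt(polysyllables * 30 / sentences) + 3.1291. The abstract does not say how the authors counted syllables or split sentences, so the vowel-group heuristic and the function names below are assumptions; the heuristic is crude and will miscount some English words.

```python
import re
from math import sqrt


def count_syllables(word):
    """Rough syllable estimate: count groups of consecutive vowels (heuristic)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def smog_grade(text):
    """Estimate the SMOG grade level of a text using the standard formula."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        raise ValueError("text contains no sentences")
    words = re.findall(r"[A-Za-z]+", text)
    # Polysyllables are words of three or more syllables.
    polysyllables = sum(1 for w in words if count_syllables(w) >= 3)
    return 1.0430 * sqrt(polysyllables * 30 / len(sentences)) + 3.1291
```

    Under this formula a text with no polysyllabic words scores the constant 3.1291, and the grade rises with the density of long words per sentence.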

  4. A Typed Text Retrieval Query Language for XML Documents.

    ERIC Educational Resources Information Center

    Colazzo, Dario; Sartiani, Carlo; Albano, Antonio; Manghi, Paolo; Ghelli, Giorgio; Lini, Luca; Paoli, Michele

    2002-01-01

    Discussion of XML focuses on a description of Tequyla-TX, a typed text retrieval query language for XML documents that can search on both content and structures. Highlights include motivations; numerous examples; word-based and char-based searches; tag-dependent full-text searches; text normalization; query algebra; data models and term language;…

  5. Assessing Ebola-related web search behaviour: insights and implications from an analytical study of Google Trends-based query volumes.

    PubMed

    Alicino, Cristiano; Bragazzi, Nicola Luigi; Faccio, Valeria; Amicizia, Daniela; Panatto, Donatella; Gasparini, Roberto; Icardi, Giancarlo; Orsi, Andrea

    2015-12-10

    The 2014 Ebola epidemic in West Africa has attracted public interest worldwide, leading to millions of Ebola-related Internet searches being performed during the period of the epidemic. This study aimed to evaluate and interpret Google search queries for terms related to the Ebola outbreak both at the global level and in all countries where primary cases of Ebola occurred. The study also endeavoured to look at the correlation between the number of overall and weekly web searches and the number of overall and weekly new cases of Ebola. Google Trends (GT) was used to explore Internet activity related to Ebola. The study period was from 29 December 2013 to 14 June 2015. Pearson's correlation was performed to correlate Ebola-related relative search volumes (RSVs) with the number of weekly and overall Ebola cases. Multivariate regression was performed using Ebola-related RSV as a dependent variable, and the overall number of Ebola cases and the Human Development Index were used as predictor variables. The greatest RSV was registered in the three West African countries mainly affected by the Ebola epidemic. The queries varied in the different countries. Both quantitative and qualitative differences between the affected African countries and other Western countries with primary cases were noted, in relation to the different flux volumes and different time courses. In the affected African countries, web query search volumes were mostly concentrated in the capital areas. However, in Western countries, web queries were uniformly distributed over the national territory. In terms of the three countries mainly affected by the Ebola epidemic, the correlation between the number of new weekly cases of Ebola and the weekly GT index varied from weak to moderate. The correlation between the number of Ebola cases registered in all countries during the study period and the GT index was very high. Google Trends showed a coarse-grained nature, strongly correlating with global

  6. Finding Query Suggestions for PubMed

    PubMed Central

    Lu, Zhiyong; Wilbur, W. John; McEntyre, Johanna R; Iskhakov, Alexey; Szilagyi, Lee

    2009-01-01

    It is common for PubMed users to repeatedly modify their queries (search terms) before retrieving documents relevant to their information needs. To assist users in reformulating their queries, we report the implementation and usage analysis of a new component in PubMed called Related Queries, which automatically produces query suggestions in response to the original user’s input. The proposed method is based on query log analysis and focuses on finding popular queries that contain the initial user search term with a goal of helping users describe their information needs in a more precise manner. This work has been integrated into PubMed since January 2009. Automatic assessment using clickthrough data shows that each day, the new feature is used consistently between 6% and 10% of the time when it is shown, suggesting that it has quickly become a popular new feature in PubMed. PMID:20351887

  7. Meshable: searching PubMed abstracts by utilizing MeSH and MeSH-derived topical terms.

    PubMed

    Kim, Sun; Yeganova, Lana; Wilbur, W John

    2016-10-01

    Medical Subject Headings (MeSH®) is a controlled vocabulary for indexing and searching biomedical literature. MeSH terms and subheadings are organized in a hierarchical structure and are used to indicate the topics of an article. Biologists can use either MeSH terms as queries or the MeSH interface provided in PubMed® for searching PubMed abstracts. However, these are rarely used, and there is no convenient way to link standardized MeSH terms to user queries. Here, we introduce a web interface which allows users to enter queries to find MeSH terms closely related to the queries. Our method relies on co-occurrence of text words and MeSH terms to find keywords that are related to each MeSH term. A query is then matched with the keywords for MeSH terms, and candidate MeSH terms are ranked based on their relatedness to the query. The experimental results show that our method achieves the best performance among several term extraction approaches in terms of topic coherence. Moreover, the interface can be effectively used to find full names of abbreviations and to disambiguate user queries. Availability: https://www.ncbi.nlm.nih.gov/IRET/MESHABLE/ Contact: sun.kim@nih.gov. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  8. Flexible Phrase Based Query Handling Algorithms.

    ERIC Educational Resources Information Center

    Wilbur, W. John; Kim, Won

    2001-01-01

    Flexibility in query handling can be important if one types a search engine query that is misspelled, contains terms not in the database, or requires knowledge of a controlled vocabulary. Presents results of experiments that suggest the optimal form of similarity functions that are applicable to the task of phrase based retrieval to find either…

  9. The Selection of Good Search Terms.

    ERIC Educational Resources Information Center

    van Rijsbergen, C. J.; And Others

    1981-01-01

    Describes the use of relevance feedback to select additional search terms and discusses the extraction of these terms from a maximum spanning tree connecting all terms in the index term vocabulary; retrieval effectiveness for different spanning trees is shown to be similar. Eight references are included. (Author/BK)

  10. They're heating up: Internet search query trends reveal significant public interest in heat-not-burn tobacco products.

    PubMed

    Caputi, Theodore L; Leas, Eric; Dredze, Mark; Cohen, Joanna E; Ayers, John W

    2017-01-01

    Heat-not-burn tobacco products, battery powered devices that heat leaf tobacco to approximately 500 degrees Fahrenheit to produce an inhalable aerosol, are being introduced in markets around the world. Japan, where manufacturers have marketed several heat-not-burn brands since 2014, has been the focal national test market, with the intention of developing global marketing strategies. We used Google search query data to estimate, for the first time, the scale and growth potential of heat-not-burn tobacco products. Average monthly searches for heat-not-burn products rose 1,426% (95% CI: 746, 3574) between their first (2015) and second (2016) complete years on the market and an additional 100% (95% CI: 60, 173) between the products' second (2016) and third years on the market (Jan-Sep 2017). There are now between 5.9 and 7.5 million heat-not-burn related Google searches in Japan each month based on September 2017 estimates. Moreover, forecasts relying on the historical trends suggest heat-not-burn searches will increase an additional 32% (95% CI: -4 to 79) during 2018, compared to current estimates for 2017 (Jan-Sep), with continued growth thereafter expected. Contrasting heat-not-burn's rise in Japan to electronic cigarettes' rise in the United States we find searches for heat-not-burn eclipsed electronic cigarette searches during April 2016. Moreover, the change in average monthly queries for heat-not-burn in Japan between 2015 and 2017 was 399 (95% CI: 184, 1490) times larger than the change in average monthly queries for electronic cigarettes in the United States over the same time period, increasing by 2,956% (95% CI: 1729, 7304) compared to only 7% (95% CI: 3, 13). Our findings are a clarion call for tobacco control leaders to ready themselves as heat-not-burn tobacco products will likely garner substantial interest as they are introduced into new markets. Public health practitioners should expand heat-not-burn tobacco product surveillance, adjust existing tobacco

  11. They’re heating up: Internet search query trends reveal significant public interest in heat-not-burn tobacco products

    PubMed Central

    Caputi, Theodore L.; Leas, Eric; Dredze, Mark; Cohen, Joanna E.; Ayers, John W.

    2017-01-01

    Heat-not-burn tobacco products, battery powered devices that heat leaf tobacco to approximately 500 degrees Fahrenheit to produce an inhalable aerosol, are being introduced in markets around the world. Japan, where manufacturers have marketed several heat-not-burn brands since 2014, has been the focal national test market, with the intention of developing global marketing strategies. We used Google search query data to estimate, for the first time, the scale and growth potential of heat-not-burn tobacco products. Average monthly searches for heat-not-burn products rose 1,426% (95% CI: 746, 3574) between their first (2015) and second (2016) complete years on the market and an additional 100% (95% CI: 60, 173) between the products’ second (2016) and third years on the market (Jan-Sep 2017). There are now between 5.9 and 7.5 million heat-not-burn related Google searches in Japan each month based on September 2017 estimates. Moreover, forecasts relying on the historical trends suggest heat-not-burn searches will increase an additional 32% (95% CI: -4 to 79) during 2018, compared to current estimates for 2017 (Jan-Sep), with continued growth thereafter expected. Contrasting heat-not-burn’s rise in Japan to electronic cigarettes’ rise in the United States we find searches for heat-not-burn eclipsed electronic cigarette searches during April 2016. Moreover, the change in average monthly queries for heat-not-burn in Japan between 2015 and 2017 was 399 (95% CI: 184, 1490) times larger than the change in average monthly queries for electronic cigarettes in the United States over the same time period, increasing by 2,956% (95% CI: 1729, 7304) compared to only 7% (95% CI: 3, 13). Our findings are a clarion call for tobacco control leaders to ready themselves as heat-not-burn tobacco products will likely garner substantial interest as they are introduced into new markets. Public health practitioners should expand heat-not-burn tobacco product surveillance, adjust existing tobacco

  12. SymDex: increasing the efficiency of chemical fingerprint similarity searches for comparing large chemical libraries by using query set indexing.

    PubMed

    Tai, David; Fang, Jianwen

    2012-08-27

    The large sizes of today's chemical databases require efficient algorithms to perform similarity searches. It can be very time consuming to compare two large chemical databases. This paper seeks to build upon existing research efforts by describing a novel strategy for accelerating existing search algorithms for comparing large chemical collections. The quest for efficiency has focused on developing better indexing algorithms by creating heuristics for searching an individual chemical against a chemical library by detecting and eliminating needless similarity calculations. For comparing two chemical collections, these algorithms simply execute searches for each chemical in the query set sequentially. The strategy presented in this paper achieves a speedup upon these algorithms by indexing the set of all query chemicals so redundant calculations that arise in the case of sequential searches are eliminated. We implement this novel algorithm by developing a similarity search program called Symmetric inDexing or SymDex. SymDex shows over a 232% maximum speedup compared to the state-of-the-art single query search algorithm over real data for various fingerprint lengths. Considerable speedup is even seen for batch searches where query set sizes are relatively small compared to typical database sizes. To the best of our knowledge, SymDex is the first search algorithm designed specifically for comparing chemical libraries. It can be adapted to most, if not all, existing indexing algorithms and shows potential for accelerating future similarity search algorithms for comparing chemical databases.
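
    To make the idea of "eliminating needless similarity calculations" concrete, the sketch below shows a generic fingerprint search using Tanimoto similarity with a well-known bit-count bound (Tanimoto can never exceed min(|A|, |B|) / max(|A|, |B|)). This is not the SymDex algorithm, whose query-set index is not described in the abstract; it only illustrates the sequential per-query baseline that the abstract says SymDex improves on. Fingerprints are modelled as Python integers used as bit sets, and all names are hypothetical.

```python
def popcount(bits):
    """Number of set bits in a fingerprint stored as a Python int."""
    return bin(bits).count("1")


def tanimoto(a, b):
    """Tanimoto similarity of two bit-set fingerprints (0.0 if both are empty)."""
    union = popcount(a | b)
    return popcount(a & b) / union if union else 0.0


def similar_pairs(queries, library, threshold):
    """Return (query_index, library_index) pairs with similarity >= threshold.

    Each query is searched sequentially; the bit-count bound skips pairs
    that cannot reach the threshold without computing the full similarity.
    """
    indexed = [(popcount(fp), j, fp) for j, fp in enumerate(library)]
    hits = []
    for i, query in enumerate(queries):
        nq = popcount(query)
        for nb, j, fp in indexed:
            low, high = min(nq, nb), max(nq, nb)
            if high == 0 or low / high < threshold:
                continue  # upper bound on Tanimoto is below the threshold
            if tanimoto(query, fp) >= threshold:
                hits.append((i, j))
    return hits
```

    In this baseline the bound is recomputed for every query; the abstract's point is that indexing the query set as well lets such redundant work be shared across queries.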

  13. Frequent Itemset Mining for Query Expansion in Microblog Ad-hoc Search

    DTIC Science & Technology

    2012-11-01

    geneity of variance was verified by the Levene test. 3http://www.cs.waikato.ac.nz/ml/weka/ Topic Expanded Query MB051 @britishexpat, government...of k1 from 0 to 100, and tested the performance of Okapi BM25 on the TREC 2011 Microblog track qrels. There are 1http://lucene.apache.org/ Report...categorizations in [7]. One-way ANOVA tests of the three categorizations don’t show a significant variance of performance across different

  14. Beyond Text Queries and Ranked Lists: Faceted Search in Library Catalogs

    ERIC Educational Resources Information Center

    Niu, Xi

    2012-01-01

    Since the adoption of faceted search in a small number of academic libraries in 2006, faceted library catalogs have gained popularity in many academic and public libraries. This dissertation seeks to understand whether faceted search improves the interactions between searchers and library catalogs and to understand ways that facets are used in…

  15. Tracking search engine queries for suicide in the United Kingdom, 2004-2013.

    PubMed

    Arora, V S; Stuckler, D; McKee, M

    2016-08-01

    First, to determine if a cyclical trend is observed for search activity of suicide and three common suicide risk factors in the United Kingdom: depression, unemployment, and marital strain. Second, to test the validity of suicide search data as a potential marker of suicide risk by evaluating whether web searches for suicide associate with suicide rates among those of different ages and genders in the United Kingdom. Cross-sectional. Search engine data was obtained from Google Trends, a publicly available repository of information of trends and patterns of user searches on Google. The following phrases were entered into Google Trends to analyse relative search volume for suicide, depression, job loss, and divorce, respectively: 'suicide'; 'depression + depressed + hopeless'; 'unemployed + lost job'; 'divorce'. Spearman's rank correlation coefficient was employed to test bivariate associations between suicide search activity and official suicide rates from the Office of National Statistics (ONS). Cyclical trends were observed in search activity for suicide and depression-related search activity, with peaks in autumn and winter months, and a trough in summer months. A positive, non-significant association was found between suicide-related search activity and suicide rates in the general working-age population (15-64 years) (ρ = 0.164; P = 0.652). This association is stronger in younger age groups, particularly for those 25-34 years of age (ρ = 0.848; P = 0.002). We give credence to a link between search activity for suicide and suicide rates in the United Kingdom from 2004 to 2013 for high risk sub-populations (i.e. male youth and young professionals). There remains a need for further research on how Google Trends can be used in other areas of disease surveillance and for work to provide greater geographical precision, as well as research on ways of mitigating the risk of internet use leading to suicide ideation in youth. Copyright © 2015 The Royal

  16. Entity-based Stochastic Analysis of Search Results for Query Expansion and Results Re-Ranking

    DTIC Science & Technology

    2015-11-20

    based on named-entity recognition applied in a set of search results, and on a graph of documents and identified entities that is constructed...the top-L (e.g. L = 1,000) results are retrieved. Then, Named Entity Recognition (NER) is applied in these results for identifying LOD entities. In...based on named entity recognition applied in a set of retrieved documents, and on a graph of documents and entities that is constructed dynamically

  17. Semantic Features for Classifying Referring Search Terms

    SciTech Connect

    May, Chandler J.; Henry, Michael J.; McGrath, Liam R.

    2012-05-11

    When an internet user clicks on a result in a search engine, a request is submitted to the destination web server that includes a referrer field containing the search terms given by the user. Using this information, website owners can analyze the search terms leading to their websites to better understand their visitors' needs. This work explores some of the features that can be used for classification-based analysis of such referring search terms. We present initial results for the example task of classifying HTTP requests' countries of origin. A system that can accurately predict the country of origin from query text may be a valuable complement to IP lookup methods which are susceptible to the obfuscation of dereferrers or proxies. We suggest that the addition of semantic features improves classifier performance in this example application. We begin by looking at related work and presenting our approach. After describing initial experiments and results, we discuss paths forward for this work.

  18. A Task-oriented Study on the Influencing Effects of Query-biased Summarization in Web Searching.

    ERIC Educational Resources Information Center

    White, Ryen W.; Jose, Joemon M.; Ruthven, Ian

    2003-01-01

    A task-oriented, comparative evaluation between four Web retrieval systems was performed; two using query-biased summarization, and two using the standard ranked titles/abstracts approach. Results indicate that query-biased summarization techniques appear to be more useful and effective in helping users gauge document relevance than the…

  19. Smart query answering for marine sensor data.

    PubMed

    Shahriar, Md Sumon; de Souza, Paulo; Timms, Greg

    2011-01-01

    We review existing query answering systems for sensor data. We then propose an extended query answering approach termed smart query, specifically for marine sensor data. The smart query answering system integrates pattern queries and continuous queries. The proposed smart query system considers both streaming data and historical data from marine sensor networks. The smart query also uses query relaxation technique and semantics from domain knowledge as a recommender system. The proposed smart query benefits in building data and information systems for marine sensor networks.

  20. Querying and Ranking XML Documents.

    ERIC Educational Resources Information Center

    Schlieder, Torsten; Meuss, Holger

    2002-01-01

    Discussion of XML, information retrieval, precision, and recall focuses on a retrieval technique that adopts the similarity measure of the vector space model, incorporates the document structure, and supports structured queries. Topics include a query model based on tree matching; structured queries and term-based ranking; and term frequency and…

  1. Querying Proofs

    NASA Technical Reports Server (NTRS)

    Aspinall, David; Denney, Ewen; Lueth, Christoph

    2012-01-01

    We motivate and introduce a query language PrQL designed for inspecting machine representations of proofs. PrQL natively supports hiproofs which express proof structure using hierarchical nested labelled trees. The core language presented in this paper is locally structured (first-order), with queries built using recursion and patterns over proof structure and rule names. We define the syntax and semantics of locally structured queries, demonstrate their power, and sketch some implementation experiments.

  2. Searching PubMed for studies on bacteremia, bloodstream infection, septicemia, or whatever the best term is: a note of caution.

    PubMed

    Søgaard, Mette; Andersen, Jens P; Schønheyder, Henrik C

    2012-04-01

    There is inconsistency in the terminology used to describe bacteremia. To demonstrate the impact on information retrieval, we compared the yield of articles from PubMed MEDLINE using the terms "bacteremia," "bloodstream infection," and "septicemia." We searched for articles published between 1966 and 2009, and depicted the relationships among queries graphically. To examine the content of the retrieved articles, we extracted all Medical Subject Headings (MeSH) terms and compared topic similarity using a cosine measure. The recovered articles differed greatly by term, and only 53 articles were captured by all terms. Of the articles retrieved by the "bacteremia" query, 21,438 (84.1%) were not captured when searching for "bloodstream infection" or "septicemia." Likewise, only 2,243 of the 11,796 articles recovered by free-text query for "bloodstream infection" were retrieved by the "bacteremia" query (19%). Entering "bloodstream infection" as a phrase, 46.1% of the records overlapped with the "bacteremia" query. Similarity measures ranged from 0.52 to 0.78 and were lowest for "bloodstream infection" as a phrase compared with "septicemia." Inconsistent terminology has a major impact on the yield of queries. Agreement on terminology should be sought and promoted by scientific journals. An immediate solution is to add "bloodstream infection" as entry term for bacteremia in the MeSH vocabulary. Copyright © 2012 Association for Professionals in Infection Control and Epidemiology, Inc. Published by Mosby, Inc. All rights reserved.
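    The cosine measure used to compare the MeSH content of two result sets can be sketched as follows. This is an illustrative sketch that assumes raw MeSH term counts as vector components; the abstract does not state the weighting the authors used, and the function name is hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine_similarity(terms_a, terms_b):
    """Cosine of the angle between two term-count vectors.

    terms_a and terms_b are lists of MeSH terms, one entry per
    occurrence, pooled over the articles retrieved by each query.
    """
    a, b = Counter(terms_a), Counter(terms_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty result set has no defined direction
    return dot / (norm_a * norm_b)
```

    A value near 1 means the two queries retrieve articles indexed with much the same headings; a value near 0 means the indexed topics barely overlap.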

  3. Evidential significance of automotive paint trace evidence using a pattern recognition based infrared library search engine for the Paint Data Query Forensic Database.

    PubMed

    Lavine, Barry K; White, Collin G; Allen, Matthew D; Fasasi, Ayuba; Weakley, Andrew

    2016-10-01

    A prototype library search engine has been further developed to search the infrared spectral libraries of the paint data query database to identify the line and model of a vehicle from the clear coat, surfacer-primer, and e-coat layers of an intact paint chip. For this study, search prefilters were developed from 1181 automotive paint systems spanning 3 manufacturers: General Motors, Chrysler, and Ford. The best match between each unknown and the spectra in the hit list generated by the search prefilters was identified using a cross-correlation library search algorithm that performed both a forward and backward search. In the forward search, spectra were divided into intervals and further subdivided into windows (which corresponds to the time lag for the comparison) within those intervals. The top five hits identified in each search window were compiled; a histogram was computed that summarized the frequency of occurrence for each library sample, with the IR spectra most similar to the unknown flagged. The backward search computed the frequency and occurrence of each line and model without regard to the identity of the individual spectra. Only those lines and models with a frequency of occurrence greater than or equal to 20% were included in the final hit list. If there was agreement between the forward and backward search results, the specific line and model common to both hit lists was always the correct assignment. Samples assigned to the same line and model by both searches are always well represented in the library and correlate well on an individual basis to specific library samples. For these samples, one can have confidence in the accuracy of the match. This was not the case for the results obtained using commercial library search algorithms, as the hit quality index scores for the top twenty hits were always greater than 99%. Copyright © 2016 Elsevier B.V. All rights reserved.

  4. Exploiting the potential of large databases of electronic health records for research using rapid search algorithms and an intuitive query interface

    PubMed Central

    Tate, A Rosemary; Beloff, Natalia; Al-Radwan, Balques; Wickson, Joss; Puri, Shivani; Williams, Timothy; Van Staa, Tjeerd; Bleach, Adrian

    2014-01-01

    Objective: UK primary care databases, which contain diagnostic, demographic and prescribing information for millions of patients geographically representative of the UK, represent a significant resource for health services and clinical research. They can be used to identify patients with a specified disease or condition (phenotyping) and to investigate patterns of diagnosis and symptoms. Currently, extracting such information manually is time-consuming and requires considerable expertise. In order to exploit more fully the potential of these large and complex databases, our interdisciplinary team developed generic methods allowing access to different types of user. Materials and methods: Using the Clinical Practice Research Datalink database, we have developed an online user-focused system (TrialViz), which enables users interactively to select suitable medical general practices based on two criteria: suitability of the patient base for the intended study (phenotyping) and measures of data quality. Results: An end-to-end system, underpinned by an innovative search algorithm, allows the user to extract information in near real-time via an intuitive query interface and to explore this information using interactive visualization tools. A usability evaluation of this system produced positive results. Discussion: We present the challenges and results in the development of TrialViz and our plans for its extension for wider applications of clinical research. Conclusions: Our fast search algorithms and simple query algorithms represent a significant advance for users of clinical research databases. PMID:24272162

  5. Exploiting the potential of large databases of electronic health records for research using rapid search algorithms and an intuitive query interface.

    PubMed

    Tate, A Rosemary; Beloff, Natalia; Al-Radwan, Balques; Wickson, Joss; Puri, Shivani; Williams, Timothy; Van Staa, Tjeerd; Bleach, Adrian

    2014-01-01

    UK primary care databases, which contain diagnostic, demographic and prescribing information for millions of patients geographically representative of the UK, represent a significant resource for health services and clinical research. They can be used to identify patients with a specified disease or condition (phenotyping) and to investigate patterns of diagnosis and symptoms. Currently, extracting such information manually is time-consuming and requires considerable expertise. In order to exploit more fully the potential of these large and complex databases, our interdisciplinary team developed generic methods allowing access to different types of user. Using the Clinical Practice Research Datalink database, we have developed an online user-focused system (TrialViz), which enables users interactively to select suitable medical general practices based on two criteria: suitability of the patient base for the intended study (phenotyping) and measures of data quality. An end-to-end system, underpinned by an innovative search algorithm, allows the user to extract information in near real-time via an intuitive query interface and to explore this information using interactive visualization tools. A usability evaluation of this system produced positive results. We present the challenges and results in the development of TrialViz and our plans for its extension for wider applications of clinical research. Our fast search algorithms and simple query algorithms represent a significant advance for users of clinical research databases.

  6. Multitasking Web Searching and Implications for Design.

    ERIC Educational Resources Information Center

    Ozmutlu, Seda; Ozmutlu, H. C.; Spink, Amanda

    2003-01-01

    Findings from a study of users' multitasking searches on Web search engines include: multitasking searches are a noticeable user behavior; multitasking search sessions are longer than regular search sessions in terms of queries per session and duration; both Excite and AlltheWeb.com users search for about three topics per multitasking session and…

  7. SciFinder Scholar 2006: an empirical analysis of research topic query processing.

    PubMed

    Wagner, A Ben

    2006-01-01

    Topical search queries in SciFinder Scholar are processed through an extensive set of natural language processing algorithms that greatly enhance the relevance and comprehensiveness of the search results. Little detailed documentation on these algorithms has been published. However, a careful examination of the highlighted hit terms coupled with a comparison of results from small variations in query language reveal much additional, useful information about these algorithms. An understanding of how these algorithms work can lead to better search results and explain many unexpected results, including differing hit counts for singular versus plural query words and phrases.

  8. Pattern Recognition-Assisted Infrared Library Searching of the Paint Data Query Database to Enhance Lead Information from Automotive Paint Trace Evidence.

    PubMed

    Lavine, Barry K; White, Collin G; Allen, Matthew D; Weakley, Andrew

    2017-03-01

    Multilayered automotive paint fragments, which are one of the most complex materials encountered in the forensic science laboratory, provide crucial links in criminal investigations and prosecutions. To determine the origin of these paint fragments, forensic automotive paint examiners have turned to the paint data query (PDQ) database, which allows the forensic examiner to compare the layer sequence and color, texture, and composition of the sample to paint systems of the original equipment manufacturer (OEM). However, modern automotive paints have a thin color coat and this layer on a microscopic fragment is often too thin to obtain accurate chemical and topcoat color information. A search engine has been developed for the infrared (IR) spectral libraries of the PDQ database in an effort to improve discrimination capability and permit quantification of discrimination power for OEM automotive paint comparisons. The similarity of IR spectra of the corresponding layers of various records for original finishes in the PDQ database often results in poor discrimination using commercial library search algorithms. A pattern recognition approach employing pre-filters and a cross-correlation library search algorithm that performs both a forward and backward search has been used to significantly improve the discrimination of IR spectra in the PDQ database and thus improve the accuracy of the search. This improvement permits inter-comparison of OEM automotive paint layer systems using the IR spectra alone. Such information can serve to quantify the discrimination power of the original automotive paint encountered in casework and further efforts to succinctly communicate trace evidence to the courts.

  9. Short-term Internet search using makes people rely on search engines when facing unknown issues.

    PubMed

    Wang, Yifan; Wu, Lingdan; Luo, Liang; Zhang, Yifen; Dong, Guangheng

    2017-01-01

    Internet search engines, which have powerful search/sort functions and ease-of-use features, have become an indispensable tool for many individuals. The current study tests whether short-term Internet search training can make people more dependent on them. Thirty-one of forty subjects completed the search training study, which included a pre-test, six days of Internet search training, and a post-test. During the pre- and post-tests, subjects were asked to search online for the answers to 40 unusual questions, remember the answers and recall them in the scanner. Un-learned questions were randomly presented at the recall stage in order to elicit search impulse. Compared with the pre-test, subjects in the post-test reported higher impulse to use search engines to answer un-learned questions. Consistently, subjects showed higher brain activations in dorsolateral prefrontal cortex and anterior cingulate cortex in the post-test than in the pre-test. In addition, there were significant positive correlations between self-reported search impulse and brain responses in the frontal areas. The results suggest that a simple six-day Internet search training can make people dependent on the search tools when facing unknown issues. People easily become dependent on Internet search engines.

  10. Short-term Internet search using makes people rely on search engines when facing unknown issues

    PubMed Central

    Wang, Yifan; Wu, Lingdan; Luo, Liang; Zhang, Yifen

    2017-01-01

    Internet search engines, which have powerful search/sort functions and ease-of-use features, have become an indispensable tool for many individuals. The current study tests whether short-term Internet search training can make people more dependent on them. Thirty-one of forty subjects completed the search training study, which included a pre-test, six days of Internet search training, and a post-test. During the pre- and post-tests, subjects were asked to search online for the answers to 40 unusual questions, remember the answers and recall them in the scanner. Un-learned questions were randomly presented at the recall stage in order to elicit search impulse. Compared with the pre-test, subjects in the post-test reported higher impulse to use search engines to answer un-learned questions. Consistently, subjects showed higher brain activations in dorsolateral prefrontal cortex and anterior cingulate cortex in the post-test than in the pre-test. In addition, there were significant positive correlations between self-reported search impulse and brain responses in the frontal areas. The results suggest that a simple six-day Internet search training can make people dependent on the search tools when facing unknown issues. People easily become dependent on Internet search engines. PMID:28441408

  11. Does query expansion limit our learning? A comparison of social-based expansion to content-based expansion for medical queries on the internet.

    PubMed

    Pentoney, Christopher; Harwell, Jeff; Leroy, Gondy

    2014-01-01

    Searching for medical information online is a common activity. While it has been shown that forming good queries is difficult, Google's query suggestion tool, a type of query expansion, aims to facilitate query formation. However, it is unknown how this expansion, which is based on what others searched for, affects the information gathering of the online community. To measure the impact of social-based query expansion, this study compared it with content-based expansion, i.e., what is really in the text. We used 138,906 medical queries from the AOL User Session Collection and expanded them using Google's Autocomplete method (social-based) and the content of the Google Web Corpus (content-based). We evaluated the specificity and ambiguity of the expansion terms for trigram queries. We also looked at the impact on the actual results using domain diversity and expansion edit distance. Results showed that the social-based method provided more precise expansion terms as well as terms that were less ambiguous. Expanded queries do not differ significantly in diversity when expanded using the social-based method (6.72 different domains returned in the first ten results, on average) vs. content-based method (6.73 different domains, on average).
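    The domain diversity figure quoted above (distinct domains among the first ten results) can be computed along these lines. This is a sketch under an assumption: a "domain" is taken to be the lowercased host part of the result URL, which may differ from the definition the authors used. The function name is hypothetical.

```python
from urllib.parse import urlparse


def domain_diversity(result_urls, top_n=10):
    """Count distinct hosts among the first top_n result URLs."""
    hosts = {urlparse(url).netloc.lower() for url in result_urls[:top_n]}
    hosts.discard("")  # ignore entries that did not parse as absolute URLs
    return len(hosts)
```

    Averaging this count over all expanded queries for each expansion method would give figures comparable in form to the 6.72 and 6.73 reported in the abstract.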

  12. Enabling Incremental Query Re-Optimization.

    PubMed

    Liu, Mengmeng; Ives, Zachary G; Loo, Boon Thau

    2016-01-01

    As declarative query processing techniques expand to the Web, data streams, network routers, and cloud platforms, there is an increasing need to re-plan execution in the presence of unanticipated performance changes. New runtime information may affect which query plan we prefer to run. Adaptive techniques require innovation both in terms of the algorithms used to estimate costs, and in terms of the search algorithm that finds the best plan. We investigate how to build a cost-based optimizer that recomputes the optimal plan incrementally given new cost information, much as a stream engine constantly updates its outputs given new data. Our implementation especially shows benefits for stream processing workloads. It lays the foundations upon which a variety of novel adaptive optimization algorithms can be built. We start by leveraging the recently proposed approach of formulating query plan enumeration as a set of recursive datalog queries; we develop a variety of novel optimization approaches to ensure effective pruning in both static and incremental cases. We further show that the lessons learned in the declarative implementation can be equally applied to more traditional optimizer implementations.

  13. Enabling Incremental Query Re-Optimization

    PubMed Central

    Liu, Mengmeng; Ives, Zachary G.; Loo, Boon Thau

    2017-01-01

    As declarative query processing techniques expand to the Web, data streams, network routers, and cloud platforms, there is an increasing need to re-plan execution in the presence of unanticipated performance changes. New runtime information may affect which query plan we prefer to run. Adaptive techniques require innovation both in terms of the algorithms used to estimate costs, and in terms of the search algorithm that finds the best plan. We investigate how to build a cost-based optimizer that recomputes the optimal plan incrementally given new cost information, much as a stream engine constantly updates its outputs given new data. Our implementation especially shows benefits for stream processing workloads. It lays the foundations upon which a variety of novel adaptive optimization algorithms can be built. We start by leveraging the recently proposed approach of formulating query plan enumeration as a set of recursive datalog queries; we develop a variety of novel optimization approaches to ensure effective pruning in both static and incremental cases. We further show that the lessons learned in the declarative implementation can be equally applied to more traditional optimizer implementations. PMID:28659658

  14. Short-term perceptual learning in visual conjunction search.

    PubMed

    Su, Yuling; Lai, Yunpeng; Huang, Wanyi; Tan, Wei; Qu, Zhe; Ding, Yulong

    2014-08-01

    Although some studies showed that training can improve the ability of cross-dimension conjunction search, less is known about the underlying mechanism. Specifically, it remains unclear whether training of visual conjunction search can successfully bind different features of separated dimensions into a new function unit at early stages of visual processing. In the present study, we utilized stimulus specificity and generalization to provide a new approach to investigate the mechanisms underlying perceptual learning (PL) in visual conjunction search. Five experiments consistently showed that after 40 to 50 min of training of color-shape/orientation conjunction search, the ability to search for a certain conjunction target improved significantly and the learning effects did not transfer to a new target that differed from the trained target in both color and shape/orientation features. However, the learning effects were not strictly specific. In color-shape conjunction search, although the learning effect could not transfer to a same-shape different-color target, it almost completely transferred to a same-color different-shape target. In color-orientation conjunction search, the learning effect partly transferred to a new target that shared same color or same orientation with the trained target. Moreover, the sum of transfer effects for the same color target and the same orientation target in color-orientation conjunction search was algebraically equivalent to the learning effect for trained target, showing an additive transfer effect. The different transfer patterns in color-shape and color-orientation conjunction search learning might reflect the different complexity and discriminability between feature dimensions. These results suggested a feature-based attention enhancement mechanism rather than a unitization mechanism underlying the short-term PL of color-shape/orientation conjunction search.

  15. BLAST++: BLASTing queries in batches.

    PubMed

    Wang, Hao; Ooi, Beng Chin; Tan, Kian-Lee; Ong, Twee-Hee; Zhou, Lei

    2003-11-22

    BLAST++ is a tool that is integrated with NCBI BLAST, allowing multiple, say K, queries to be searched against a database concurrently. The results obtained by BLAST++ are identical to those obtained by executing BLAST on each of the K queries, but BLAST++ completes the processing in a much shorter time. http://xena1.ddns.comp.nus.edu.sg/~genesis/blast++

  16. An approach to semantic query expansion system based on Hepatitis ontology.

    PubMed

    Yunzhi, Chen; Huijuan, Lu; Shapiro, Linda; Travillian, Ravensara S; Lanjuan, Li

    2016-05-01

    Ontology development, as an increasingly practical vehicle applied in various fields, plays a significant role in knowledge management. This paper, focusing on constructing and querying a hepatitis ontology, aims to provide a framework for ontology-based medical services. The paper is devoted to the algorithm of query expansion for the hepatitis ontology, including synonym expansion, hypernym/hyponym expansion and expansion of similar words. It applies semantic similarity calculation to judge the similarity of retrieval terms. The paper proposes a new prototype system. The accuracy of query expansion is improved in both precision@40 and AP@40, which indicates that query expansion improves the accuracy of the query after using the method proposed in this paper. The paper has adopted semantic similarity computing to improve retrieval performance. Experiments show that search precision of query expansion is higher based on domain concept relationship.

  17. Validation of New Signal Detection Methods for Web Query Log Data Compared to Signal Detection Algorithms Used With FAERS.

    PubMed

    Colilla, Susan; Tov, Elad Yom; Zhang, Ling; Kurzinger, Marie-Laure; Tcherny-Lessenot, Stephanie; Penfornis, Catherine; Jen, Shang; Gonzalez, Danny S; Caubel, Patrick; Welsh, Susan; Juhaeri, Juhaeri

    2017-05-01

    Post-marketing drug surveillance is largely based on signals found in spontaneous reports from patients and healthcare providers. Rare adverse drug reactions and adverse events (AEs) that may develop after long-term exposure to a drug or from drug interactions may be missed. The US FDA and others have proposed that web-based data could be mined as a resource to detect latent signals associated with adverse drug reactions. Recently, a web-based search query method called a query log reaction score (QLRS) was developed to detect whether AEs associated with certain drugs could be found from search engine query data. In this study, we compare the performance of two other algorithms, the proportional query ratio (PQR) and the proportional query rate ratio (Q-PRR) against that of two reference signal-detection algorithms (SDAs) commonly used with the FDA AE Reporting System (FAERS) database. In summary, the web query methods have moderate sensitivity (80%) in detecting signals in web query data compared with reference SDAs in FAERS when the web query data are filtered, but the query metrics generate many false-positives and have low specificity compared with reference SDAs in FAERS. Future research is needed to find better refinements of query data and/or the metrics to improve the specificity of these web query log algorithms.
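    Sensitivity and specificity against a reference standard, as reported in this abstract, follow from a simple confusion-matrix count. The sketch below is illustrative only: the function and argument names are hypothetical, and it assumes each drug/adverse-event pair is either flagged or not by the web query method and either a signal or not under the reference SDA.

```python
def sensitivity_specificity(flagged, reference_positive, all_pairs):
    """Compare flagged pairs with reference signals.

    All three arguments are sets of drug/adverse-event pairs;
    flagged and reference_positive are assumed to be subsets of all_pairs.
    """
    tp = len(flagged & reference_positive)      # signals found by both
    fn = len(reference_positive - flagged)      # reference signals missed
    fp = len(flagged - reference_positive)      # flagged but not in reference
    tn = len(all_pairs - flagged - reference_positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity
```

    A method that flags many pairs can reach high sensitivity while its specificity stays low, which is the pattern the abstract describes.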

  18. Thesaurus-Enhanced Search Interfaces.

    ERIC Educational Resources Information Center

    Shiri, Ali Asghar; Revie, Crawford; Chowdhury, Gobinda

    2002-01-01

    Discussion of user interfaces to information retrieval systems focuses on interfaces that incorporate thesauri as part of their searching and browsing facilities. Discusses research literature related to information searching behavior, information retrieval interface evaluation, search term selection, and query expansion; and compares thesaurus…

  19. System, method and apparatus for conducting a phrase search

    NASA Technical Reports Server (NTRS)

    McGreevy, Michael W. (Inventor)

    2004-01-01

    A phrase search is a method of searching a database for subsets of the database that are relevant to an input query. First, a number of relational models of subsets of a database are provided. A query is then input. The query can include one or more sequences of terms. Next, a relational model of the query is created. The relational model of the query is then compared to each one of the relational models of subsets of the database. The identifiers of the relevant subsets are then output.

  20. Improving accuracy for identifying related PubMed queries by an integrated approach.

    PubMed

    Lu, Zhiyong; Wilbur, W John

    2009-10-01

    PubMed is the most widely used tool for searching biomedical literature online. As with many other online search tools, a user often types a series of multiple related queries before retrieving satisfactory results to fulfill a single information need. Meanwhile, it is also a common phenomenon to see a user type queries on unrelated topics in a single session. In order to study PubMed users' search strategies, it is necessary to be able to automatically separate unrelated queries and group together related queries. Here, we report a novel approach combining both lexical and contextual analyses for segmenting PubMed query sessions and identifying related queries and compare its performance with the previous approach based solely on concept mapping. We experimented with our integrated approach on sample data consisting of 1539 pairs of consecutive user queries in 351 user sessions. The prediction results of 1396 pairs agreed with the gold-standard annotations, achieving an overall accuracy of 90.7%. This demonstrates that our approach is significantly better than the previously published method. By applying this approach to a one day query log of PubMed, we found that a significant proportion of information needs involved more than one PubMed query, and that most of the consecutive queries for the same information need are lexically related. Finally, the proposed PubMed distance is shown to be an accurate and meaningful measure for determining the contextual similarity between biological terms. The integrated approach can play a critical role in handling real-world PubMed query log data as is demonstrated in our experiments.
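    The overall accuracy quoted in this abstract can be checked directly from the two counts it gives. The helper name below is hypothetical; only the numbers 1396 and 1539 come from the abstract.

```python
def accuracy(correct, total):
    """Fraction of predictions that agree with the gold standard."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total


# 1396 of 1539 consecutive query pairs agreed with the annotations.
accuracy_percent = round(100 * accuracy(1396, 1539), 1)
```

    The rounded value agrees with the 90.7% stated in the abstract.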

  1. Improving accuracy for identifying related PubMed queries by an integrated approach

    PubMed Central

    Lu, Zhiyong; Wilbur, W. John

    2009-01-01

    PubMed is the most widely used tool for searching biomedical literature online. As with many other online search tools, a user often types a series of multiple related queries before retrieving satisfactory results to fulfill a single information need. Meanwhile, it is also a common phenomenon to see a user type queries on unrelated topics in a single session. In order to study PubMed users’ search strategies, it is necessary to be able to automatically separate unrelated queries and group together related queries. Here, we report a novel approach combining both lexical and contextual analyses for segmenting PubMed query sessions and identifying related queries and compare its performance with the previous approach based solely on concept mapping. We experimented with our integrated approach on sample data consisting of 1,539 pairs of consecutive user queries in 351 user sessions. The prediction results of 1,396 pairs agreed with the gold-standard annotations, achieving an overall accuracy of 90.7%. This demonstrates that our approach is significantly better than the previously published method. By applying this approach to a one day query log of PubMed, we found that a significant proportion of information needs involved more than one PubMed query, and that most of the consecutive queries for the same information need are lexically related. Finally, the proposed PubMed distance is shown to be an accurate and meaningful measure for determining the contextual similarity between biological terms. The integrated approach can play a critical role in handling real-world PubMed query log data as is demonstrated in our experiments. PMID:19162232

  2. A systematic method for search term selection in systematic reviews.

    PubMed

    Thompson, Jenna; Davis, Jacqueline; Mazerolle, Lorraine

    2014-06-01

    The wide variety of readily available electronic media grants anyone the freedom to retrieve published references from almost any area of research around the world. Despite this privilege, keeping up with primary research evidence is almost impossible because of the increase in professional publishing across disciplines. Systematic reviews are a solution to this problem as they aim to synthesize all current information on a particular topic and present a balanced and unbiased summary of the findings. They are fast becoming an important method of research across a number of fields, yet only a small number of guidelines exist on how to define and select terms for a systematic search. This article presents a replicable method for selecting terms in a systematic search using the semantic concept recognition software called leximancer (Leximancer, University of Queensland, Brisbane, Australia). We use this software to construct a set of terms from a corpus of literature pertaining to transborder interventions for drug control and discuss the applicability of this method to systematic reviews in general. This method aims to contribute a more 'systematic' approach for selecting terms in a manner that is entirely replicable for any user. Copyright © 2013 John Wiley & Sons, Ltd.

  3. Improving biomedical information retrieval by linear combinations of different query expansion techniques.

    PubMed

    Abdulla, Ahmed AbdoAziz Ahmed; Lin, Hongfei; Xu, Bo; Banbhrani, Santosh Kumar

    2016-07-25

    Biomedical literature retrieval is becoming increasingly complex, and there is a fundamental need for advanced information retrieval systems. Information Retrieval (IR) programs scour unstructured materials such as text documents in large reserves of data that are usually stored on computers. IR is related to the representation, storage, and organization of information items, as well as to access. In IR one of the main problems is to determine which documents are relevant to the user's needs and which are not. At present, users cannot construct queries precisely enough to retrieve particular pieces of data from large reserves of data, and basic information retrieval systems produce low-quality search results. In this paper we present a new technique that refines IR searches to better represent the user's information need: we apply different query expansion techniques and combine their results linearly, two expansion results at a time. Query expansions expand the search query, for example, by finding synonyms and reweighting original terms. They provide significantly more focused, particularized search results than do basic search queries. Retrieval performance is measured by variants of MAP (Mean Average Precision). According to our experimental results, the combination of the best query expansion results improves the retrieved documents, outperforming our baseline by 21.06% and a previous study by 7.12%. We propose several query expansion techniques and their linear combinations to make user queries more cognizable to search engines and to produce higher-quality search results.
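
    A linear combination of two expansion results, scored by average precision, might look like the sketch below. The interpolation weight alpha, the toy scores, and the function names are assumptions for illustration; the abstract does not specify the authors' weighting scheme or which MAP variants they used.

```python
def combine(scores_a, scores_b, alpha=0.5):
    """Linearly interpolate two document-score dictionaries.
    A document missing from one result is scored 0.0 there."""
    docs = set(scores_a) | set(scores_b)
    return {d: alpha * scores_a.get(d, 0.0) + (1 - alpha) * scores_b.get(d, 0.0)
            for d in docs}


def rank(scores):
    """Sort documents by descending score, breaking ties by document id."""
    return sorted(scores, key=lambda d: (-scores[d], d))


def average_precision(ranking, relevant):
    """Mean of the precision values at each rank where a relevant document appears."""
    hits, total = 0, 0.0
    for i, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0


# Hypothetical scores from two expansion techniques for one query.
expansion_a = {"d1": 1.0, "d2": 0.0, "d3": 0.5}
expansion_b = {"d2": 1.0, "d3": 0.5, "d4": 0.25}

ranking = rank(combine(expansion_a, expansion_b, alpha=0.25))
print(ranking)                                   # ['d2', 'd3', 'd1', 'd4']
print(average_precision(ranking, {"d1", "d2"}))  # (1/1 + 2/3) / 2
```

    MAP is then the mean of average_precision over all queries in a test collection.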

  4. A New Publicly Available Chemical Query Language, CSRML ...

    EPA Pesticide Factsheets

    A new XML-based query language, CSRML, has been developed for representing chemical substructures, molecules, reaction rules, and reactions. CSRML queries are capable of integrating additional forms of information beyond the simple substructure (e.g., SMARTS) or reaction transformation (e.g., SMIRKS, reaction SMILES) queries currently in use. Chemotypes, a term used to represent advanced CSRML queries for repeated application, can be encoded not only with connectivity and topology, but also with properties of atoms, bonds, electronic systems, or molecules. The CSRML language has been developed in parallel with a public set of chemotypes, i.e., the ToxPrint chemotypes, which are designed to provide excellent coverage of environmental, regulatory and commercial use chemical space, as well as to represent features and frameworks believed to be especially relevant to toxicity concerns. A software application, ChemoTyper, has also been developed and made publicly available to enable chemotype searching and fingerprinting against a target structure set. The public ChemoTyper houses the ToxPrint chemotype CSRML dictionary, as well as a reference implementation so that the query specifications may be adopted by other chemical structure knowledge systems. The full specifications of the XML standard used in CSRML-based chemotypes are publicly available to facilitate and encourage the exchange of structural knowledge. Paper details specifications for a new XML-based query lan

  5. Hybrid Filtering in Semantic Query Processing

    ERIC Educational Resources Information Center

    Jeong, Hanjo

    2011-01-01

    This dissertation presents a hybrid filtering method and a case-based reasoning framework for enhancing the effectiveness of Web search. Web search may not reflect user needs, intent, context, and preferences, because today's keyword-based search is lacking semantic information to capture the user's context and intent in posing the search query.…

  6. Parallel Index and Query for Large Scale Data Analysis

    SciTech Connect

    Chou, Jerry; Wu, Kesheng; Ruebel, Oliver; Howison, Mark; Qiang, Ji; Prabhat; Austin, Brian; Bethel, E. Wes; Ryne, Rob D.; Shoshani, Arie

    2011-07-18

    Modern scientific datasets present numerous data management and analysis challenges. State-of-the-art index and query technologies are critical for facilitating interactive exploration of large datasets, but numerous challenges remain in terms of designing a system for processing general scientific datasets. The system needs to be able to run on distributed multi-core platforms, efficiently utilize underlying I/O infrastructure, and scale to massive datasets. We present FastQuery, a novel software framework that addresses these challenges. FastQuery utilizes a state-of-the-art index and query technology (FastBit) and is designed to process massive datasets on modern supercomputing platforms. We apply FastQuery to processing of a massive 50TB dataset generated by a large scale accelerator modeling code. We demonstrate the scalability of the tool to 11,520 cores. Motivated by the scientific need to search for interesting particles in this dataset, we use our framework to reduce search time from hours to tens of seconds.

  7. A Preliminary Mapping of Web Queries Using Existing Image Query Schemes.

    ERIC Educational Resources Information Center

    Jansen, Bernard J.

    End user searching on the Web has become the primary method of locating images for many people. This study investigates the nature of Web image queries by attempting to map them to known image classification schemes. In this study, approximately 100,000 image queries from a major Web search engine were collected in 1997, 1999, and 2001. A…

  8. GenoQuery: a new querying module for functional annotation in a genomic warehouse

    PubMed Central

    Lemoine, Frédéric; Labedan, Bernard; Froidevaux, Christine

    2008-01-01

    Motivation: We have to cope with both a deluge of new genome sequences and a huge amount of data produced by high-throughput approaches used to exploit these genomic features. Crossing and comparing such heterogeneous and disparate data will help improving functional annotation of genomes. This requires designing elaborate integration systems such as warehouses for storing and querying these data. Results: We have designed a relational genomic warehouse with an original multi-layer architecture made of a databases layer and an entities layer. We describe a new querying module, GenoQuery, which is based on this architecture. We use the entities layer to define mixed queries. These mixed queries allow searching for instances of biological entities and their properties in the different databases, without specifying in which database they should be found. Accordingly, we further introduce the central notion of alternative queries. Such queries have the same meaning as the original mixed queries, while exploiting complementarities yielded by the various integrated databases of the warehouse. We explain how GenoQuery computes all the alternative queries of a given mixed query. We illustrate how useful this querying module is by means of a thorough example. Availability: http://www.lri.fr/~lemoine/GenoQuery/ Contact: chris@lri.fr, lemoine@lri.fr PMID:18586731

  9. FTree query construction for virtual screening: a statistical analysis.

    PubMed

    Gerlach, Christof; Broughton, Howard; Zaliani, Andrea

    2008-02-01

    FTrees (FT) is a known chemoinformatic tool able to condense molecular descriptions into a graph object and to search for actives in large databases using graph similarity. The query graph is classically derived from a known active molecule, or a set of actives, for which a similar compound has to be found. Recently, FT similarity has been extended to fragment space, widening its capabilities. If a user were able to build a knowledge-based FT query from information other than a known active structure, the similarity search could be combined with other, normally separate, fields like de-novo design or pharmacophore searches. With this aim in mind, we performed a comprehensive analysis of several databases in terms of FT description and provide a basic statistical analysis of the FT spaces so far at hand. Vendors' catalogue collections and MDDR as a source of potential or known "actives", respectively, have been used. With the results reported herein, a set of ranges, mean values and standard deviations for several query parameters are presented in order to set a reference guide for the users. Applications on how to use this information in FT query building are also provided, using a newly built 3D-pharmacophore from 57 5HT-1F agonists and a published one which was used for virtual screening for tRNA-guanine transglycosylase (TGT) inhibitors.

  10. Automatic Query Expansion via Lexical-Semantic Relationships.

    ERIC Educational Resources Information Center

    Greenberg, Jane

    2001-01-01

    Reports on an experiment that examined whether thesaurus terms, related to query in a specified semantic way (synonyms, narrower terms, related terms, or broader terms) could be identified as having a more positive impact on retrieval effectiveness when added to a query through automatic query expansion. (Contains 54 references.) (Author/LRW)

  11. Spatial Query for Planetary Data

    NASA Technical Reports Server (NTRS)

    Shams, Khawaja S.; Crockett, Thomas M.; Powell, Mark W.; Joswig, Joseph C.; Fox, Jason M.

    2011-01-01

    Science investigators need to quickly and effectively assess past observations of specific locations on a planetary surface. This innovation involves a location-based search technology that was adapted and applied to planetary science data to support a spatial query capability for mission operations software. High-performance location-based searching requires the use of spatial data structures for database organization. Spatial data structures are designed to organize datasets based on their coordinates in a way that is optimized for location-based retrieval. The particular spatial data structure that was adapted for planetary data search is the R+ tree.

  12. Conceptual mapping of user's queries to medical subject headings.

    PubMed Central

    Zieman, Y. L.; Bleich, H. L.

    1997-01-01

    This paper describes a way to map users' queries to relevant Medical Subject Headings (MeSH terms) used by the National Library of Medicine to index the biomedical literature. The method, called SENSE (SEarch with New SEmantics), transforms words and phrases in the users' queries into primary conceptual components and compares these components with those of the MeSH vocabulary. Similar to the way in which most numbers can be split into numerical factors and expressed as their product--for example, 42 can be expressed as 2*21, 6*7, 3*14, 2*3*7,--so most medical concepts can be split into "semantic factors" and expressed as their juxtaposition. Note that if we split 42 into its primary factors, the breakdown is unique: 2*3*7. Similarly, when we split medical concepts into their "primary semantic factors" the breakdown is also unique. For example, the MeSH term 'renovascular hypertension' can be split morphologically into reno, vascular, hyper, and tension--morphemes that can then be translated into their primary semantic factors--kidney, blood vessel, high, and pressure. By "factoring" each MeSH term in this way, and by similarly factoring the user's query, we can match query to MeSH term by searching for combinations of common factors. Unlike UMLS and other methods that match at the level of words or phrases, SENSE matches at the level of concepts; in this way, a wide variety of words and phrases that have the same meaning produce the same match. Now used in PaperChase, the method is surprisingly powerful in matching users' queries to Medical Subject Headings. PMID:9357680
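
    The factoring idea can be illustrated with a toy sketch built only from the example in the abstract (reno, vascular, hyper, tension). The morpheme table, the greedy splitting rule, and the bag-of-words comparison are assumptions for illustration; the actual SENSE vocabulary and matching procedure are not described here.

```python
# Toy morpheme table taken from the abstract's example (not the real SENSE data).
MORPHEMES = {
    "reno": "kidney",
    "vascular": "blood vessel",
    "hyper": "high",
    "tension": "pressure",
}


def split_morphemes(word):
    """Greedily split a word into known morphemes, longest match first.
    Returns None if the word cannot be covered completely."""
    parts, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in MORPHEMES:
                parts.append(word[i:j])
                i = j
                break
        else:
            return None
    return parts


def factors(text):
    """Reduce a phrase to a set of primary semantic factors (single words)."""
    result = set()
    for word in text.lower().split():
        parts = split_morphemes(word)
        if parts is None:
            result.add(word)  # unknown word: keep it as its own factor
        else:
            for part in parts:
                result.update(MORPHEMES[part].split())
    return frozenset(result)


print(sorted(factors("renovascular hypertension")))
# ['blood', 'high', 'kidney', 'pressure', 'vessel']
print(factors("renovascular hypertension") == factors("high blood pressure kidney vessel"))
# True
```

    Comparing factor sets ignores word order, so this sketch would also equate phrases whose juxtaposition differs in meaning; it shows only why differently worded queries can reach the same heading.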

  13. 28 CFR 25.7 - Querying records in the system.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... search descriptors will be required in all queries of the system for purposes of a background check: (1... may be requested by the system after an initial query include height, weight, eye and hair color, and... in the initial query of the system. ...

  14. Text Retrieval Online: Historical Perspective on Web Search Engines.

    ERIC Educational Resources Information Center

    Hahn, Trudi Bellardo

    1998-01-01

    Provides an overview of online systems and search engines, highlighting search (relationships between terms and interpretation of words), browse, and Web search engine capabilities, iterative searches, canned or stored queries, vocabulary browsing, delivery of full source documents, simple and advanced user interfaces, and global access. Notes…

  15. SPELLING CORRECTION IN THE PUBMED SEARCH ENGINE

    PubMed Central

    Wilbur, W. John; Kim, Won; Xie, Natalie

    2007-01-01

    It is known that users of internet search engines often enter queries with misspellings in one or more search terms. Several web search engines make suggestions for correcting misspelled words, but the methods used are proprietary and unpublished to our knowledge. Here we describe the methodology we have developed to perform spelling correction for the PubMed search engine. Our approach is based on the noisy channel model for spelling correction and makes use of statistics harvested from user logs to estimate the probabilities of different types of edits that lead to misspellings. The unique problems encountered in correcting search engine queries are discussed and our solutions are outlined. PMID:18080004
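
    A noisy channel corrector picks the intended word w that maximizes P(w) * P(typed | w). The sketch below assumes a tiny hypothetical vocabulary with made-up log counts, a single-edit error model, and invented per-error probabilities; the statistics harvested from PubMed logs and the handling of multi-word queries are not reproduced.

```python
import string

# Hypothetical term counts and error probabilities (illustrative values only).
WORD_COUNTS = {"cancer": 900, "canker": 5, "dancer": 20}
ERROR_PROB = {"insertion": 0.2, "deletion": 0.3,
              "substitution": 0.4, "transposition": 0.1}


def candidates(typed):
    """Yield (intended_word, user_error_type) pairs one edit away from typed."""
    letters = string.ascii_lowercase
    for i in range(len(typed) + 1):
        left, right = typed[:i], typed[i:]
        if right:
            # The user typed an extra character: remove it.
            yield left + right[1:], "insertion"
            # The user typed a wrong character: replace it.
            for c in letters:
                if c != right[0]:
                    yield left + c + right[1:], "substitution"
        if len(right) > 1:
            # The user swapped two adjacent characters: swap them back.
            yield left + right[1] + right[0] + right[2:], "transposition"
        # The user dropped a character: put one back.
        for c in letters:
            yield left + c + right, "deletion"


def correct(typed):
    """Return the vocabulary word maximizing P(word) * P(typed | word)."""
    if typed in WORD_COUNTS:
        return typed
    total = sum(WORD_COUNTS.values())
    best, best_score = typed, 0.0
    for word, error in candidates(typed):
        if word in WORD_COUNTS:
            score = (WORD_COUNTS[word] / total) * ERROR_PROB[error]
            if score > best_score:
                best, best_score = word, score
    return best


print(correct("cancre"))  # cancer (transposition)
print(correct("canzer"))  # cancer (beats the rarer "canker")
```

    When no vocabulary word is within one edit, the input is returned unchanged.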

  16. Optimizing a Query by Transformation and Expansion.

    PubMed

    Glocker, Katrin; Knurr, Alexander; Dieter, Julia; Dominick, Friederike; Forche, Melanie; Koch, Christian; Pascoe Pérez, Analie; Roth, Benjamin; Ückert, Frank

    2017-01-01

    In the biomedical sector not only the amount of information produced and uploaded into the web is enormous, but also the number of sources where these data can be found. Clinicians and researchers spend huge amounts of time on trying to access this information and to filter the most important answers to a given question. As the formulation of these queries is crucial, automated query expansion is an effective tool to optimize a query and receive the best possible results. In this paper we introduce the concept of a workflow for an optimization of queries in the medical and biological sector by using a series of tools for expansion and transformation of the query. After the definition of attributes by the user, the query string is compared to previous queries in order to add semantic co-occurring terms to the query. Additionally, the query is enlarged by an inclusion of synonyms. The translation into database specific ontologies ensures the optimal query formulation for the chosen database(s). As this process can be performed in various databases at once, the results are ranked and normalized in order to achieve a comparable list of answers for a question.

  17. Knowledge-Based Query Construction Using the CDSS Knowledge Base for Efficient Evidence Retrieval.

    PubMed

    Afzal, Muhammad; Hussain, Maqbool; Ali, Taqdir; Hussain, Jamil; Khan, Wajahat Ali; Lee, Sungyoung; Kang, Byeong Ho

    2015-08-28

    Finding appropriate evidence to support clinical practices is always challenging, and the construction of a query to retrieve such evidence is a fundamental step. Typically, evidence is found using manual or semi-automatic methods, which are time-consuming and sometimes make it difficult to construct knowledge-based complex queries. To overcome the difficulty in constructing knowledge-based complex queries, we utilized the knowledge base (KB) of the clinical decision support system (CDSS), which has the potential to provide sufficient contextual information. To automatically construct knowledge-based complex queries, we designed methods to parse rule structure in KB of CDSS in order to determine an executable path and extract the terms by parsing the control structures and logic connectives used in the logic. The automatically constructed knowledge-based complex queries were executed on the PubMed search service to evaluate the results on the reduction of retrieved citations with high relevance. The average number of citations was reduced from 56,249 citations to 330 citations with the knowledge-based query construction approach, and relevance increased from 1 term to 6 terms on average. The ability to automatically retrieve relevant evidence maximizes efficiency for clinicians in terms of time, based on feedback collected from clinicians. This approach is generally useful in evidence-based medicine, especially in ambient assisted living environments where automation is highly important.

  18. Knowledge-Based Query Construction Using the CDSS Knowledge Base for Efficient Evidence Retrieval

    PubMed Central

    Afzal, Muhammad; Hussain, Maqbool; Ali, Taqdir; Hussain, Jamil; Khan, Wajahat Ali; Lee, Sungyoung; Kang, Byeong Ho

    2015-01-01

    Finding appropriate evidence to support clinical practices is always challenging, and the construction of a query to retrieve such evidence is a fundamental step. Typically, evidence is found using manual or semi-automatic methods, which are time-consuming and sometimes make it difficult to construct knowledge-based complex queries. To overcome the difficulty in constructing knowledge-based complex queries, we utilized the knowledge base (KB) of the clinical decision support system (CDSS), which has the potential to provide sufficient contextual information. To automatically construct knowledge-based complex queries, we designed methods to parse rule structure in KB of CDSS in order to determine an executable path and extract the terms by parsing the control structures and logic connectives used in the logic. The automatically constructed knowledge-based complex queries were executed on the PubMed search service to evaluate the results on the reduction of retrieved citations with high relevance. The average number of citations was reduced from 56,249 citations to 330 citations with the knowledge-based query construction approach, and relevance increased from 1 term to 6 terms on average. The ability to automatically retrieve relevant evidence maximizes efficiency for clinicians in terms of time, based on feedback collected from clinicians. This approach is generally useful in evidence-based medicine, especially in ambient assisted living environments where automation is highly important. PMID:26343669

  19. A multilingual image search engine.

    PubMed

    Kahn, Charles E; Thao, Cheng

    2008-11-06

    A multilingual search interface has been created for a large, richly-indexed multi-journal library of medical images. Images are indexed by keywords and medical concepts. Nine non-English languages are supported, including Chinese and Japanese. Queries are translated into Medical Subject Headings (MeSH(R)) terms through a specialized interface with the U.S. National Library of Medicine. The ARRS GoldMiner(TM) Global search engine presents the query and navigation information in the original language with English-language search results.

  20. An index-based algorithm for fast on-line query processing of latent semantic analysis.

    PubMed

    Zhang, Mingxi; Li, Pohan; Wang, Wei

    2017-01-01

    Latent Semantic Analysis (LSA) is widely used for finding documents whose semantics are similar to a keyword query. Although LSA yields promising similarity results, existing LSA algorithms involve many unnecessary operations in similarity computation and candidate checking during on-line query processing, which is expensive in terms of time cost and cannot respond to query requests efficiently, especially when the dataset becomes large. In this paper, we study the efficiency problem of on-line query processing for LSA, aiming to search efficiently for documents similar to a given query. We rewrite the similarity equation of LSA in terms of an intermediate value called partial similarity, which is stored in a purpose-built index called the partial index. To reduce the search space, we give an approximate form of the similarity equation and then develop an efficient algorithm for building the partial index, which skips partial similarities lower than a given threshold θ. Based on the partial index, we develop an efficient algorithm called ILSA for supporting fast on-line query processing. The given query is transformed into a pseudo-document vector, and the similarities between the query and candidate documents are computed by accumulating the partial similarities obtained from the index nodes that correspond to non-zero entries in the pseudo-document vector. Compared to the LSA algorithm, ILSA reduces the time cost of on-line query processing by pruning candidate documents that are not promising and skipping operations that contribute little to the similarity scores. Extensive experiments comparing ILSA with LSA demonstrate the efficiency and effectiveness of our proposed algorithm.

  1. Knowledge Query Language (KQL)

    DTIC Science & Technology

    2016-02-01

    from variations of SQL to specialized languages such as PIG [3] and HIVE [4]. Often, custom program snippets in programming languages such as Python...commonly used query languages such as Structured Query Language (SQL). The declarative approach makes the queries portable, and results in several...made possible through the use of a knowledge registry. In this report, we discuss embedding A-Expressions in the widely used SQL, resolving A

  2. Knowledge Query Language (KQL)

    DTIC Science & Technology

    2016-02-12

    and range from variations of SQL to specialized languages such as PIG [3] and HIVE [4]. Often, custom program snippets in programming languages such...embedded in commonly used query languages such as Structured Query Language (SQL). The declarative approach makes the queries portable, and results in...is made possible through the use of a knowledge registry. In this report, we discuss embedding A-Expressions in the widely used SQL, resolving A

  3. Document Retrieval Using a Serial Bit String Search.

    ERIC Educational Resources Information Center

    Harding, Alan F.; And Others

    1983-01-01

    The experimental best match information retrieval system described is based on serial file organization. Documents and queries are characterized by fixed length bit strings (generated by automatic and manual methods) and character-by-character term match is preceded by bit string search to eliminate documents which cannot satisfy query.…
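
    The two-stage idea (a cheap bit string test followed by an exact term match) can be sketched as follows. The 64-bit signature length, the CRC32-based hashing, and the rule that every query term must be present are assumptions for illustration; the abstract does not give the system's bit string generation or its best match scoring.

```python
import zlib

BITS = 64  # assumed fixed signature length


def signature(terms):
    """Build a fixed-length bit string by setting one hashed bit per term."""
    sig = 0
    for term in terms:
        sig |= 1 << (zlib.crc32(term.encode("utf-8")) % BITS)
    return sig


def search(query_terms, documents):
    """Serially scan documents: the bit test rejects documents that cannot
    contain all query terms, then an exact match removes false positives."""
    q_sig = signature(query_terms)
    hits = []
    for doc_id, terms in documents.items():
        if (signature(terms) & q_sig) != q_sig:
            continue  # some query bit is missing, so a query term is missing
        if set(query_terms) <= set(terms):
            hits.append(doc_id)
    return hits


documents = {
    "d1": ["query", "retrieval", "index"],
    "d2": ["bit", "string", "search"],
}
print(search(["bit", "search"], documents))  # ['d2']
```

    In practice the document signatures would be computed once and stored in the serial file rather than rebuilt for every query.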

  4. Spatial information semantic query based on SPARQL

    NASA Astrophysics Data System (ADS)

    Xiao, Zhifeng; Huang, Lei; Zhai, Xiaofang

    2009-10-01

    How can the efficiency of spatial information inquiries be enhanced in today's fast-growing information age? We are rich in geospatial data but poor in up-to-date geospatial information and knowledge that are ready to be accessed by public users. This paper adopts an approach for querying spatial semantics by building an ontology in Web Ontology Language (OWL) format and introducing the SPARQL Protocol and RDF Query Language (SPARQL) to search spatial semantic relations. It is important to establish spatial semantics that support effective spatial reasoning for performing semantic queries. In contrast to earlier keyword-based and information retrieval techniques that rely on syntax, we use semantic approaches in our spatial query system. Semantic approaches need to be developed through ontologies, so we use OWL to describe spatial information extracted from a large-scale map of Wuhan. Spatial information expressed by an ontology with formal semantics is available to machines for processing and to people for understanding. The approach is illustrated by a case study that uses SPARQL to query geo-spatial ontology instances of Wuhan. The paper shows that using SPARQL to search OWL ontology instances can ensure the accuracy and applicability of the results. The results also indicate that constructing a geo-spatial semantic query system has positive effects on forming spatial queries and on retrieval.

  5. Secure Skyline Queries on Cloud Platform

    PubMed Central

    Liu, Jinfei; Yang, Juncheng; Xiong, Li; Pei, Jian

    2017-01-01

    Outsourcing data and computation to cloud server provides a cost-effective way to support large scale data storage and query processing. However, due to security and privacy concerns, sensitive data (e.g., medical records) need to be protected from the cloud server and other unauthorized users. One approach is to outsource encrypted data to the cloud server and have the cloud server perform query processing on the encrypted data only. It remains a challenging task to support various queries over encrypted data in a secure and efficient way such that the cloud server does not gain any knowledge about the data, query, and query result. In this paper, we study the problem of secure skyline queries over encrypted data. The skyline query is particularly important for multi-criteria decision making but also presents significant challenges due to its complex computations. We propose a fully secure skyline query protocol on data encrypted using semantically-secure encryption. As a key subroutine, we present a new secure dominance protocol, which can be also used as a building block for other queries. Finally, we provide both serial and parallelized implementations and empirically study the protocols in terms of efficiency and scalability under different parameter settings, verifying the feasibility of our proposed solutions. PMID:28883710
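
    For readers unfamiliar with skyline queries, the sketch below shows the plaintext semantics only: a point belongs to the skyline if no other point dominates it. It assumes smaller values are better in every dimension and uses a simple quadratic scan; it performs no encryption and is not the secure protocol or the secure dominance subroutine proposed in the paper.

```python
def dominates(p, q):
    """p dominates q if p is no worse in every dimension and
    strictly better in at least one (smaller is better)."""
    return (all(a <= b for a, b in zip(p, q))
            and any(a < b for a, b in zip(p, q)))


def skyline(points):
    """Return the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical records, e.g. (cost, risk) pairs.
points = [(1, 9), (3, 3), (4, 4), (9, 1), (5, 8)]
print(skyline(points))  # [(1, 9), (3, 3), (9, 1)]
```

    Here (4, 4) and (5, 8) are dropped because (3, 3) is at least as good in both dimensions and strictly better in one.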

  6. Secure Skyline Queries on Cloud Platform.

    PubMed

    Liu, Jinfei; Yang, Juncheng; Xiong, Li; Pei, Jian

    2017-04-01

    Outsourcing data and computation to cloud server provides a cost-effective way to support large scale data storage and query processing. However, due to security and privacy concerns, sensitive data (e.g., medical records) need to be protected from the cloud server and other unauthorized users. One approach is to outsource encrypted data to the cloud server and have the cloud server perform query processing on the encrypted data only. It remains a challenging task to support various queries over encrypted data in a secure and efficient way such that the cloud server does not gain any knowledge about the data, query, and query result. In this paper, we study the problem of secure skyline queries over encrypted data. The skyline query is particularly important for multi-criteria decision making but also presents significant challenges due to its complex computations. We propose a fully secure skyline query protocol on data encrypted using semantically-secure encryption. As a key subroutine, we present a new secure dominance protocol, which can be also used as a building block for other queries. Finally, we provide both serial and parallelized implementations and empirically study the protocols in terms of efficiency and scalability under different parameter settings, verifying the feasibility of our proposed solutions.

  7. Business information query expansion through semantic network

    NASA Astrophysics Data System (ADS)

    Gong, Zhiguo; Muyeba, Maybin; Guo, Jingzhi

    2010-02-01

    In this article, we propose a method for business information query expansion. In our approach, hypernym/hyponym and synonym relations in WordNet are used as the basic expansion rules. We then use WordNet lexical chains and WordNet semantic similarity to assign the terms of a query to different groups according to their semantic similarities. For each group, we expand the highest terms in the WordNet hierarchy with hypernyms and synonyms, the lowest terms with hyponyms and synonyms, and all other terms with synonyms only. In this way, the contradictions caused by full expansion can be well controlled. Furthermore, we use a collection-related term semantic network to further improve the expansion performance. Our experiment reveals that our solution for query expansion can improve query performance dramatically.
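
    The per-group expansion rule can be sketched with a toy taxonomy. The depth, synonym, hypernym, and hyponym tables below are hypothetical stand-ins, not WordNet data, and the grouping step (lexical chains and semantic similarity) is assumed to have been done already.

```python
# Hypothetical mini-taxonomy standing in for WordNet (smaller depth = higher in hierarchy).
DEPTH = {"vehicle": 1, "car": 2, "sedan": 3}
SYNONYMS = {"vehicle": ["conveyance"], "car": ["automobile"], "sedan": ["saloon"]}
HYPERNYMS = {"vehicle": ["artifact"], "car": ["vehicle"], "sedan": ["car"]}
HYPONYMS = {"vehicle": ["car"], "car": ["sedan"], "sedan": ["compact sedan"]}


def expand_group(group):
    """Expand one group of semantically related query terms:
    highest term gets synonyms + hypernyms, lowest term gets
    synonyms + hyponyms, every other term gets synonyms only."""
    top = min(group, key=DEPTH.get)
    bottom = max(group, key=DEPTH.get)
    expanded = []
    for term in group:
        extra = list(SYNONYMS.get(term, []))
        if term == top:
            extra += HYPERNYMS.get(term, [])
        if term == bottom:
            extra += HYPONYMS.get(term, [])
        expanded.append((term, extra))
    return expanded


for term, extra in expand_group(["vehicle", "car", "sedan"]):
    print(term, extra)
# vehicle ['conveyance', 'artifact']
# car ['automobile']
# sedan ['saloon', 'compact sedan']
```

    Restricting hypernyms to the top term and hyponyms to the bottom term avoids adding "vehicle" or "sedan" again as expansions of "car", which is one way to read the abstract's point about controlling full expansion. A single-term group receives both kinds of expansion in this sketch, which is an assumption.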

  8. Generating Personalized Web Search Using Semantic Context

    PubMed Central

    Xu, Zheng; Chen, Hai-Yan; Yu, Jie

    2015-01-01

    The “one size fits the all” criticism of search engines is that when queries are submitted, the same results are returned to different users. In order to solve this problem, personalized search is proposed, since it can provide different search results based upon the preferences of users. However, existing methods concentrate more on the long-term and independent user profile, and thus reduce the effectiveness of personalized search. In this paper, the method captures the user context to provide accurate preferences of users for effectively personalized search. First, the short-term query context is generated to identify related concepts of the query. Second, the user context is generated based on the click through data of users. Finally, a forgetting factor is introduced to merge the independent user context in a user session, which maintains the evolution of user preferences. Experimental results fully confirm that our approach can successfully represent user context according to individual user information needs. PMID:26000335
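
    One common way to realize a forgetting factor is exponential decay: before each new query context is merged, the weights accumulated so far are multiplied by a constant below 1. The sketch assumes this form, a factor of 0.5, and concept-weight dictionaries as the context representation; the paper's exact update rule is not given in the abstract.

```python
def merge_contexts(contexts, forgetting=0.5):
    """Merge per-query contexts (concept -> weight) in session order,
    decaying older weights by the forgetting factor at every step."""
    session = {}
    for context in contexts:
        # Older evidence fades before the new query's concepts are added.
        session = {concept: weight * forgetting
                   for concept, weight in session.items()}
        for concept, weight in context.items():
            session[concept] = session.get(concept, 0.0) + weight
    return session


# Hypothetical contexts for three consecutive queries in one session.
contexts = [
    {"python": 1.0},
    {"snake": 1.0},
    {"python": 1.0, "code": 1.0},
]
print(merge_contexts(contexts))
# {'python': 1.25, 'snake': 0.5, 'code': 1.0}
```

    Concepts that recur late in the session end up with the highest weights, while concepts seen once and early fade.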

  9. Generating personalized web search using semantic context.

    PubMed

    Xu, Zheng; Chen, Hai-Yan; Yu, Jie

    2015-01-01

    The "one size fits the all" criticism of search engines is that when queries are submitted, the same results are returned to different users. In order to solve this problem, personalized search is proposed, since it can provide different search results based upon the preferences of users. However, existing methods concentrate more on the long-term and independent user profile, and thus reduce the effectiveness of personalized search. In this paper, the method captures the user context to provide accurate preferences of users for effectively personalized search. First, the short-term query context is generated to identify related concepts of the query. Second, the user context is generated based on the click through data of users. Finally, a forgetting factor is introduced to merge the independent user context in a user session, which maintains the evolution of user preferences. Experimental results fully confirm that our approach can successfully represent user context according to individual user information needs.

  10. Retrieval of diagnostic and treatment studies for clinical use through PubMed and PubMed's Clinical Queries filters.

    PubMed

    Lokker, Cynthia; Haynes, R Brian; Wilczynski, Nancy L; McKibbon, K Ann; Walter, Stephen D

    2011-01-01

    Clinical Queries filters were developed to improve the retrieval of high-quality studies in searches on clinical matters. The study objective was to determine the yield of relevant citations and physician satisfaction while searching for diagnostic and treatment studies using the Clinical Queries page of PubMed compared with searching PubMed without these filters. Forty practicing physicians, presented with standardized treatment and diagnosis questions and one question of their choosing, entered search terms which were processed in a random, blinded fashion through PubMed alone and PubMed Clinical Queries. Participants rated search retrievals for applicability to the question at hand and satisfaction. For treatment, the primary outcome of retrieval of relevant articles was not significantly different between the groups, but a higher proportion of articles from the Clinical Queries searches met methodologic criteria (p=0.049), and more articles were published in core internal medicine journals (p=0.056). For diagnosis, the filtered results returned more relevant articles (p=0.031) and fewer irrelevant articles (overall retrieval less, p=0.023); participants needed to screen fewer articles before arriving at the first relevant citation (p<0.05). Relevance was also influenced by content terms used by participants in searching. Participants varied greatly in their search performance. Clinical Queries filtered searches returned more high-quality studies, though the retrieval of relevant articles was only statistically different between the groups for diagnosis questions. Retrieving clinically important research studies from Medline is a challenging task for physicians. Methodological search filters can improve search retrieval.

  11. Age-related differences in the accuracy of web query-based predictions of influenza-like illness.

    PubMed

    Domnich, Alexander; Panatto, Donatella; Signori, Alessio; Lai, Piero Luigi; Gasparini, Roberto; Amicizia, Daniela

    2015-01-01

    Web queries are now widely used for modeling, nowcasting and forecasting influenza-like illness (ILI). However, given that ILI attack rates vary significantly across ages, in terms of both magnitude and timing, little is known about whether the association between ILI morbidity and ILI-related queries is comparable across different age-groups. The present study aimed to investigate features of the association between ILI morbidity and ILI-related query volume from the perspective of age. Since Google Flu Trends is unavailable in Italy, Google Trends was used to identify entry terms that correlated highly with official ILI surveillance data. All-age and age-class-specific modeling was performed by means of linear models with generalized least-square estimation. Hold-out validation was used to quantify prediction accuracy. For purposes of comparison, predictions generated by exponential smoothing were computed. Five search terms showed high correlation coefficients of > .6. In comparison with exponential smoothing, the all-age query-based model correctly predicted the peak time and yielded a higher correlation coefficient with observed ILI morbidity (.978 vs. .929). However, query-based prediction of ILI morbidity was associated with a greater error. Age-class-specific query-based models varied significantly in terms of prediction accuracy. In the 0-4 and 25-44-year age-groups, these did well and outperformed exponential smoothing predictions; in the 15-24 and ≥ 65-year age-classes, however, the query-based models were inaccurate and highly overestimated peak height. In all but one age-class, peak timing predicted by the query-based models coincided with observed timing. The accuracy of web query-based models in predicting ILI morbidity rates could differ among ages. Greater age-specific detail may be useful in flu query-based studies in order to account for age-specific features of the epidemiology of ILI.
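    The comparison baseline named above is exponential smoothing; the abstract does not say which variant or smoothing parameter was used. The Python sketch below shows simple (single) exponential smoothing with one-step-ahead forecasts, purely as an illustration. The weekly values and alpha = 0.5 are made up and are not data from the study.

```python
def exponential_smoothing_forecasts(series, alpha):
    """One-step-ahead forecasts from simple exponential smoothing.

    forecasts[t] is the prediction for series[t] built from series[:t].
    The level is initialised to the first observation, so forecasts[0]
    simply repeats series[0].
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    if not series:
        return []
    level = series[0]
    forecasts = [level]
    for observed in series[:-1]:
        # Blend the newest observation with the previous level.
        level = alpha * observed + (1.0 - alpha) * level
        forecasts.append(level)
    return forecasts


# Hypothetical weekly ILI rates (illustrative numbers only).
weekly_rates = [10, 20, 40]
forecasts = exponential_smoothing_forecasts(weekly_rates, alpha=0.5)
```

    Because each forecast is a weighted average of past values, it lags behind a rising series, which is one reason such a baseline can miss the timing of an epidemic peak.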

  12. In Search of Decay in Verbal Short-Term Memory

    ERIC Educational Resources Information Center

    Berman, Marc G.; Jonides, John; Lewis, Richard L.

    2009-01-01

    Is forgetting in the short term due to decay with the mere passage of time, interference from other memoranda, or both? Past research on short-term memory has revealed some evidence for decay and a plethora of evidence showing that short-term memory is worsened by interference. However, none of these studies has directly contrasted decay and…

  13. A Query Suggestion Workflow for Life Science IR-Systems.

    PubMed

    Esch, Maria; Chen, Jinbo; Weise, Stephan; Hassani-Pak, Keywan; Scholz, Uwe; Lange, Matthias

    2014-06-01

    Information Retrieval (IR) plays a central role in the exploration and interpretation of integrated biological datasets that represent the heterogeneous ecosystem of life sciences. Here, keyword-based query systems are popular user interfaces. In turn, the query phrases used determine, to a large extent, the quality of the search result and the effort a scientist has to invest in query refinement. In this context, computer-aided query expansion and suggestion is one of the most challenging tasks for life science information systems. Existing query front-ends support aspects like spelling correction, query refinement or query expansion. However, the majority of the front-ends only make limited use of enhanced IR algorithms to implement comprehensive and computer-aided query refinement workflows. In this work, we present the design of a multi-stage query suggestion workflow and its implementation in the life science IR system LAILAPS. The presented workflow includes enhanced tokenisation, word breaking, spelling correction, query expansion and query suggestion ranking. A spelling correction benchmark with 5,401 queries and manually selected use cases for query expansion demonstrate the performance of the implemented workflow and its advantages compared with state-of-the-art systems.
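    To make the idea of a multi-stage workflow concrete, here is a toy Python sketch of three of the stages listed above: tokenisation, spelling correction and query expansion. It is not the LAILAPS implementation. The vocabulary, the synonym table and the similarity cutoff are hypothetical, spelling correction is approximated with the standard-library difflib matcher, and word breaking and suggestion ranking are left out.

```python
import difflib

# Hypothetical controlled vocabulary and synonym table (illustration only).
VOCABULARY = ["barley", "wheat", "maize", "drought", "tolerance"]
SYNONYMS = {"maize": ["corn"], "drought": ["water stress"]}


def suggest_queries(raw_query, cutoff=0.8):
    """Return the spell-corrected query followed by synonym-expanded variants."""
    # Stage 1: naive whitespace tokenisation.
    tokens = raw_query.lower().split()
    # Stage 2: replace each token by its closest vocabulary entry, if any.
    corrected = []
    for token in tokens:
        match = difflib.get_close_matches(token, VOCABULARY, n=1, cutoff=cutoff)
        corrected.append(match[0] if match else token)
    # Stage 3: one expanded variant per known synonym.
    suggestions = [" ".join(corrected)]
    for i, token in enumerate(corrected):
        for synonym in SYNONYMS.get(token, []):
            variant = corrected[:i] + [synonym] + corrected[i + 1:]
            suggestions.append(" ".join(variant))
    return suggestions


suggestions = suggest_queries("barly drougt tolerance")
```

    A real system would rank the resulting suggestions, for example by how many records each variant retrieves; that step is omitted here.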

  14. A query suggestion workflow for life science IR-systems.

    PubMed

    Esch, Maria; Chen, Jinbo; Weise, Stephan; Hassani-Pak, Keywan; Scholz, Uwe; Lange, Matthias

    2014-06-13

    Information Retrieval (IR) plays a central role in the exploration and interpretation of integrated biological datasets that represent the heterogeneous ecosystem of life sciences. Here, keyword-based query systems are popular user interfaces. In turn, the query phrases used determine, to a large extent, the quality of the search result and the effort a scientist has to invest in query refinement. In this context, computer-aided query expansion and suggestion is one of the most challenging tasks for life science information systems. Existing query front-ends support aspects like spelling correction, query refinement or query expansion. However, the majority of the front-ends only make limited use of enhanced IR algorithms to implement comprehensive and computer-aided query refinement workflows. In this work, we present the design of a multi-stage query suggestion workflow and its implementation in the life science IR system LAILAPS. The presented workflow includes enhanced tokenisation, word breaking, spelling correction, query expansion and query suggestion ranking. A spelling correction benchmark with 5,401 queries and manually selected use cases for query expansion demonstrate the performance of the implemented workflow and its advantages compared with state-of-the-art systems.

  15. Declarative Visualization Queries

    NASA Astrophysics Data System (ADS)

    Pinheiro da Silva, P.; Del Rio, N.; Leptoukh, G. G.

    2011-12-01

    In an ideal interaction with machines, scientists may prefer to write declarative queries saying "what" they want from a machine rather than write code stating "how" the machine is going to address the user request. For example, in relational databases, users have long relied on specifying queries using Structured Query Language (SQL), a declarative language to request data results from a database management system. In the context of visualizations, we see that users are still writing code based on complex visualization toolkit APIs. With the goal of improving the scientists' experience of using visualization technology, we have applied this query-answering pattern to a visualization setting, where scientists specify what visualizations they want generated using a declarative SQL-like notation. A knowledge-enhanced management system ingests the query and knows (1) how to translate the query into visualization pipelines; and (2) how to execute the visualization pipelines to generate the requested visualization. We define visualization queries as declarative requests for visualizations specified in an SQL-like language. Visualization queries specify what category of visualization to generate (e.g., volumes, contours, surfaces) as well as associated display attributes (e.g., color and opacity), without any regard for implementation, thus allowing scientists to remain partially unaware of a wide range of visualization toolkit (e.g., Generic Mapping Tools and Visualization Toolkit) specific implementation details. Implementation details are only a concern for our knowledge-based visualization management system, which uses both the information specified in the query and knowledge about visualization toolkit functions to construct visualization pipelines. Knowledge about the use of visualization toolkits includes what data formats the toolkit operates on, what formats they output, and what views they can generate. Visualization knowledge, which is not

  16. Using Bitmap Indexing Technology for Combined Numerical and Text Queries

    SciTech Connect

    Stockinger, Kurt; Cieslewicz, John; Wu, Kesheng; Rotem, Doron; Shoshani, Arie

    2006-10-16

    In this paper, we describe a strategy of using compressed bitmap indices to speed up queries on both numerical data and text documents. By using an efficient compression algorithm, these compressed bitmap indices are compact even for indices with millions of distinct terms. Moreover, bitmap indices can be used very efficiently to answer Boolean queries over text documents involving multiple query terms. Existing inverted indices for text searches are usually inefficient for corpora with a very large number of terms as well as for queries involving a large number of hits. We demonstrate that our compressed bitmap index technology overcomes both of those shortcomings. In a performance comparison against a commonly used database system, our indices answer queries 30 times faster on average. To provide full SQL support, we integrated our indexing software, called FastBit, with MonetDB. The integrated system MonetDB/FastBit provides not only efficient searches on a single table as FastBit does, but also answers join queries efficiently. Furthermore, MonetDB/FastBit also provides a very efficient retrieval mechanism of result records.
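    The reason bitmap indices suit Boolean text queries can be shown in a few lines. The Python sketch below keeps one bitmap per term, stored as an arbitrary-precision integer, so that AND and OR between query terms become single bitwise operations. It is an uncompressed toy and does not use the FastBit API; the compression that the paper relies on is omitted, and the function names and documents are hypothetical.

```python
from collections import defaultdict


def build_bitmap_index(documents):
    """Map each term to an int whose bit i is set when documents[i] contains it."""
    index = defaultdict(int)
    for doc_id, text in enumerate(documents):
        for term in set(text.lower().split()):
            index[term] |= 1 << doc_id
    return dict(index)


def matching_ids(bitmap):
    """Decode a bitmap into the ascending list of document ids whose bits are set."""
    ids = []
    doc_id = 0
    while bitmap:
        if bitmap & 1:
            ids.append(doc_id)
        bitmap >>= 1
        doc_id += 1
    return ids


# Hypothetical three-document corpus.
docs = [
    "bitmap index for numerical data",
    "inverted index for text search",
    "compressed bitmap index for text",
]
index = build_bitmap_index(docs)

# Boolean queries reduce to bitwise operations on the term bitmaps.
both = index.get("bitmap", 0) & index.get("text", 0)         # bitmap AND text
either = index.get("numerical", 0) | index.get("search", 0)  # numerical OR search
```

    Here `matching_ids(both)` yields the single document that contains both terms, and `matching_ids(either)` yields the two documents that contain at least one of them.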

  17. Machine intelligence for health information: capturing concepts and trends in social media via query expansion.

    PubMed

    Su, Xing Yu; Suominen, Hanna; Hanlen, Leif

    2011-01-01

    We aim to improve retrieval of health information from Twitter. The popularity of social media and micro-blogs has emphasised their potential for knowledge discovery and trend building. However, capturing and relating concepts in these short-spoken and lexically extensive sources of information requires search engines with increasing intelligence. Our approach uses query expansion techniques to associate query terms with the most similar Twitter terms to capture trends in the gamut of information. We demonstrated the value, defined as improved precision, of our search engine by considering three search tasks and two independent annotators. We also showed the stability of the engine with an increasing number of tweets; this is crucial as large data sets are needed for capturing trends with high confidence. These results encourage us to continue developing the engine for discovering trends in health information available at Twitter.

  18. Text mining for search term development in systematic reviewing: A discussion of some methods and challenges.

    PubMed

    Stansfield, Claire; O'Mara-Eves, Alison; Thomas, James

    2017-09-01

    Using text mining to aid the development of database search strings for topics described by diverse terminology has potential benefits for systematic reviews; however, methods and tools for accomplishing this are poorly covered in the research methods literature. We briefly review the literature on applications of text mining for search term development for systematic reviewing. We found that the tools can be used in 5 overarching ways: improving the precision of searches; identifying search terms to improve search sensitivity; aiding the translation of search strategies across databases; searching and screening within an integrated system; and developing objectively derived search strategies. Using a case study and selected examples, we then reflect on the utility of certain technologies (term frequency-inverse document frequency and Termine, term frequency, and clustering) in improving the precision and sensitivity of searches. Challenges in using these tools are discussed. The utility of these tools is influenced by the different capabilities of the tools, the way the tools are used, and the text that is analysed. Increased awareness of how the tools perform facilitates the further development of methods for their use in systematic reviews. Copyright © 2017 John Wiley & Sons, Ltd.
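    Term frequency-inverse document frequency (TF-IDF), one of the techniques mentioned above, can be sketched briefly. The Python below scores each term by its raw count in a record multiplied by log(N / document frequency), sums the scores over a small set of records, and ranks the terms. This is one common TF-IDF variant chosen for illustration, not necessarily the one used in the tools reviewed, and the three record titles are invented.

```python
import math
from collections import Counter


def rank_terms_by_tfidf(documents):
    """Rank terms by TF-IDF summed over all documents, highest first.

    tf is the raw count of the term in a document; idf is
    log(N / number of documents containing the term).
    Ties are broken alphabetically.
    """
    tokenised = [doc.lower().split() for doc in documents]
    n_docs = len(tokenised)
    doc_freq = Counter()
    for tokens in tokenised:
        doc_freq.update(set(tokens))
    scores = Counter()
    for tokens in tokenised:
        for term, count in Counter(tokens).items():
            scores[term] += count * math.log(n_docs / doc_freq[term])
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical titles of records already known to be relevant.
records = [
    "vaccine uptake in children",
    "vaccine hesitancy in parents",
    "vaccine hesitancy survey",
]
ranked = rank_terms_by_tfidf(records)
```

    A term that occurs in every record ("vaccine" here) scores zero, which shows why TF-IDF alone favours distinctive terms and can overlook the core concept of a search.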

  19. SAM Biotoxin Methods Query

    EPA Pesticide Factsheets

    Laboratories measuring target biotoxin analytes in environmental samples can use this online query tool to identify analytical methods included in EPA's Selected Analytical Methods for Environmental Remediation and Recovery for select biotoxins.

  20. SAM Methods Query

    EPA Pesticide Factsheets

    Laboratories measuring target chemical, radiochemical, pathogens, and biotoxin analytes in environmental samples can use this online query tool to identify analytical methods included in EPA's Selected Analytical Methods for Environmental Remediation

  1. SAM Chemical Methods Query

    EPA Pesticide Factsheets

    Laboratories measuring target chemical, radiochemical, pathogens, and biotoxin analytes in environmental samples can use this online query tool to identify analytical methods in EPA's Selected Analytical Methods for Environmental Remediation and Recovery

  2. SAM Pathogen Methods Query

    EPA Pesticide Factsheets

    Laboratories measuring target pathogen analytes in environmental samples can use this online query tool to identify analytical methods in EPA's Selected Analytical Methods for Environmental Remediation and Recovery for select pathogens.

  3. SAM Radiochemical Methods Query

    EPA Pesticide Factsheets

    Laboratories measuring target radiochemical analytes in environmental samples can use this online query tool to identify analytical methods in EPA's Selected Analytical Methods for Environmental Remediation and Recovery for select radiochemical analytes.

  4. Arctic Data Explorer: A Rich Solr Powered Metadata Search Portal

    NASA Astrophysics Data System (ADS)

    Liu, M.; Truslove, I.; Yarmey, L.; Lopez, L.; Reed, S. A.; Brandt, M.

    2013-12-01

    The Advanced Cooperative Arctic Data and Information Service (ACADIS) manages data and is the gateway for all relevant Arctic physical, life, and social science data for the Arctic Sciences (ARC) research community. Arctic Data Explorer (ADE), developed by the National Snow and Ice Data Center (NSIDC) under the ACADIS umbrella, is a data portal that provides users the ability to search across multiple Arctic data catalogs rapidly and precisely. In order to help the users quickly find the data they are interested in, we provided a simple search interface -- a search box with spatial and temporal options. The core of the interface is a 'google-like' single search box with logic to handle complex queries behind the scenes. ACADIS collects all metadata through the GI-Cat metadata broker service and indexes it in Solr. The single search box is implemented as a text-based search utilizing the powerful tools provided by Solr. In this poster, we briefly explain Solr's indexing and searching capabilities. Several examples are presented to illustrate the rich search functionality the simple search box supports. Then we dive into the implementation details such as how phrase query, wildcard query, range query, fuzzy query and special query search term handling was integrated into ADE search. To provide our users the most relevant answers to their queries as quickly as possible, we worked with the Advisory Committee and the expanding Arctic User Community (scientists and data experts) to collect feedback to improve the search results and adjust the relevance/ranking logic to return more precise search results. The poster has specific examples on how we tuned the relevance ranking to achieve higher quality search results. A planned feature is to provide data set recommendations based on the user's current search history. Both collaborative filtering and content-based approaches were considered and researched. A feasible solution is proposed based on the content-based approach.
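    For readers unfamiliar with the query types named above, the Python sketch below assembles a query string in the standard Lucene/Solr query syntax: a quoted phrase, a trailing-wildcard term, a fuzzy term with an edit-distance bound, and an inclusive range on a field. It only builds the string. The helper names and the field name start_year are hypothetical, and nothing here reflects how ADE itself constructs its queries.

```python
def phrase(text):
    """Quote a phrase, escaping backslashes and embedded double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def wildcard(prefix):
    """Match any term that starts with the given prefix."""
    return f"{prefix}*"


def fuzzy(term, max_edits=1):
    """Match terms within max_edits edits of the (possibly misspelled) term."""
    return f"{term}~{max_edits}"


def field_range(field, low, high):
    """Inclusive range query on a field."""
    return f"{field}:[{low} TO {high}]"


# start_year is a hypothetical field name used only for illustration.
query = " AND ".join([
    phrase("sea ice"),
    wildcard("perma"),
    fuzzy("glaceir"),
    field_range("start_year", 2000, 2010),
])
```

    The resulting string could be passed as the q parameter of a Solr select request; user input containing other reserved characters would need further escaping than shown here.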

  5. Meta Search Engines.

    ERIC Educational Resources Information Center

    Garman, Nancy

    1999-01-01

    Describes common options and features to consider in evaluating which meta search engine will best meet a searcher's needs. Discusses number and names of engines searched; other sources and specialty engines; search queries; other search options; and results options. (AEF)

  6. Long-Term Priming of Visual Search Prevails against the Passage of Time and Counteracting Instructions

    ERIC Educational Resources Information Center

    Kruijne, Wouter; Meeter, Martijn

    2016-01-01

    Studies on "intertrial priming" have shown that in visual search experiments, the preceding trial automatically affects search performance: facilitating it when the target features repeat and giving rise to switch costs when they change--so-called (short-term) intertrial priming. These effects also occur at longer time scales: When 1 of…

  7. Independence of long-term contextual memory and short-term perceptual hypotheses: Evidence from contextual cueing of interrupted search.

    PubMed

    Schlagbauer, Bernhard; Mink, Maurice; Müller, Hermann J; Geyer, Thomas

    2017-02-01

    Observers are able to resume an interrupted search trial faster relative to responding to a new, unseen display. This finding of rapid resumption is attributed to short-term perceptual hypotheses generated on the current look and confirmed upon subsequent looks at the same display. It has been suggested that the contents of perceptual hypotheses are similar to those of other forms of memory acquired long-term through repeated exposure to the same search displays over the course of several trials, that is, the memory supporting "contextual cueing." In three experiments, we investigated the relationship between short-term perceptual hypotheses and long-term contextual memory. The results indicated that long-term, contextual memory of repeated displays neither affected the generation nor the confirmation of short-term perceptual hypotheses for these displays. Furthermore, the analysis of eye movements suggests that long-term memory provides an initial benefit in guiding attention to the target, whereas in subsequent looks guidance is entirely based on short-term perceptual hypotheses. Overall, the results reveal a picture of both long- and short-term memory contributing to reliable performance gains in interrupted search, while exerting their effects in an independent manner.

  8. Multi-field query expansion is effective for biomedical dataset retrieval.

    PubMed

    Bouadjenek, Mohamed Reda; Verspoor, Karin

    2017-01-01

    In the context of the bioCADDIE challenge addressing information retrieval of biomedical datasets, we propose a method for retrieval of biomedical data sets with heterogeneous schemas through query reformulation. In particular, the method proposed transforms the initial query into a multi-field query that is then enriched with terms that are likely to occur in the relevant datasets. We compare and evaluate two query expansion strategies, one based on the Rocchio method and another based on a biomedical lexicon. We then perform a comprehensive comparative evaluation of our method on the bioCADDIE dataset collection for biomedical retrieval. We demonstrate the effectiveness of our multi-field query method compared to two baselines, with MAP improved from 0.2171 and 0.2669 to 0.2996. We also show the benefits of query expansion, where the Rocchio expansion method improves the MAP for our two baselines from 0.2171 and 0.2669 to 0.335. We show that the Rocchio query expansion method slightly outperforms the one based on the biomedical lexicon as a source of terms, with an improvement of roughly 3% for MAP. However, the query expansion method based on the biomedical lexicon is much less resource intensive since it does not require computation of any relevance feedback set or any initial execution of the query. Hence, in terms of trade-off between efficiency, execution time and retrieval accuracy, we argue that the query expansion method based on the biomedical lexicon offers the best performance for a prototype biomedical data search engine intended to be used at a large scale. In the official bioCADDIE challenge results, although our approach is ranked seventh in terms of the infNDCG evaluation metric, it ranks second in terms of P@10 and NDCG. Hence, the method proposed here provides overall good retrieval performance in relation to the approaches of other competitors. Consequently, the observations made in this paper should benefit the development of a Data Discovery
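    The Rocchio method mentioned above moves the query toward terms that are frequent in documents judged relevant and away from terms in documents judged non-relevant. The Python sketch below is a generic textbook-style version working on plain term counts, not the authors' implementation. The weights alpha, beta and gamma, the number of added terms and the example documents are assumptions chosen for illustration.

```python
from collections import Counter


def rocchio_expand(query_terms, relevant_docs, non_relevant_docs=(),
                   alpha=1.0, beta=0.75, gamma=0.15, n_new_terms=2):
    """Return the original query terms plus the top-weighted new terms.

    weight(term) = alpha * [term in query]
                 + beta  * mean count of term over relevant documents
                 - gamma * mean count of term over non-relevant documents
    """
    weights = Counter()
    for term in query_terms:
        weights[term] += alpha
    for docs, factor in ((relevant_docs, beta), (non_relevant_docs, -gamma)):
        for doc in docs:
            for term, count in Counter(doc.lower().split()).items():
                weights[term] += factor * count / len(docs)
    # Keep only new terms with a positive weight, best first, ties alphabetical.
    candidates = [(term, weight) for term, weight in weights.items()
                  if term not in query_terms and weight > 0]
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return list(query_terms) + [term for term, _ in candidates[:n_new_terms]]


# Hypothetical feedback sets for the query "cancer".
expanded = rocchio_expand(
    ["cancer"],
    relevant_docs=["cancer gene expression dataset",
                   "gene expression in tumor cells"],
    non_relevant_docs=["cancer charity news"],
)
```

    The need for a feedback set, which in practice means running the query once before expanding it, is the extra cost the abstract contrasts with lexicon-based expansion.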

  9. Multi-field query expansion is effective for biomedical dataset retrieval

    PubMed Central

    2017-01-01

    In the context of the bioCADDIE challenge addressing information retrieval of biomedical datasets, we propose a method for retrieval of biomedical data sets with heterogeneous schemas through query reformulation. In particular, the method proposed transforms the initial query into a multi-field query that is then enriched with terms that are likely to occur in the relevant datasets. We compare and evaluate two query expansion strategies, one based on the Rocchio method and another based on a biomedical lexicon. We then perform a comprehensive comparative evaluation of our method on the bioCADDIE dataset collection for biomedical retrieval. We demonstrate the effectiveness of our multi-field query method compared to two baselines, with MAP improved from 0.2171 and 0.2669 to 0.2996. We also show the benefits of query expansion, where the Rocchio expansion method improves the MAP for our two baselines from 0.2171 and 0.2669 to 0.335. We show that the Rocchio query expansion method slightly outperforms the one based on the biomedical lexicon as a source of terms, with an improvement of roughly 3% for MAP. However, the query expansion method based on the biomedical lexicon is much less resource intensive since it does not require computation of any relevance feedback set or any initial execution of the query. Hence, in terms of trade-off between efficiency, execution time and retrieval accuracy, we argue that the query expansion method based on the biomedical lexicon offers the best performance for a prototype biomedical data search engine intended to be used at a large scale. In the official bioCADDIE challenge results, although our approach is ranked seventh in terms of the infNDCG evaluation metric, it ranks second in terms of P@10 and NDCG. Hence, the method proposed here provides overall good retrieval performance in relation to the approaches of other competitors. Consequently, the observations made in this paper should benefit the development of a Data

  10. Environmental Dataset Gateway (EDG) Search Widget

    EPA Pesticide Factsheets

    Use the Environmental Dataset Gateway (EDG) to find and access EPA's environmental resources. Many options are available for easily reusing EDG content in other applications. This allows individuals to provide direct access to EPA's metadata outside the EDG interface. The EDG Search Widget makes it possible to search the EDG from another web page or application. The search widget can be included on your website by simply inserting one or two lines of code. Users can type a search term or Lucene search query in the search field and retrieve a pop-up list of records that match that search.

  11. Keeping Dublin Core Simple: Cross-Domain Discovery or Resource Description?; First Steps in an Information Commerce Economy: Digital Rights Management in the Emerging E-Book Environment; Interoperability: Digital Rights Management and the Emerging EBook Environment; Searching the Deep Web: Direct Query Engine Applications at the Department of Energy.

    ERIC Educational Resources Information Center

    Lagoze, Carl; Neylon, Eamonn; Mooney, Stephen; Warnick, Walter L.; Scott, R. L.; Spence, Karen J.; Johnson, Lorrie A.; Allen, Valerie S.; Lederman, Abe

    2001-01-01

    Includes four articles that discuss Dublin Core metadata, digital rights management and electronic books, including interoperability; and directed query engines, a type of search engine designed to access resources on the deep Web that is being used at the Department of Energy. (LRW)

  12. Querying Safety Cases

    NASA Technical Reports Server (NTRS)

    Denney, Ewen W.; Naylor, Dwight; Pai, Ganesh

    2014-01-01

    Querying a safety case to show how the various stakeholders' concerns about system safety are addressed has been put forth as one of the benefits of argument-based assurance (in a recent study by the Health Foundation, UK, which reviewed the use of safety cases in safety-critical industries). However, neither the literature nor current practice offer much guidance on querying mechanisms appropriate for, or available within, a safety case paradigm. This paper presents a preliminary approach that uses a formal basis for querying safety cases, specifically Goal Structuring Notation (GSN) argument structures. Our approach semantically enriches GSN arguments with domain-specific metadata that the query language leverages, along with its inherent structure, to produce views. We have implemented the approach in our toolset AdvoCATE, and illustrate it by application to a fragment of the safety argument for an Unmanned Aircraft System (UAS) being developed at NASA Ames. We also discuss the potential practical utility of our query mechanism within the context of the existing framework for UAS safety assurance.

  13. Understanding vaccination resistance: vaccine search term selection bias and the valence of retrieved information.

    PubMed

    Ruiz, Jeanette B; Bell, Robert A

    2014-10-07

    Dubious vaccination-related information on the Internet leads some parents to opt out of vaccinating their children. To determine if negative, neutral and positive search terms retrieve vaccination information that differs in valence and confirms searchers' assumptions about vaccination. A content analysis of first-page Google search results was conducted using three negative, three neutral, and three positive search terms for the concepts "vaccine," "vaccination," and "MMR"; 84 of the 90 websites retrieved met inclusion requirements. Two coders independently and reliably coded for the presence or absence of each of 15 myths about vaccination (e.g., "vaccines cause autism"), statements that countered these myths, and recommendations for or against vaccination. Data were analyzed using descriptive statistics. Across all websites, at least one myth was perpetuated on 16.7% of websites and at least one myth was countered on 64.3% of websites. The mean number of myths perpetuated on websites retrieved with negative, neutral, and positive search terms, respectively, was 1.93, 0.53, and 0.40. The mean number of myths countered on websites retrieved with negative, neutral, and positive search terms, respectively, was 3.0, 3.27, and 2.87. Explicit recommendations regarding vaccination were offered on 22.6% of websites. A recommendation against vaccination was more often made on websites retrieved with negative search terms (37.5% of recommendations) than on websites retrieved with neutral (12.5%) or positive (0%) search terms. The concerned parent who seeks information about the risks of childhood immunizations will find more websites that perpetuate vaccine myths and recommend against vaccination than the parent who seeks information about the benefits of vaccination. This suggests that search term valence can lead to online information that supports concerned parents' misconceptions about vaccines. Copyright © 2014 Elsevier Ltd. All rights reserved.

  14. Code query by example

    NASA Astrophysics Data System (ADS)

    Vaucouleur, Sebastien

    2011-02-01

    We introduce code query by example for customisation of evolvable software products in general and of enterprise resource planning systems (ERPs) in particular. The concept is based on an initial empirical study on practices around ERP systems. We motivate our design choices based on those empirical results, and we show how the proposed solution helps with respect to the infamous upgrade problem: the conflict between the need for customisation and the need for upgrade of ERP systems. We further show how code query by example can be used as a form of lightweight static analysis, to detect automatically potential defects in large software products. Code query by example as a form of lightweight static analysis is particularly interesting in the context of ERP systems: it is often the case that programmers working in this field are not computer science specialists but more of domain experts. Hence, they require a simple language to express custom rules.

  15. Implicit short- and long-term memory direct our gaze in visual search.

    PubMed

    Kruijne, Wouter; Meeter, Martijn

    2016-04-01

    Visual attention is strongly affected by the past: both by recent experience and by long-term regularities in the environment that are encoded in and retrieved from memory. In visual search, intertrial repetition of targets causes speeded response times (short-term priming). Similarly, targets that are presented more often than others may facilitate search, even long after it is no longer present (long-term priming). In this study, we investigate whether such short-term priming and long-term priming depend on dissociable mechanisms. By recording eye movements while participants searched for one of two conjunction targets, we explored at what stages of visual search different forms of priming manifest. We found both long- and short- term priming effects. Long-term priming persisted long after the bias was present, and was again found even in participants who were unaware of a color bias. Short- and long-term priming affected the same stage of the task; both biased eye movements towards targets with the primed color, already starting with the first eye movement. Neither form of priming affected the response phase of a trial, but response repetition did. The results strongly suggest that both long- and short-term memory can implicitly modulate feedforward visual processing.

  16. QueryCat: automatic categorization of MEDLINE queries.

    PubMed Central

    Pratt, W.; Wasserman, H.

    2000-01-01

    A searcher's inability to formulate an appropriate query can result in an overwhelming number of retrieved documents. Our approach to this problem is to use information about common types or categories of queries to (1) reformulate the user's initial query and (2) create an informative organization of the retrieved documents from the reformulated query. To achieve these goals, we first must identify which common categories or types of queries are the best abstraction of the user's specific query. In this paper, we describe a system that performs this first step of categorizing the user's query. Our system uses a two-phased approach: a lexical analysis phase, and a semantic analysis phase. An evaluation of our system demonstrates that its query categorization corresponds reasonably well to the query categorizations by medical librarians and physicians. PMID:11079965

  17. Strategic search from long-term memory: an examination of semantic and autobiographical recall.

    PubMed

    Unsworth, Nash; Brewer, Gene A; Spillers, Gregory J

    2014-01-01

    Searching long-term memory is theoretically driven by both directed (search strategies) and random components. In the current study we conducted four experiments evaluating strategic search in semantic and autobiographical memory. Participants were required to generate either exemplars from the category of animals or the names of their friends for several minutes. Self-reported strategies suggested that participants typically relied on visualization strategies for both tasks and were less likely to rely on ordered strategies (e.g., alphabetic search). When participants were instructed to use particular strategies, the visualization strategy resulted in the highest levels of performance and the most efficient search, whereas ordered strategies resulted in the lowest levels of performance and fairly inefficient search. These results are consistent with the notion that retrieval from long-term memory is driven, in part, by search strategies employed by the individual, and that one particularly efficient strategy is to visualize various situational contexts that one has experienced in the past in order to constrain the search and generate the desired information.

  18. FRS EZ Query

    EPA Pesticide Factsheets

    This page is the starting point for EZ Query. This page describes how to select key data elements from EPA's Facility Information Database and Geospatial Reference Database to build a tabular report or a Comma Separated Value (CSV) file for downloading.

  19. Query Expansion Using SNOMED-CT and Weighing Schemes

    DTIC Science & Technology

    2014-11-01

    For this research, we have used SNOMED-CT along with the UMLS Metathesaurus as our ontology in the medical domain to expand the queries. General Terms...University of the Basque Country discuss their finding on query expansion using external sources headlined by Unified Medical Language System ( UMLS

  20. A study of the influence of task familiarity on user behaviors and performance with a MeSH term suggestion interface for PubMed bibliographic search.

    PubMed

    Tang, Muh-Chyun; Liu, Ying-Hsang; Wu, Wan-Ching

    2013-09-01

    Previous research has shown that information seekers in the biomedical domain need more support in formulating their queries. A user study was conducted to evaluate the effectiveness of a metadata-based query suggestion interface for PubMed bibliographic search. The study also investigated the impact of search task familiarity on search behaviors and the effectiveness of the interface. A real-user, real-search-request, real-system approach was used for the study. Unlike traditional IR evaluation, where assigned tasks were used, the participants were asked to search requests of their own. Forty-four researchers in Health Sciences participated in the evaluation - each conducted two research requests of their own, alternately with the proposed interface and the PubMed baseline. Several performance criteria were measured to assess the potential benefits of the experimental interface, including users' assessment of their original and eventual queries, the perceived usefulness of the interfaces, satisfaction with the search results, and the average relevance score of the saved records. The results show that, when searching for an unfamiliar topic, users were more likely to change their queries, indicating the effect of familiarity on search behaviors. The results also show that the interface scored higher on several of the performance criteria, such as the "goodness" of the queries, perceived usefulness, and user satisfaction. Furthermore, in line with our hypothesis, the proposed interface was relatively more effective when less familiar search requests were attempted. Results indicate that there is a selective compatibility between search familiarity and search interface. One implication of the research for system evaluation is the importance of taking into consideration task familiarity when assessing the effectiveness of interactive IR systems. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  1. A technique to improve the spelling suggestion rank in medical queries.

    PubMed

    Crowell, Jonathan B; Zeng, Qing T; Kogan, Sandra

    2003-01-01

    Correct spelling is crucial for online search engines to function well, and health information is highly sought after online. We propose a technique for increasing the effectiveness of spell-checking tools for use with medical queries. Our results show a marked improvement in the ranking of the correct term within the suggestion list returned by the spelling correction tool, as well as a lessening of the drawbacks associated with using larger dictionaries.

  2. A Technique to Improve the Spelling Suggestion Rank in Medical Queries

    PubMed Central

    Crowell, Jonathan; Zeng, Qing T.; Kogan, Sandra

    2003-01-01

    Correct spelling is crucial for online search engines to function well, and health information is highly sought after online. We propose a technique for increasing the effectiveness of spell-checking tools for use with medical queries. Our results show a marked improvement in the ranking of the correct term within the suggestion list returned by the spelling correction tool, as well as a lessening of the drawbacks associated with using larger dictionaries. PMID:14728328

  3. Query-Based Outlier Detection in Heterogeneous Information Networks

    PubMed Central

    Kuck, Jonathan; Zhuang, Honglei; Yan, Xifeng; Cam, Hasan; Han, Jiawei

    2015-01-01

    Outlier or anomaly detection in large data sets is a fundamental task in data science, with broad applications. However, in real data sets with high-dimensional space, most outliers are hidden in certain dimensional combinations and are relative to a user’s search space and interest. It is often more effective to give power to users and allow them to specify outlier queries flexibly, and the system will then process such mining queries efficiently. In this study, we introduce the concept of query-based outlier in heterogeneous information networks, design a query language to facilitate users to specify such queries flexibly, define a good outlier measure in heterogeneous networks, and study how to process outlier queries efficiently in large data sets. Our experiments on real data sets show that following such a methodology, interesting outliers can be defined and uncovered flexibly and effectively in large heterogeneous networks. PMID:27064397

  4. Multidimensional indexing structure for use with linear optimization queries

    NASA Technical Reports Server (NTRS)

    Bergman, Lawrence David (Inventor); Castelli, Vittorio (Inventor); Chang, Yuan-Chi (Inventor); Li, Chung-Sheng (Inventor); Smith, John Richard (Inventor)

    2002-01-01

    Linear optimization queries, which usually arise in various decision support and resource planning applications, are queries that retrieve top N data records (where N is an integer greater than zero) which satisfy a specific optimization criterion. The optimization criterion is to either maximize or minimize a linear equation. The coefficients of the linear equation are given at query time. Methods and apparatus are disclosed for constructing, maintaining and utilizing a multidimensional indexing structure of database records to improve the execution speed of linear optimization queries. Database records with numerical attributes are organized into a number of layers and each layer represents a geometric structure called convex hull. Such linear optimization queries are processed by searching from the outer-most layer of this multi-layer indexing structure inwards. At least one record per layer will satisfy the query criterion and the number of layers needed to be searched depends on the spatial distribution of records, the query-issued linear coefficients, and N, the number of records to be returned. When N is small compared to the total size of the database, answering the query typically requires searching only a small fraction of all relevant records, resulting in a tremendous speedup as compared to linearly scanning the entire dataset.
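
    The layered search described in this abstract can be illustrated with a small sketch. It is not the patented implementation: it assumes two numerical attributes per record, peels convex hulls with a standard monotone-chain routine, and treats records with identical attribute values as one point. The names `build_layers` and `top_n` are hypothetical. The sketch relies on the fact that the k-th best record for a linear criterion lies within the first k layers, so a top-N query only needs to inspect the N outermost layers.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: Sequence[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices only (collinear points dropped)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def build_layers(points: Sequence[Point]) -> List[List[Point]]:
    """Peel hulls repeatedly; layers[0] is the outermost layer."""
    remaining = set(points)
    layers: List[List[Point]] = []
    while remaining:
        hull = convex_hull(list(remaining))
        layers.append(hull)
        remaining -= set(hull)
    return layers


def top_n(layers: List[List[Point]], coeffs: Tuple[float, float], n: int) -> List[Point]:
    """Return n records maximizing coeffs[0]*x + coeffs[1]*y.

    Only the n outermost layers are inspected, searching from the outside inwards.
    """
    def score(p: Point) -> float:
        return coeffs[0] * p[0] + coeffs[1] * p[1]

    candidates = [p for layer in layers[:n] for p in layer]
    return sorted(candidates, key=score, reverse=True)[:n]
```

    The layers are built once; each query then supplies its own coefficients, for example `top_n(build_layers(records), (2.0, 1.0), 3)`. A minimization query can be expressed by negating the coefficients.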

  5. Effective Structured Query Formulation for Session Search

    DTIC Science & Technology

    2012-11-01

    with top frequency are “type of paralysi”, “quadriplegia paraplegia ”, “ paraplegia ”, “spinal cord injury”, and “quadriplegic tetraplegic”, so the final...quadriplegia paraplegia ) 0.004819 paraplegia 0.004819 #combine(spinal cord injury) 0.00241 #combine(quadriplegic tetraplegic) )”, where the

  6. PPISEARCHENGINE: gene ontology-based search for protein-protein interactions.

    PubMed

    Park, Byungkyu; Cui, Guangyu; Lee, Hyunjin; Huang, De-Shuang; Han, Kyungsook

    2013-01-01

    This paper presents a new search engine called PPISearchEngine which finds protein-protein interactions (PPIs) using the gene ontology (GO) and the biological relations of proteins. For efficient retrieval of PPIs, each GO term is assigned a prime number and the relation between the terms is represented by the product of prime numbers. This representation is hidden from users but facilitates the search for the interactions of a query protein by unique prime factorisation of the number that represents the query protein. For a query protein, PPISearchEngine considers not only the GO term associated with the query protein but also the GO terms at the lower level than the GO term in the GO hierarchy, and finds all the interactions of the query protein which satisfy the search condition. In contrast, the standard keyword-matching or ID-matching search method cannot find the interactions of a protein unless the interactions involve a protein with explicit annotations. To the best of our knowledge, this search engine is the first method that can process queries like 'for protein p with GO [Formula: see text], find p's interaction partners with GO [Formula: see text]'. PPISearchEngine is freely available to academics at http://search.hpid.org/.
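
    The prime-number idea in this abstract can be sketched as follows. The abstract does not give the exact encoding, so this sketch makes an assumption: each term's code is the product of its own prime and the primes of all its ancestors, so that a term is the query term or a more specific one exactly when its code is divisible by the query term's prime. The term names, proteins, and interactions are made-up toy data, not real GO identifiers, and `partners` is a hypothetical helper.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy hierarchy (child -> parent); NOT real GO identifiers.
PARENT: Dict[str, Optional[str]] = {
    "binding": None,
    "protein_binding": "binding",
    "kinase_binding": "protein_binding",
    "catalysis": None,
}


def first_primes(n: int) -> List[int]:
    """Return the first n primes by trial division against earlier primes."""
    primes: List[int] = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


# One distinct prime per term.
PRIME: Dict[str, int] = dict(zip(PARENT, first_primes(len(PARENT))))


def encode(term: str) -> int:
    """Assumed encoding: own prime times the code of the parent term."""
    parent = PARENT[term]
    return PRIME[term] * (encode(parent) if parent is not None else 1)


CODE: Dict[str, int] = {term: encode(term) for term in PARENT}


def is_same_or_more_specific(term: str, query_term: str) -> bool:
    """Unique factorisation: the query prime divides the code iff it is an ancestor or the term itself."""
    return CODE[term] % PRIME[query_term] == 0


# Made-up annotations and interactions for illustration only.
ANNOTATION: Dict[str, str] = {
    "P1": "protein_binding",
    "P2": "kinase_binding",
    "P3": "catalysis",
    "P4": "binding",
}
INTERACTIONS: List[Tuple[str, str]] = [("P1", "P2"), ("P1", "P3"), ("P2", "P4")]


def partners(protein: str, query_term: str) -> List[str]:
    """Interaction partners of `protein` annotated with `query_term` or a more specific term."""
    found = []
    for a, b in INTERACTIONS:
        if protein in (a, b):
            other = b if a == protein else a
            if is_same_or_more_specific(ANNOTATION[other], query_term):
                found.append(other)
    return sorted(found)
```

    In this toy data, `partners("P1", "binding")` matches P2 even though P2 is annotated only with the more specific `kinase_binding`, which is the case that plain keyword or ID matching would miss.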

  7. A Query Integrator and Manager for the Query Web

    PubMed Central

    Brinkley, James F.; Detwiler, Landon T.

    2012-01-01

    We introduce two concepts: the Query Web as a layer of interconnected queries over the document web and the semantic web, and a Query Web Integrator and Manager (QI) that enables the Query Web to evolve. QI permits users to write, save and reuse queries over any web accessible source, including other queries saved in other installations of QI. The saved queries may be in any language (e.g. SPARQL, XQuery); the only condition for interconnection is that the queries return their results in some form of XML. This condition allows queries to chain off each other, and to be written in whatever language is appropriate for the task. We illustrate the potential use of QI for several biomedical use cases, including ontology view generation using a combination of graph-based and logical approaches, value set generation for clinical data management, image annotation using terminology obtained from an ontology web service, ontology-driven brain imaging data integration, small-scale clinical data integration, and wider-scale clinical data integration. Such use cases illustrate the current range of applications of QI and lead us to speculate about the potential evolution from smaller groups of interconnected queries into a larger query network that layers over the document and semantic web. The resulting Query Web could greatly aid researchers and others who now have to manually navigate through multiple information sources in order to answer specific questions. PMID:22531831

  8. A query integrator and manager for the query web.

    PubMed

    Brinkley, James F; Detwiler, Landon T

    2012-10-01

    We introduce two concepts: the Query Web as a layer of interconnected queries over the document web and the semantic web, and a Query Web Integrator and Manager (QI) that enables the Query Web to evolve. QI permits users to write, save and reuse queries over any web accessible source, including other queries saved in other installations of QI. The saved queries may be in any language (e.g. SPARQL, XQuery); the only condition for interconnection is that the queries return their results in some form of XML. This condition allows queries to chain off each other, and to be written in whatever language is appropriate for the task. We illustrate the potential use of QI for several biomedical use cases, including ontology view generation using a combination of graph-based and logical approaches, value set generation for clinical data management, image annotation using terminology obtained from an ontology web service, ontology-driven brain imaging data integration, small-scale clinical data integration, and wider-scale clinical data integration. Such use cases illustrate the current range of applications of QI and lead us to speculate about the potential evolution from smaller groups of interconnected queries into a larger query network that layers over the document and semantic web. The resulting Query Web could greatly aid researchers and others who now have to manually navigate through multiple information sources in order to answer specific questions. Copyright © 2012 Elsevier Inc. All rights reserved.

  9. Statistical search on the Semantic Web.

    PubMed

    Kobayashi, Norio; Toyoda, Tetsuro

    2008-04-01

    Statistical analysis of links on the Semantic Web is important for various evaluation purposes such as quantifying an individual's scientific research output based on citation links. SPARQL has been proposed as a standardized query language for the Semantic Web and is intuitively understandable; however, it does not adequately support statistical evaluation of semantic links. We have extended SPARQL to a novel Resource Description Framework (RDF) query language termed General and Rapid Association Study Query Language (GRASQL) to generate inferences connecting semantic Boolean-based deduction and statistical evaluation of RDF resources. We have verified the descriptive capability of GRASQL by writing GRASQL queries for practical biomedical search patterns including in silico positional cloning studies and for ranking researchers in a specific domain of expertise by introducing k index, the number of papers containing specific keywords that are published in a fixed period by a researcher. We have also developed a search engine termed General and Rapid Association Study Engine (GRASE), which executes a restricted variety of GRASQL queries by requesting a dynamic and comprehensive evaluation of statistical significance of intersections between each group of documents assigned to URIs and those documents matching user-specified keywords and omics conditions. By performing practical in silico positional cloning searches with GRASE, we show the relevance of our approach on the Semantic Web for biomedical knowledge discovery problem solving. GRASE is used as the search engine for the Positional Medline (PosMed) service and Researcher Finder service at http://omicspace.riken.jp/.
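
    The k index defined in this abstract is a simple count, and a minimal sketch may make the definition concrete. The `Paper` record and the `k_index` function are hypothetical names, and the sketch assumes that a paper counts only when it contains all of the requested keywords (the abstract does not say whether "all" or "any" is meant).

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class Paper:
    author: str
    year: int
    keywords: FrozenSet[str]


def k_index(papers: Iterable[Paper], author: str, keywords: Iterable[str],
            start_year: int, end_year: int) -> int:
    """Count the author's papers in [start_year, end_year] containing every keyword.

    Assumption: keyword matching is case-insensitive and requires all keywords.
    """
    wanted = {k.lower() for k in keywords}
    return sum(
        1
        for p in papers
        if p.author == author
        and start_year <= p.year <= end_year
        and wanted <= {k.lower() for k in p.keywords}
    )
```

    Ranking researchers in a domain then amounts to sorting them by this count for a fixed keyword set and period.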

  10. A Semantic Graph Query Language

    SciTech Connect

    Kaplan, I L

    2006-10-16

    Semantic graphs can be used to organize large amounts of information from a number of sources into one unified structure. A semantic query language provides a foundation for extracting information from the semantic graph. The graph query language described here provides a simple, powerful method for querying semantic graphs.

  11. An adaptive random search for short term generation scheduling with network constraints.

    PubMed

    Marmolejo, J A; Velasco, Jonás; Selley, Héctor J

    2017-01-01

    This paper presents an adaptive random search approach to address a short-term generation scheduling problem with network constraints, which determines the startup and shutdown schedules of thermal units over a given planning horizon. In this model, we consider the transmission network through capacity limits and line losses. The mathematical model is stated in the form of a Mixed Integer Non Linear Problem with binary variables. The proposed heuristic is a population-based method that generates a set of new potential solutions via a random search strategy. The random search is based on the Markov Chain Monte Carlo method. The key feature of the proposed method is that the noise level of the random search is adaptively controlled in order to explore and exploit the entire search space. In order to improve the solutions, we consider coupling a local search into the random search process. Several test systems are presented to evaluate the performance of the proposed heuristic. We use a commercial optimizer to compare the quality of the solutions provided by the proposed method. The solution of the proposed algorithm showed a significant reduction in computational effort with respect to the full-scale outer approximation commercial solver. Numerical results show the potential and robustness of our approach.
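
    The general idea of a population-based random search with an adaptively controlled noise level can be sketched on a toy continuous problem. This is not the authors' heuristic: it uses greedy acceptance rather than a Markov Chain Monte Carlo step, it omits the local search and the scheduling model, and the adaptation rule (widen the noise when more than 20% of proposals succeed, shrink it otherwise) is an assumption chosen for illustration. The function name and all parameters are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple


def adaptive_random_search(
    objective: Callable[[Sequence[float]], float],
    start: Sequence[Sequence[float]],
    iterations: int = 200,
    sigma: float = 1.0,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimize `objective` from an initial population `start`.

    Each member proposes a Gaussian perturbation with standard deviation
    `sigma` (the noise level); a proposal replaces the member only if it
    improves the objective. After each sweep, sigma grows when proposals
    succeed often (explore) and shrinks when they rarely do (exploit).
    """
    rng = random.Random(seed)
    population = [list(x) for x in start]
    for _ in range(iterations):
        successes = 0
        for i, x in enumerate(population):
            candidate = [xi + rng.gauss(0.0, sigma) for xi in x]
            if objective(candidate) < objective(x):
                population[i] = candidate
                successes += 1
        # Assumed adaptation rule, not taken from the paper.
        if successes / len(population) > 0.2:
            sigma *= 1.5
        else:
            sigma *= 0.7
    best = min(population, key=objective)
    return best, objective(best)
```

    A call such as `adaptive_random_search(lambda x: sum(v * v for v in x), [[5.0, -3.0], [-4.0, 4.0]])` returns the best point found and its objective value; because acceptance is greedy, the result is never worse than the best starting point.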

  12. An adaptive random search for short term generation scheduling with network constraints

    PubMed Central

    Velasco, Jonás; Selley, Héctor J.

    2017-01-01

    This paper presents an adaptive random search approach to address a short-term generation scheduling problem with network constraints, which determines the startup and shutdown schedules of thermal units over a given planning horizon. In this model, we consider the transmission network through capacity limits and line losses. The mathematical model is stated in the form of a Mixed Integer Non Linear Problem with binary variables. The proposed heuristic is a population-based method that generates a set of new potential solutions via a random search strategy. The random search is based on the Markov Chain Monte Carlo method. The key feature of the proposed method is that the noise level of the random search is adaptively controlled in order to explore and exploit the entire search space. In order to improve the solutions, we consider coupling a local search into the random search process. Several test systems are presented to evaluate the performance of the proposed heuristic. We use a commercial optimizer to compare the quality of the solutions provided by the proposed method. The solution of the proposed algorithm showed a significant reduction in computational effort with respect to the full-scale outer approximation commercial solver. Numerical results show the potential and robustness of our approach. PMID:28234954

  13. SPARQL assist language-neutral query composer.

    PubMed

    McCarthy, Luke; Vandervalk, Ben; Wilkinson, Mark

    2012-01-25

    SPARQL query composition is difficult for the lay-person, and even the experienced bioinformatician in cases where the data model is unfamiliar. Moreover, established best-practices and internationalization concerns dictate that the identifiers for ontological terms should be opaque rather than human-readable, which further complicates the task of synthesizing queries manually. We present SPARQL Assist: a Web application that addresses these issues by providing context-sensitive type-ahead completion during SPARQL query construction. Ontological terms are suggested using their multi-lingual labels and descriptions, leveraging existing support for internationalization and language-neutrality. Moreover, the system utilizes the semantics embedded in ontologies, and within the query itself, to help prioritize the most likely suggestions. To ensure success, the Semantic Web must be easily available to all users, regardless of locale, training, or preferred language. By enhancing support for internationalization, and moreover by simplifying the manual construction of SPARQL queries through the use of controlled-natural-language interfaces, we believe we have made some early steps towards simplifying access to Semantic Web resources.

  14. SPARQL Assist language-neutral query composer

    PubMed Central

    2012-01-01

    Background: SPARQL query composition is difficult for the lay-person, and even the experienced bioinformatician in cases where the data model is unfamiliar. Moreover, established best-practices and internationalization concerns dictate that the identifiers for ontological terms should be opaque rather than human-readable, which further complicates the task of synthesizing queries manually. Results: We present SPARQL Assist: a Web application that addresses these issues by providing context-sensitive type-ahead completion during SPARQL query construction. Ontological terms are suggested using their multi-lingual labels and descriptions, leveraging existing support for internationalization and language-neutrality. Moreover, the system utilizes the semantics embedded in ontologies, and within the query itself, to help prioritize the most likely suggestions. Conclusions: To ensure success, the Semantic Web must be easily available to all users, regardless of locale, training, or preferred language. By enhancing support for internationalization, and moreover by simplifying the manual construction of SPARQL queries through the use of controlled-natural-language interfaces, we believe we have made some early steps towards simplifying access to Semantic Web resources. PMID:22373327

  15. Internet search term affects the quality and accuracy of online information about developmental hip dysplasia.

    PubMed

    Fabricant, Peter D; Dy, Christopher J; Patel, Ronak M; Blanco, John S; Doyle, Shevaun M

    2013-06-01

    The recent emphasis on shared decision-making has increased the role of the Internet as a readily accessible medical reference source for patients and families. However, the lack of professional review creates concern over the quality, accuracy, and readability of medical information available to patients on the Internet. Three Internet search engines (Google, Yahoo, and Bing) were evaluated prospectively using 3 different search terms of varying sophistication ("congenital hip dislocation," "developmental dysplasia of the hip," and "hip dysplasia in children"). Sixty-three unique Web sites were evaluated by each of 3 surgeons (2 fellowship-trained pediatric orthopaedic attendings and 1 orthopaedic chief resident) for quality and accuracy using a set of scoring criteria based on the AAOS/POSNA patient education Web site. The readability (literacy grade level) of each Web site was assessed using the Flesch-Kincaid score. There were significant differences noted in quality, accuracy, and readability of information depending on the search term used. The search term "developmental dysplasia of the hip" provided higher quality and accuracy compared with the search term "congenital hip dislocation." Of the 63 total Web sites, 1 (1.6%) was below the sixth grade reading level recommended by the NIH for health education materials and 8 (12.7%) Web sites were below the average American reading level (eighth grade). The quality and accuracy of information available on the Internet regarding developmental hip dysplasia significantly varied with the search term used. Patients seeking information about DDH on the Internet may not understand the materials found because nearly all of the Web sites are written at a level above that recommended for publicly distributed health information. Physicians should advise their patients to search for information using the term "developmental dysplasia of the hip" or, better yet, should refer patients to Web sites that they have

  16. Motivation and short-term memory in visual search: Attention's accelerator revisited.

    PubMed

    Schneider, Daniel; Bonmassar, Claudia; Hickey, Clayton

    2017-07-13

    A cue indicating the possibility of cash reward will cause participants to perform memory-based visual search more efficiently. A recent study has suggested that this performance benefit might reflect the use of multiple memory systems: when needed, participants may maintain the to-be-remembered object in both long-term and short-term visual memory, with this redundancy benefitting target identification during search (Reinhart, McClenahan & Woodman, 2016). Here we test this compelling hypothesis. We had participants complete a memory-based visual search task involving a reward cue that either preceded presentation of the to-be-remembered target (pre-cue) or followed it (retro-cue). Following earlier work, we tracked memory representation using two components of the event-related potential (ERP): the contralateral delay activity (CDA), reflecting short-term visual memory, and the anterior P170, reflecting long-term storage. We additionally tracked attentional preparation and deployment in the contingent negative variation (CNV) and N2pc, respectively. Results show that only the reward pre-cue impacted our ERP indices of memory. However, both types of cue elicited a robust CNV, reflecting an influence on task preparation, both had equivalent impact on deployment of attention to the target, as indexed in the N2pc, and both had equivalent impact on visual search behavior. Reward prospect thus has an influence on memory-guided visual search, but this does not appear to be necessarily mediated by a change in the visual memory representations indexed by CDA. Our results demonstrate that the impact of motivation on search is not a simple product of improved memory for target templates. Copyright © 2017 Elsevier Ltd. All rights reserved.

  17. Virtual Solar Observatory Distributed Query Construction

    NASA Technical Reports Server (NTRS)

    Gurman, J. B.; Dimitoglou, G.; Bogart, R.; Davey, A.; Hill, F.; Martens, P.

    2003-01-01

    Through a prototype implementation (Tian et al., this meeting) the VSO has already demonstrated the capability of unifying geographically distributed data sources following the Web Services paradigm and utilizing mechanisms such as the Simple Object Access Protocol (SOAP). So far, four participating sites (Stanford, Montana State University, National Solar Observatory and the Solar Data Analysis Center) permit Web-accessible, time-based searches that allow browse access to a number of diverse data sets. Our latest work includes the extension of the simple, time-based queries to include numerous other searchable observation parameters. For VSO users, this extended functionality enables more refined searches. For the VSO, it is a proof of concept that more complex, distributed queries can be effectively constructed and that results from heterogeneous, remote sources can be synthesized and presented to users as a single, virtual data product.

  18. Using Advanced Search Operators on Web Search Engines.

    ERIC Educational Resources Information Center

    Jansen, Bernard J.

    Studies show that the majority of Web searchers enter extremely simple queries, so a reasonable system design approach would be to build search engines to compensate for this user characteristic. One hundred representative queries were selected from the transaction log of a major Web search service. These 100 queries were then modified using the…

  19. Queries for Bias Testing

    NASA Technical Reports Server (NTRS)

    Gordon, Diana F.

    1992-01-01

    Selecting a good bias prior to concept learning can be difficult. Therefore, dynamic bias adjustment is becoming increasingly popular. Current dynamic bias adjustment systems, however, are limited in their ability to identify erroneous assumptions about the relationship between the bias and the target concept. Without proper diagnosis, it is difficult to identify and then remedy faulty assumptions. We have developed an approach that makes these assumptions explicit, actively tests them with queries to an oracle, and adjusts the bias based on the test results.

  20. Robust Quantum Private Queries

    NASA Astrophysics Data System (ADS)

    Wang, Tian-Yin; Wang, Shu-Yu; Ma, Jian-Feng

    2016-07-01

    We propose a new quantum private query protocol with the technique of decoherence-free states, which is a theoretical study of how decoherence-free states can be used for the protection of quantum information in such a protocol. This protocol can solve the noise problem that will make the user obtain a wrong answer and hence give rise to a bad influence on the reputation of the database provider. Furthermore, this protocol is also flexible, loss-resistant and easily generalized to a large database similar to the previous works.

  1. CUFID-query: accurate network querying through random walk based network flow estimation.

    PubMed

    Jeong, Hyundoo; Qian, Xiaoning; Yoon, Byung-Jun

    2017-12-28

    performance evaluation based on biological networks with known functional modules, we show that CUFID-query outperforms the existing state-of-the-art algorithms in terms of prediction accuracy and biological significance of the predictions.

  2. An ontology-based search engine for protein-protein interactions

    PubMed Central

    2010-01-01

    Background: Keyword matching or ID matching is the most common searching method in a large database of protein-protein interactions. They are purely syntactic methods, and retrieve the records in the database that contain a keyword or ID specified in a query. Such syntactic search methods often retrieve too few search results or no results despite many potential matches present in the database. Results: We have developed a new method for representing protein-protein interactions and the Gene Ontology (GO) using modified Gödel numbers. This representation is hidden from users but enables a search engine using the representation to efficiently search protein-protein interactions in a biologically meaningful way. Given a query protein with optional search conditions expressed in one or more GO terms, the search engine finds all the interaction partners of the query protein by unique prime factorization of the modified Gödel numbers representing the query protein and the search conditions. Conclusion: Representing the biological relations of proteins and their GO annotations by modified Gödel numbers makes a search engine efficiently find all protein-protein interactions by prime factorization of the numbers. Keyword matching or ID matching search methods often miss the interactions involving a protein that has no explicit annotations matching the search condition, but our search engine retrieves such interactions as well if they satisfy the search condition with a more specific term in the ontology. PMID:20122195

  3. An Exemplar-Familiarity Model Predicts Short-Term and Long-Term Probe Recognition across Diverse Forms of Memory Search

    ERIC Educational Resources Information Center

    Nosofsky, Robert M.; Cox, Gregory E.; Cao, Rui; Shiffrin, Richard M.

    2014-01-01

    Experiments were conducted to test a modern exemplar-familiarity model on its ability to account for both short-term and long-term probe recognition within the same memory-search paradigm. Also, making connections to the literature on attention and visual search, the model was used to interpret differences in probe-recognition performance across…

  4. Effect of long-term intensity variations on pulsar searches and the pulsar luminosity function

    NASA Technical Reports Server (NTRS)

    Krishnamohan, S.

    1981-01-01

    Long-term intensity data for five pulsars are used to obtain the probability density distribution of intensities for each pulsar, and it is found that they are described satisfactorily by chi-squared distributions. Based on these distributions, the number of new pulsars expected to be found on repeatedly searching the same region of the sky with the same sensitivity is given. Nearly 25 percent more new pulsars are expected to be found on the first repeat search. It is also shown that the luminosity function deduced from either a single survey or surveys with very different sensitivities is not affected by the omission of flux density variations in the calculation of selection effects. Finally, a method is proposed for deriving the luminosity function by combining the different searches of a given area on the basis of a probabilistic approach to the evaluation of selection effects.

  5. A Graphical Query Language for Querying Petri Nets

    NASA Astrophysics Data System (ADS)

    Xiao, Lan; Zheng, Li; Xiao, Jian; Huang, Yi

    As the number of business process models increases, providing business analysts and IT experts with a query language for querying business process models is of great practical value. This paper uses Petri nets as the business process modeling language and develops the Petri Net Query Language (PNQL), a graphical query language for Petri nets. The syntax and semantics of PNQL are formally studied. PNQL allows users to retrieve not only perfectly matching Petri nets but also Petri nets with high similarity. The complexity of PNQL is studied.

  6. Space Object Query Tool

    NASA Technical Reports Server (NTRS)

    Phillips, Veronica J.

    2017-01-01

    STI is for a fact sheet on the Space Object Query Tool being created by the MDC. When planning launches, NASA must first factor in the tens of thousands of objects already in orbit around the Earth. The number of human-made objects, including nonfunctional spacecraft, abandoned launch vehicle stages, mission-related debris and fragmentation debris orbiting Earth has grown steadily since Sputnik 1 was launched in 1957. Currently, the U.S. Department of Defense's Joint Space Operations Center, or JSpOC, tracks over 15,000 distinct objects and provides data for more than 40,000 objects via its Space-Track program, found at space-track.org.

  7. Study of query expansion techniques and their application in the biomedical information retrieval.

    PubMed

    Rivas, A R; Iglesias, E L; Borrajo, L

    2014-01-01

    Information Retrieval focuses on finding documents whose content matches with a user query from a large document collection. As formulating well-designed queries is difficult for most users, it is necessary to use query expansion to retrieve relevant information. Query expansion techniques are widely applied for improving the efficiency of the textual information retrieval systems. These techniques help to overcome vocabulary mismatch issues by expanding the original query with additional relevant terms and reweighting the terms in the expanded query. In this paper, different text preprocessing and query expansion approaches are combined to improve the documents initially retrieved by a query in a scientific documental database. A corpus belonging to MEDLINE, called Cystic Fibrosis, is used as a knowledge source. Experimental results show that the proposed combinations of techniques greatly enhance the efficiency obtained by traditional queries.
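
    Expanding a query with additional terms and reweighting them, as the abstract above describes, might look like the following minimal sketch. The `related` map, the 0.5 expansion weight, and the functions `expand_query` and `score` are assumptions made for illustration, not the preprocessing or expansion methods evaluated in the paper.

```python
def expand_query(terms, related, expansion_weight=0.5):
    """Return term weights: 1.0 for original terms, a lower weight for added ones."""
    weights = {term: 1.0 for term in terms}
    for term in terms:
        for extra in related.get(term, []):
            # setdefault keeps the full weight if the term was already in the query.
            weights.setdefault(extra, expansion_weight)
    return weights

def score(document_tokens, weights):
    """Sum the weights of the query terms that occur in the document."""
    return sum(weights.get(token, 0.0) for token in document_tokens)

# Hypothetical related-term map; a real system would derive it from a corpus or thesaurus.
related = {"cf": ["cystic", "fibrosis"]}
document = ["cystic", "fibrosis", "therapy"]

plain = {term: 1.0 for term in ["cf", "therapy"]}
expanded = expand_query(["cf", "therapy"], related)

print(score(document, plain))     # only "therapy" matches
print(score(document, expanded))  # expansion terms add to the score
```

    The document never uses the abbreviation from the query, so the unexpanded query matches it on one term only. The expanded query also credits the related terms, which is the vocabulary-mismatch effect that query expansion is meant to address.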

  8. Distance-Constraint k-Nearest Neighbor Searching in Mobile Sensor Networks

    PubMed Central

    Han, Yongkoo; Park, Kisung; Hong, Jihye; Ulamin, Noor; Lee, Young-Koo

    2015-01-01

    The k-Nearest Neighbors (kNN) query is an important spatial query in mobile sensor networks. In this work we extend kNN to include a distance constraint, calling it an l-distant k-nearest-neighbors (l-kNN) query, which finds the k sensor nodes nearest to a query point that are also at l or greater distance from each other. The query results indicate the objects nearest to the area of interest that are scattered from each other by at least distance l. The l-kNN query can be used in most kNN applications for the case of well distributed query results. To process an l-kNN query, we must discover all sets of kNN sensor nodes and then find all pairs of sensor nodes in each set that are separated by at least a distance l. Given the limited battery and computing power of sensor nodes, this l-kNN query processing is problematically expensive in terms of energy consumption. In this paper, we propose a greedy approach for l-kNN query processing in mobile sensor networks. The key idea of the proposed approach is to divide the search space into subspaces whose sides all have length l. By selecting k sensor nodes from the other subspaces near the query point, we guarantee accurate query results for l-kNN. In our experiments, we show that the proposed method exhibits superior performance compared with a post-processing based method using the kNN query in terms of energy efficiency, query latency, and accuracy. PMID:26225969
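
    The l-kNN constraint itself can be sketched with a plain centralized greedy selection. This is a simplified baseline and not the subspace-partitioning method proposed in the paper; the function name `greedy_l_knn` and the sample coordinates are hypothetical.

```python
from math import dist

def greedy_l_knn(query, nodes, k, l):
    """Pick up to k nodes nearest to `query` whose pairwise distances are all >= l."""
    chosen = []
    # Visit candidates from nearest to farthest from the query point.
    for node in sorted(nodes, key=lambda p: dist(p, query)):
        # Accept a node only if it is at least l away from every node chosen so far.
        if all(dist(node, other) >= l for other in chosen):
            chosen.append(node)
            if len(chosen) == k:
                break
    return chosen

nodes = [(1, 0), (1.5, 0), (0, 3), (4, 0), (0, -2)]
result = greedy_l_knn((0, 0), nodes, k=3, l=2.0)
print(result)  # [(1, 0), (0, -2), (0, 3)]
```

    The node at (1.5, 0) is the second nearest to the query point but lies within distance l of the first pick, so it is skipped in favor of nodes that are farther away but well separated.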

  9. VISAGE: Interactive Visual Graph Querying.

    PubMed

    Pienta, Robert; Navathe, Shamkant; Tamersoy, Acar; Tong, Hanghang; Endert, Alex; Chau, Duen Horng

    2016-06-01

    Extracting useful patterns from large network datasets has become a fundamental challenge in many domains. We present VISAGE, an interactive visual graph querying approach that empowers users to construct expressive queries, without writing complex code (e.g., finding money laundering rings of bankers and business owners). Our contributions are as follows: (1) we introduce graph autocomplete, an interactive approach that guides users to construct and refine queries, preventing over-specification; (2) VISAGE guides the construction of graph queries using a data-driven approach, enabling users to specify queries with varying levels of specificity, from concrete and detailed (e.g., query by example), to abstract (e.g., with "wildcard" nodes of any types), to purely structural matching; (3) a twelve-participant, within-subject user study demonstrates VISAGE's ease of use and the ability to construct graph queries significantly faster than using a conventional query language; (4) VISAGE works on real graphs with over 468K edges, achieving sub-second response times for common queries.

  10. VISAGE: Interactive Visual Graph Querying

    PubMed Central

    Pienta, Robert; Navathe, Shamkant; Tamersoy, Acar; Tong, Hanghang; Endert, Alex; Chau, Duen Horng

    2017-01-01

    Extracting useful patterns from large network datasets has become a fundamental challenge in many domains. We present VISAGE, an interactive visual graph querying approach that empowers users to construct expressive queries, without writing complex code (e.g., finding money laundering rings of bankers and business owners). Our contributions are as follows: (1) we introduce graph autocomplete, an interactive approach that guides users to construct and refine queries, preventing over-specification; (2) VISAGE guides the construction of graph queries using a data-driven approach, enabling users to specify queries with varying levels of specificity, from concrete and detailed (e.g., query by example), to abstract (e.g., with “wildcard” nodes of any types), to purely structural matching; (3) a twelve-participant, within-subject user study demonstrates VISAGE’s ease of use and the ability to construct graph queries significantly faster than using a conventional query language; (4) VISAGE works on real graphs with over 468K edges, achieving sub-second response times for common queries. PMID:28553670

  11. The Development of Automaticity in Short-Term Memory Search: Item-Response Learning and Category Learning

    ERIC Educational Resources Information Center

    Cao, Rui; Nosofsky, Robert M.; Shiffrin, Richard M.

    2017-01-01

    In short-term-memory (STM)-search tasks, observers judge whether a test probe was present in a short list of study items. Here we investigated the long-term learning mechanisms that lead to the highly efficient STM-search performance observed under conditions of consistent-mapping (CM) training, in which targets and foils never switch roles across…

  12. Query Expansion and Query Translation as Logical Inference.

    ERIC Educational Resources Information Center

    Nie, Jian-Yun

    2003-01-01

    Examines query expansion during query translation in cross language information retrieval and develops a general framework for inferential information retrieval in two particular contexts: using fuzzy logic and probability theory. Obtains evaluation formulas that are shown to strongly correspond to those used in other information retrieval models.…

  13. The role of economics in the QUERI program: QUERI Series.

    PubMed

    Smith, Mark W; Barnett, Paul G

    2008-04-22

    The United States (U.S.) Department of Veterans Affairs (VA) Quality Enhancement Research Initiative (QUERI) has implemented economic analyses in single-site and multi-site clinical trials. To date, no one has reviewed whether the QUERI Centers are taking an optimal approach to doing so. Consistent with the continuous learning culture of the QUERI Program, this paper provides such a reflection. We present a case study of QUERI as an example of how economic considerations can and should be integrated into implementation research within both single and multi-site studies. We review theoretical and applied cost research in implementation studies outside and within VA. We also present a critique of the use of economic research within the QUERI program. Economic evaluation is a key element of implementation research. QUERI has contributed many developments in the field of implementation but has only recently begun multi-site implementation trials across multiple regions within the national VA healthcare system. These trials are unusual in their emphasis on developing detailed costs of implementation, as well as in the use of business case analyses (budget impact analyses). Economics appears to play an important role in QUERI implementation studies, only after implementation has reached the stage of multi-site trials. Economic analysis could better inform the choice of which clinical best practices to implement and the choice of implementation interventions to employ. QUERI economics also would benefit from research on costing methods and development of widely accepted international standards for implementation economics.

  14. Cohort Discovery Query Optimization via Computable Controlled Vocabulary Versioning.

    PubMed

    Ferris, Todd A; Podchiyska, Tanya

    2015-01-01

    Self-service cohort discovery tools strive to provide intuitive interfaces to large Clinical Data Warehouses that contain extensive historic information. In those tools, controlled vocabulary (e.g., ICD-9-CM, CPT) coded clinical information is often the main search criteria used because of its ubiquity in billing processes. These tools generally require a researcher to pick specific terms from the controlled vocabulary. However, controlled vocabularies evolve over time as medical knowledge changes and can even be replaced with new versions (e.g., ICD-9 to ICD-10). These tools generally only display the current version of the controlled vocabulary. Researchers should not be expected to understand the underlying controlled vocabulary versioning issues. We propose a computable controlled vocabulary versioning system that allows cohort discovery tools to automatically expand queries to account for terminology changes.
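
    Automatic query expansion across vocabulary versions, as proposed in the abstract above, could be sketched roughly as follows. The predecessor map and its code names are placeholders rather than real ICD or CPT entries, and `expand_codes` is a hypothetical helper, not the authors' system.

```python
# Hypothetical version history: current code -> codes it replaced (placeholders only).
PREDECESSORS = {
    "NEW.1": ["MID.1"],
    "MID.1": ["OLD.1A", "OLD.1B"],
}

def expand_codes(selected, predecessors):
    """Add every earlier-version code reachable from the selected codes."""
    expanded = set(selected)
    frontier = list(selected)
    while frontier:
        for older in predecessors.get(frontier.pop(), []):
            if older not in expanded:
                expanded.add(older)
                frontier.append(older)
    return expanded

codes = expand_codes(["NEW.1"], PREDECESSORS)
print(sorted(codes))  # ['MID.1', 'NEW.1', 'OLD.1A', 'OLD.1B']
```

    With such an expansion, a researcher who picks only the current term still retrieves historic records that were coded under earlier versions of the vocabulary.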

  15. Query-by-example surgical activity detection.

    PubMed

    Gao, Yixin; Vedula, S Swaroop; Lee, Gyusung I; Lee, Mija R; Khudanpur, Sanjeev; Hager, Gregory D

    2016-06-01

    Easy acquisition of surgical data opens many opportunities to automate skill evaluation and teaching. Current technology to search tool motion data for surgical activity segments of interest is limited by the need for manual pre-processing, which can be prohibitive at scale. We developed a content-based information retrieval method, query-by-example (QBE), to automatically detect activity segments within surgical data recordings of long duration that match a query. The example segment of interest (query) and the surgical data recording (target trial) are time series of kinematics. Our approach includes an unsupervised feature learning module using a stacked denoising autoencoder (SDAE), two scoring modules based on asymmetric subsequence dynamic time warping (AS-DTW) and template matching, respectively, and a detection module. A distance matrix of the query against the trial is computed using the SDAE features, followed by AS-DTW combined with template scoring, to generate a ranked list of candidate subsequences (substrings). To evaluate the quality of the ranked list against the ground-truth, thresholding conventional DTW distances and bipartite matching are applied. We computed the recall, precision, F1-score, and a Jaccard index-based score on three experimental setups. We evaluated our QBE method using a suture throw maneuver as the query, on two tool motion datasets (JIGSAWS and MISTIC-SL) captured in a training laboratory. We observed a recall of 93, 90 and 87 % and a precision of 93, 91, and 88 % with same surgeon same trial (SSST), same surgeon different trial (SSDT) and different surgeon (DS) experiment setups on JIGSAWS, and a recall of 87, 81 and 75 % and a precision of 72, 61, and 53 % with SSST, SSDT and DS experiment setups on MISTIC-SL, respectively. We developed a novel, content-based information retrieval method to automatically detect multiple instances of an activity within long surgical recordings. Our method demonstrated adequate recall
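
    The core of locating a query inside a longer recording can be illustrated with standard subsequence dynamic time warping on scalar series. This is a minimal sketch and not the asymmetric variant (AS-DTW), the SDAE features, or the template scoring used in the paper; `subsequence_dtw` is a hypothetical name.

```python
def subsequence_dtw(query, target):
    """Return (cost, end_index) of the subsequence of `target` that best matches `query`."""
    n, m = len(query), len(target)
    inf = float("inf")
    # Row 0 is all zeros, so a match may start at any position in the target.
    prev = [0.0] * (m + 1)
    for i in range(1, n + 1):
        curr = [inf] * (m + 1)
        for j in range(1, m + 1):
            step = abs(query[i - 1] - target[j - 1])
            curr[j] = step + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    # The match may also end anywhere: take the cheapest cell in the last row.
    end = min(range(1, m + 1), key=lambda j: prev[j])
    return prev[end], end - 1

cost, end_index = subsequence_dtw([1, 2, 3], [0, 0, 1, 2, 3, 0, 0])
print(cost, end_index)  # 0.0 4
```

    Ranking several low-cost end positions instead of taking only the minimum would give a list of candidate subsequences, similar in spirit to the ranked list the abstract mentions.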

  16. The Ontology Lookup Service, a lightweight cross-platform tool for controlled vocabulary queries

    PubMed Central

    Côté, Richard G; Jones, Philip; Apweiler, Rolf; Hermjakob, Henning

    2006-01-01

    Background With the vast amounts of biomedical data being generated by high-throughput analysis methods, controlled vocabularies and ontologies are becoming increasingly important to annotate units of information for ease of search and retrieval. Each scientific community tends to create its own locally available ontology. The interfaces to query these ontologies tend to vary from group to group. We saw the need for a centralized location to perform controlled vocabulary queries that would offer both a lightweight web-accessible user interface as well as a consistent, unified SOAP interface for automated queries. Results The Ontology Lookup Service (OLS) was created to integrate publicly available biomedical ontologies into a single database. All modified ontologies are updated daily. A list of currently loaded ontologies is available online. The database can be queried to obtain information on a single term or to browse a complete ontology using AJAX. Auto-completion provides a user-friendly search mechanism. An AJAX-based ontology viewer is available to browse a complete ontology or subsets of it. A programmatic interface is available to query the webservice using SOAP. The service is described by a WSDL descriptor file available online. A sample Java client to connect to the webservice using SOAP is available for download from SourceForge. All OLS source code is publicly available under the open source Apache Licence. Conclusion The OLS provides a user-friendly single entry point for publicly available ontologies in the Open Biomedical Ontology (OBO) format. It can be accessed interactively or programmatically at http://www.ebi.ac.uk/ontology-lookup/. PMID:16507094

  17. The Ontology Lookup Service, a lightweight cross-platform tool for controlled vocabulary queries.

    PubMed

    Côté, Richard G; Jones, Philip; Apweiler, Rolf; Hermjakob, Henning

    2006-02-28

    With the vast amounts of biomedical data being generated by high-throughput analysis methods, controlled vocabularies and ontologies are becoming increasingly important to annotate units of information for ease of search and retrieval. Each scientific community tends to create its own locally available ontology. The interfaces to query these ontologies tend to vary from group to group. We saw the need for a centralized location to perform controlled vocabulary queries that would offer both a lightweight web-accessible user interface as well as a consistent, unified SOAP interface for automated queries. The Ontology Lookup Service (OLS) was created to integrate publicly available biomedical ontologies into a single database. All modified ontologies are updated daily. A list of currently loaded ontologies is available online. The database can be queried to obtain information on a single term or to browse a complete ontology using AJAX. Auto-completion provides a user-friendly search mechanism. An AJAX-based ontology viewer is available to browse a complete ontology or subsets of it. A programmatic interface is available to query the webservice using SOAP. The service is described by a WSDL descriptor file available online. A sample Java client to connect to the webservice using SOAP is available for download from SourceForge. All OLS source code is publicly available under the open source Apache Licence. The OLS provides a user-friendly single entry point for publicly available ontologies in the Open Biomedical Ontology (OBO) format. It can be accessed interactively or programmatically at http://www.ebi.ac.uk/ontology-lookup/.

  18. PhenoImageShare: an image annotation and query infrastructure.

    PubMed

    Adebayo, Solomon; McLeod, Kenneth; Tudose, Ilinca; Osumi-Sutherland, David; Burdett, Tony; Baldock, Richard; Burger, Albert; Parkinson, Helen

    2016-06-07

    High throughput imaging is now available to many groups and it is possible to generate a large quantity of high quality images quickly. Managing this data, consistently annotating it, or making it available to the community are all challenges that come with these methods. PhenoImageShare provides an ontology-enabled lightweight image data query, annotation service and a single point of access backed by a Solr server for programmatic access to an integrated image collection enabling improved community access. PhenoImageShare also provides an easy-to-use online image annotation tool with functionality to draw regions of interest on images and to annotate them with terms from an autosuggest-enabled ontology-lookup widget. The provenance of each image, and annotation, is kept and links to original resources are provided. The semantic and intuitive search interface is species and imaging technology neutral. PhenoImageShare now provides access to annotation for over 100,000 images for 2 species. The PhenoImageShare platform provides underlying infrastructure for both programmatic access and user-facing tools for biologists enabling the query and annotation of federated images. PhenoImageShare is accessible online at http://www.phenoimageshare.org.

  19. Querying Proofs (Work in Progress)

    NASA Technical Reports Server (NTRS)

    Aspinall, David; Denney, Ewen; Lueth, Christoph

    2011-01-01

    We motivate and introduce the basis for a query language designed for inspecting electronic representations of proofs. We argue that there is much to learn from large proofs beyond their validity, and that a dedicated query language can provide a principled way of implementing a family of useful operations.

  20. Essie: a concept-based search engine for structured biomedical text.

    PubMed

    Ide, Nicholas C; Loane, Russell F; Demner-Fushman, Dina

    2007-01-01

    This article describes the algorithms implemented in the Essie search engine that is currently serving several Web sites at the National Library of Medicine. Essie is a phrase-based search engine with term and concept query expansion and probabilistic relevancy ranking. Essie's design is motivated by an observation that query terms are often conceptually related to terms in a document, without actually occurring in the document text. Essie's performance was evaluated using data and standard evaluation methods from the 2003 and 2006 Text REtrieval Conference (TREC) Genomics track. Essie was the best-performing search engine in the 2003 TREC Genomics track and achieved results comparable to those of the highest-ranking systems on the 2006 TREC Genomics track task. Essie shows that a judicious combination of exploiting document structure, phrase searching, and concept based query expansion is a useful approach for information retrieval in the biomedical domain.

  1. Sundanese ancient manuscripts search engine using probability approach

    NASA Astrophysics Data System (ADS)

    Suryani, Mira; Hadi, Setiawan; Paulus, Erick; Nurma Yulita, Intan; Supriatna, Asep K.

    2017-10-01

    Today, Information and Communication Technology (ICT) has become a regular part of every aspect of life, including culture and heritage. Sundanese ancient manuscripts, as part of Sundanese heritage, are in damaged condition, and so is the information they contain. To preserve the information in Sundanese ancient manuscripts and make it easier to search, a search engine has been developed. The search engine must have good computing ability. To obtain the best computation in the developed search engine, three probabilistic approaches have been compared in this study: the Bayesian Networks Model, Divergence from Randomness with the PL2 distribution (DFR-PL2), and DFR-PL2F, a derivative of DFR-PL2. The three probabilistic approaches were supported by a document index and three different weighting methods: term occurrence, term frequency, and TF-IDF. The experiment involved 12 Sundanese ancient manuscripts, which contain 474 distinct terms. The developed search engine was tested with 50 random queries for three types of query. The experimental results showed that, for single queries and multiple queries, the best search performance was given by the combination of the PL2F approach and the TF-IDF weighting method. Performance was evaluated using average response time, with a value of about 0.08 seconds, and Mean Average Precision (MAP), with a value of about 0.33.
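
    Mean Average Precision, the effectiveness measure reported above, can be computed as in the following minimal sketch. The document identifiers and the two example queries are invented for illustration and are unrelated to the manuscripts used in the study.

```python
def average_precision(ranked, relevant):
    """Average the precision values taken at each rank where a relevant document appears."""
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """Average the per-query AP over a list of (ranked, relevant) pairs."""
    return sum(average_precision(ranked, relevant) for ranked, relevant in runs) / len(runs)

# Hypothetical results for two queries.
runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),  # AP = (1/1 + 2/3) / 2
    (["d5", "d6"], {"d6"}),                    # AP = 1/2
]
print(round(mean_average_precision(runs), 4))  # 0.6667
```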

  2. Sensitivity and predictive value of 15 PubMed search strategies to answer clinical questions rated against full systematic reviews.

    PubMed

    Agoritsas, Thomas; Merglen, Arnaud; Courvoisier, Delphine S; Combescure, Christophe; Garin, Nicolas; Perrier, Arnaud; Perneger, Thomas V

    2012-06-12

    Clinicians perform searches in PubMed daily, but retrieving relevant studies is challenging due to the rapid expansion of medical knowledge. Little is known about the performance of search strategies when they are applied to answer specific clinical questions. To compare the performance of 15 PubMed search strategies in retrieving relevant clinical trials on therapeutic interventions. We used Cochrane systematic reviews to identify relevant trials for 30 clinical questions. Search terms were extracted from the abstract using a predefined procedure based on the population, interventions, comparison, outcomes (PICO) framework and combined into queries. We tested 15 search strategies that varied in their query (PIC or PICO), use of PubMed's Clinical Queries therapeutic filters (broad or narrow), search limits, and PubMed links to related articles. We assessed sensitivity (recall) and positive predictive value (precision) of each strategy on the first 2 PubMed pages (40 articles) and on the complete search output. The performance of the search strategies varied widely according to the clinical question. Unfiltered searches and those using the broad filter of Clinical Queries produced large outputs and retrieved few relevant articles within the first 2 pages, resulting in a median sensitivity of only 10%-25%. In contrast, all searches using the narrow filter performed significantly better, with a median sensitivity of about 50% (all P < .001 compared with unfiltered queries) and positive predictive values of 20%-30% (P < .001 compared with unfiltered queries). This benefit was consistent for most clinical questions. Searches based on related articles retrieved about a third of the relevant studies. The Clinical Queries narrow filter, along with well-formulated queries based on the PICO framework, provided the greatest aid in retrieving relevant clinical trials within the 2 first PubMed pages. These results can help clinicians apply effective strategies to answer their

  3. Sensitivity and Predictive Value of 15 PubMed Search Strategies to Answer Clinical Questions Rated Against Full Systematic Reviews

    PubMed Central

    Merglen, Arnaud; Courvoisier, Delphine S; Combescure, Christophe; Garin, Nicolas; Perrier, Arnaud; Perneger, Thomas V

    2012-01-01

    Background Clinicians perform searches in PubMed daily, but retrieving relevant studies is challenging due to the rapid expansion of medical knowledge. Little is known about the performance of search strategies when they are applied to answer specific clinical questions. Objective To compare the performance of 15 PubMed search strategies in retrieving relevant clinical trials on therapeutic interventions. Methods We used Cochrane systematic reviews to identify relevant trials for 30 clinical questions. Search terms were extracted from the abstract using a predefined procedure based on the population, interventions, comparison, outcomes (PICO) framework and combined into queries. We tested 15 search strategies that varied in their query (PIC or PICO), use of PubMed’s Clinical Queries therapeutic filters (broad or narrow), search limits, and PubMed links to related articles. We assessed sensitivity (recall) and positive predictive value (precision) of each strategy on the first 2 PubMed pages (40 articles) and on the complete search output. Results The performance of the search strategies varied widely according to the clinical question. Unfiltered searches and those using the broad filter of Clinical Queries produced large outputs and retrieved few relevant articles within the first 2 pages, resulting in a median sensitivity of only 10%–25%. In contrast, all searches using the narrow filter performed significantly better, with a median sensitivity of about 50% (all P < .001 compared with unfiltered queries) and positive predictive values of 20%–30% (P < .001 compared with unfiltered queries). This benefit was consistent for most clinical questions. Searches based on related articles retrieved about a third of the relevant studies. Conclusions The Clinical Queries narrow filter, along with well-formulated queries based on the PICO framework, provided the greatest aid in retrieving relevant clinical trials within the 2 first PubMed pages. These results can help
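
    The two measures used in this study, sensitivity (recall) and positive predictive value (precision), can be computed for a result list and for its first pages as in the sketch below. The identifiers are made up, and `sensitivity_and_ppv` is a hypothetical helper rather than the authors' evaluation code.

```python
def sensitivity_and_ppv(retrieved, relevant, cutoff=None):
    """Return (sensitivity, positive predictive value) for the top `cutoff` results."""
    considered = retrieved if cutoff is None else retrieved[:cutoff]
    true_positives = sum(1 for item in considered if item in relevant)
    sensitivity = true_positives / len(relevant) if relevant else 0.0
    ppv = true_positives / len(considered) if considered else 0.0
    return sensitivity, ppv

# Hypothetical search output (ranked) and reference set taken from a systematic review.
retrieved = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]
relevant = {"b", "d", "y", "z"}

print(sensitivity_and_ppv(retrieved, relevant))            # (0.5, 0.2)
print(sensitivity_and_ppv(retrieved, relevant, cutoff=2))  # (0.25, 0.5)
```

    Setting `cutoff=40` would correspond to restricting the evaluation to the first 2 PubMed pages mentioned in the abstract, assuming 20 articles per page.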

  4. In-context query reformulation for failing SPARQL queries

    NASA Astrophysics Data System (ADS)

    Viswanathan, Amar; Michaelis, James R.; Cassidy, Taylor; de Mel, Geeth; Hendler, James

    2017-05-01

    Knowledge bases for decision support systems are growing increasingly complex, through continued advances in data ingest and management approaches. However, humans do not possess the cognitive capabilities to retain a bird's-eye view of such knowledge bases, and may end up issuing unsatisfiable queries to such systems. This work focuses on the implementation of a query reformulation approach for graph-based knowledge bases, specifically designed to support the Resource Description Framework (RDF). The reformulation approach presented is instance- and schema-aware. Thus, in contrast to relaxation techniques found in the state-of-the-art, the presented approach produces in-context query reformulation.

  5. Image query based on color harmony

    NASA Astrophysics Data System (ADS)

    Vasile, Alexandru; Bender, Walter R.

    2001-06-01

    The combination of the increased size of digital image databases and the increased frequency with which non-specialists access these databases is raising the question of the efficacy of visual search and retrieval tools. We hypothesize that the use of color harmony has the potential for improving image-search efficiency. We describe an image-retrieval algorithm that relies on a color harmony model. This model, built on Munsell hue, value, and chroma contrast, is used to divide the image database into clusters that can be individually searched. To test the efficacy of the algorithm, it is compared to existing algorithms developed by Niblack et al. and Feldman et al. A second study that utilizes the image query system in a retail application is also described.

  6. Does the volume of Internet searches using suicide-related search terms influence the suicide death rate: data from 2004 to 2009 in Japan.

    PubMed

    Sueki, Hajime

    2011-06-01

    Cross-correlation was examined for the volume of suicide-related Internet searches and suicide death rate. Analysis of Google data and figures released by the Ministry of Health, Labour, and Welfare indicated that the volume of searches using the search terms jisatsu (suicide) and jisatsu houhou (suicide method) are not correlated with the suicide death rate. In addition, a rising suicide death rate might be related to the increase in suicide-related search activity (particularly utsu [depression]), but an increase in suicide-related search activity itself is not directly linked to the rise of suicide death rate. © 2011 The Author. Psychiatry and Clinical Neurosciences © 2011 Japanese Society of Psychiatry and Neurology.

  7. Exploring the Query Expansion Methods for Concept Based Representation

    DTIC Science & Technology

    2014-11-01

    documents from term based representation to concept based representation. We then utilized the Cases Database and UMLS relations to expand the key...same query expansion techniques to the term based representation. The results show that using the UMLS relation could help to improve performance. 2...www.casesdatabase.com/. Currently closed. Fig. 2. An example case report on Cases Database. 2.2 Query expansion with UMLS relationships Concepts are

  8. Exploring Contextual Models in Chemical Patent Search

    NASA Astrophysics Data System (ADS)

    Urbain, Jay; Frieder, Ophir

    We explore the development of probabilistic retrieval models for integrating term statistics with entity search using multiple levels of document context to improve the performance of chemical patent search. A distributed indexing model was developed to enable efficient named entity search and aggregation of term statistics at multiple levels of patent structure including individual words, sentences, claims, descriptions, abstracts, and titles. The system can be scaled to an arbitrary number of compute instances in a cloud computing environment to support concurrent indexing and query processing operations on large patent collections.

  9. What's trending now? An analysis of trends in internet searches for labor epidurals.

    PubMed

    Sutton, C D; Carvalho, B

    2017-05-01

    The study aim was to investigate internet use for obtaining information about epidurals for labor and delivery. Google Trends for US data was queried from 2004 to 2015 to find the most common searches and determine temporal trends. The Google Trends query used the term [epidural] and evaluated changes in search trends over time. Search comparisons were made for each year from 2004 to 2015, and three equal time epochs during the study period (2004-07, 2008-11, 2012-15) were compared. We also compared searches for epidurals with commonly searched birth-related terms. Internet searches are increasing; there were 726,000 searches for [epidural] in 2015. Search terms with the most significant growth in the past 4 years (2012-15) were "birth with epidural," "pain after epidural," "labor without epidural," "epidural birth video," and "epidural vs natural". Searches for epidural side effects, risks, and pain on insertion were among the most common and were increasing most rapidly. Searches related to epidurals were more common than searches related to "natural births", "home births", and "labor pain", but were less common than searches for "midwives" or "doulas". The findings provide an insight into internet use by those seeking information about labor analgesic options. Identifying the most common and rapidly increasing online search queries may guide physician-parturient interactions and online content creation, to address labor analgesic topics that most interest users. Copyright © 2017 Elsevier Ltd. All rights reserved.

  10. Summarization of Text Document Using Query Dependent Parsing Techniques

    NASA Astrophysics Data System (ADS)

    Rokade, P. P.; Mrunal, Bewoor; Patil, S. H.

    2010-11-01

    The World Wide Web is the largest source of information, and a huge amount of data is present on it. There has been a great amount of work on query-independent summarization of documents. However, due to the success of Web search engines, query-specific document summarization (query result snippets) has become an important problem. This paper discusses a method that creates query-specific summaries by identifying the most query-relevant fragments and combining them using the semantic associations within the document. In particular, a structure is first added to the documents in the preprocessing stage, converting them to document graphs. The present research work focuses on an analytical study of different document clustering and summarization techniques; currently, most research is focused on query-independent summarization. The main aim of this research work is to combine the two approaches of document clustering and query-dependent summarization. This mainly includes applying different clustering algorithms to a text document, creating a weighted document graph from the result based on the keywords, and using that graph to obtain the summary of the document. The performance of the summary using different clustering techniques will be analyzed and the optimal approach will be suggested.

  11. QNet: a tool for querying protein interaction networks.

    PubMed

    Dost, Banu; Shlomi, Tomer; Gupta, Nitin; Ruppin, Eytan; Bafna, Vineet; Sharan, Roded

    2008-09-01

    Molecular interaction databases can be used to study the evolution of molecular pathways across species. Querying such pathways is a challenging computational problem, and recent efforts have been limited to simple queries (paths), or simple networks (forests). In this paper, we significantly extend the class of pathways that can be efficiently queried to the case of trees, and graphs of bounded treewidth. Our algorithm allows the identification of non-exact (homeomorphic) matches, exploiting the color coding technique of Alon et al. (1995). We implement a tool for tree queries, called QNet, and test its retrieval properties in simulations and on real network data. We show that QNet searches queries with up to nine proteins in seconds on current networks, and outperforms sequence-based searches. We also use QNet to perform the first large-scale cross-species comparison of protein complexes, by querying known yeast complexes against a fly protein interaction network. This comparison points to strong conservation between the two species, and underscores the importance of our tool in mining protein interaction networks.

  12. Mapping Self-Guided Learners' Searches for Video Tutorials on YouTube

    ERIC Educational Resources Information Center

    Garrett, Nathan

    2016-01-01

    While YouTube has a wealth of educational videos, how self-guided learners use these resources has not been fully described. An analysis of search engine queries for help with the use of Microsoft Excel shows that few users search for specific features or functions but instead use very general terms. Because the same videos are returned in…

  13. Long-term retention of skilled visual search following severe traumatic brain injury

    PubMed Central

    PAVAWALLA, SHITAL P.; SCHMITTER-EDGECOMBE, MAUREEN

    2007-01-01

    We examined the long-term retention of a learned automatic cognitive process in 17 severe TBI participants and 10 controls. Participants had initially received extensive consistent-mapping (CM) training (i.e., 3600 trials) in a semantic category visual search task (Schmitter-Edgecombe & Beglinger, 2001). Following CM training, TBI and control groups demonstrated dramatic performance improvements and the development of an automatic attention response (AAR), indicating task-specific and stimulus-specific skill learning. After a 5- or 10-month retention interval, participants in this study performed a New CM task and the originally trained CM task to assess for retention of task-specific and stimulus-specific visual search skills, respectively. No significant group differences were found in the level of retention for either skill type, indicating that individuals with severe TBI were able to retain the learned skills over a long-term retention interval at a level comparable to controls. Exploratory analyses revealed that TBI participants who returned at the 5-month retention interval showed nearly complete skill retention, and greater skill retention than TBI participants who returned at the 10-month interval, suggesting that “booster” or retraining sessions may be needed when a skill is not continuously in use. PMID:17064444

  14. The effect of search term on the quality and accuracy of online information regarding distal radius fractures.

    PubMed

    Dy, Christopher J; Taylor, Samuel A; Patel, Ronak M; Kitay, Alison; Roberts, Timothy R; Daluiski, Aaron

    2012-09-01

    Recent emphasis on shared decision making and patient-centered research has increased the importance of patient education and health literacy. The internet is rapidly growing as a source of self-education for patients. However, concern exists over the quality, accuracy, and readability of the information. Our objective was to determine whether the quality, accuracy, and readability of information online about distal radius fractures vary with the search term. This was a prospective evaluation of 3 search engines using 3 different search terms of varying sophistication ("distal radius fracture," "wrist fracture," and "broken wrist"). We evaluated 70 unique Web sites for quality, accuracy, and readability. We used comparative statistics to determine whether the search term affected the quality, accuracy, and readability of the Web sites found. Three orthopedic surgeons independently gauged quality and accuracy of information using a set of predetermined scoring criteria. We evaluated the readability of the Web site using the Flesch-Kincaid score for reading grade level. There were significant differences in the quality, accuracy, and readability of information found, depending on the search term. We found higher quality and accuracy resulted from the search term "distal radius fracture," particularly compared with Web sites resulting from the term "broken wrist." The reading level was higher than recommended in 65 of the 70 Web sites and was significantly higher when searching with "distal radius fracture" than "wrist fracture" or "broken wrist." There was no correlation between Web site reading level and quality or accuracy. The readability of information about distal radius fractures in most Web sites was higher than the recommended reading level for the general public. The quality and accuracy of the information found significantly varied with the sophistication of the search term used. Physicians, professional societies, and search engines should consider
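
    The reading grade level mentioned in the abstract above can be illustrated with the standard Flesch-Kincaid grade formula, 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59. The following is a minimal sketch only: the syllable counter is a crude vowel-group heuristic chosen for this illustration, and the function names are hypothetical. It is not the scoring tool used in the study.

```python
import re


def count_syllables(word: str) -> int:
    """Crude heuristic (an assumption): one syllable per run of vowels, minimum 1."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level of a text, using the heuristic syllable count."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


if __name__ == "__main__":
    plain = "I broke my wrist."
    clinical = "A distal radius fracture requires orthopedic evaluation."
    print(round(flesch_kincaid_grade(plain), 2))
    print(round(flesch_kincaid_grade(clinical), 2))
```

    Under this heuristic the clinical phrasing scores a much higher grade level than the lay phrasing, which matches the direction of the difference the abstract reports between search terms.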

  15. Adaptive shape transform for color image querying

    NASA Astrophysics Data System (ADS)

    Celenk, Mehmet; Zhou, Qiang; Vetnes, Vermund; Godavari, Rakesh K.

    2003-05-01

    Spectral (color) and spatial (shape) features available in pictures are sources of information that need to be incorporated for advanced content-based image database retrieval. The adaptive shape transform approach developed in this research originates from the premise that a two-dimensional (2D) shape can be recovered completely from a set of the orthogonal Radon transform-based projections. For search consistency, it is necessary to identify the region(s) of interest (ROI) before applying the Radon transform to shape query. ROIs are detected automatically by means of saliency map-based segmentation. The Radon transform packs the shape information of a 2D mass along the projection axis of known orientation, and generates a series of one-dimensional (1D) functions from color channels for projection angles ranging from 1° to 180°. The optimal number of projections for a particular shape is determined by imposing the Kullback-Leibler distance (KLD) histogram comparison as the similarity metric between the query and database images. The Radon transforms with the shortest and longest lengths yield the most distinctive shape attributes for the object classes being queried. For translation- and rotation-invariant retrieval, the principal component analysis is utilized as the preprocessing tool in the spatial plane. Size invariance is achieved by normalizing the Radon transforms in the (R, G, B) color channels independently. The proposed algorithm was tested on a wide range of complex shaped objects imaged in 24-bit color with different spatial resolutions. The KLDs between two images are calculated in the longest and shortest directions of the Radon transform, and then are added together to find the similarity measure corresponding to the query and database pictures. Higher measures indicate two dissimilar shapes, while smaller values represent two similar ones. Experimental results show that the method is robust and offers high noise immunity.
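
    The histogram comparison described above relies on the Kullback-Leibler distance, D(P||Q) = sum_i p_i * log(p_i / q_i). A minimal sketch of that measure follows; the additive smoothing constant and the function names are assumptions made for this illustration, and the sketch does not reproduce the paper's Radon-transform pipeline.

```python
import math
from typing import Sequence


def normalize(hist: Sequence[float], eps: float = 1e-12) -> list:
    """Turn raw bin counts into probabilities; eps (an assumption) avoids log(0)."""
    smoothed = [h + eps for h in hist]
    total = sum(smoothed)
    return [s / total for s in smoothed]


def kl_divergence(p_hist: Sequence[float], q_hist: Sequence[float]) -> float:
    """Kullback-Leibler distance D(P || Q) between two histograms with equal bins."""
    if len(p_hist) != len(q_hist):
        raise ValueError("histograms must have the same number of bins")
    p = normalize(p_hist)
    q = normalize(q_hist)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


if __name__ == "__main__":
    query_hist = [1, 1]      # hypothetical projection histogram of the query image
    database_hist = [3, 1]   # hypothetical histogram of a database image
    print(kl_divergence(query_hist, database_hist))
```

    Identical histograms give a distance of zero and larger values mean less similar shapes, consistent with the abstract. Note that the measure is not symmetric in its arguments; per the abstract, the distances for the longest and shortest projection directions are added to form the final score.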

  16. Intelligent search in Big Data

    NASA Astrophysics Data System (ADS)

    Birialtsev, E.; Bukharaev, N.; Gusenkov, A.

    2017-10-01

    An approach to data integration, aimed at ontology-based intelligent search in Big Data, is considered for the case when information objects are represented in the form of relational databases (RDBs), structurally marked by their schemas. The source of information for constructing an ontology and, later on, for organizing the search is natural-language text, treated as semi-structured data. For the RDBs, these are comments on the names of tables and their attributes. A formal definition of the RDB integration model in terms of ontologies is given. Within the framework of the model, a universal RDB representation ontology, an oil production subject domain ontology, and a linguistic thesaurus of the subject domain language are built. A technique for automatic generation of SQL queries for subject domain specialists is proposed. On this basis, an information system for the RDBs of the TATNEFT oil-producing company was implemented. Operation of the system showed good relevance for the majority of queries.

  17. Compression for Quadratic Similarity Queries

    PubMed Central

    Ingber, Amir; Courtade, Thomas; Weissman, Tsachy

    2017-01-01

    The problem of performing similarity queries on compressed data is considered. We focus on the quadratic similarity measure, and study the fundamental tradeoff between compression rate, sequence length, and reliability of queries performed on the compressed data. For a Gaussian source, we show that the queries can be answered reliably if and only if the compression rate exceeds a given threshold—the identification rate— which we explicitly characterize. Moreover, when compression is performed at a rate greater than the identification rate, responses to queries on the compressed data can be made exponentially reliable. We give a complete characterization of this exponent, which is analogous to the error and excess-distortion exponents in channel and source coding, respectively. For a general source, we prove that, as with classical compression, the Gaussian source requires the largest compression rate among sources with a given variance. Moreover, a robust scheme is described that attains this maximal rate for any source distribution. PMID:29375151

  18. Structured Query Language (SQL) fundamentals.

    PubMed

    Jamison, D Curtis

    2003-02-01

    Relational databases provide the most common platform for storing data. The Structured Query Language (SQL) is a powerful tool for interacting with relational database systems. SQL enables the user to concoct complex and powerful queries in a straightforward manner, allowing sophisticated data analysis using simple syntax and structure. This unit demonstrates how to use the MySQL package to build and interact with a relational database.
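
    As a small illustration of the kind of query the abstract refers to, the sketch below creates a table, inserts rows, and runs an aggregate SELECT. It assumes Python's built-in sqlite3 module as a stand-in for the MySQL package used in the unit, and the table and column names are hypothetical.

```python
import sqlite3


def count_genes_per_chromosome() -> list:
    """Build a tiny in-memory table and count rows per chromosome."""
    conn = sqlite3.connect(":memory:")  # throwaway database, nothing is written to disk
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE gene ("
        "id INTEGER PRIMARY KEY, symbol TEXT NOT NULL, chromosome TEXT NOT NULL)"
    )
    # Hypothetical sample rows; parameter placeholders avoid string concatenation.
    cur.executemany(
        "INSERT INTO gene (symbol, chromosome) VALUES (?, ?)",
        [("BRCA1", "17"), ("TP53", "17"), ("CFTR", "7")],
    )
    cur.execute(
        "SELECT chromosome, COUNT(*) FROM gene "
        "GROUP BY chromosome ORDER BY chromosome"
    )
    rows = cur.fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for chromosome, n in count_genes_per_chromosome():
        print(chromosome, n)
```

    The chromosome column is stored as text, so the ordering is lexicographic ("17" sorts before "7"). The SQL statements themselves are standard and should carry over to MySQL with at most minor changes to the column type declarations.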

  19. Issues in the Design of a Pilot Concept-Based Query Interface for the Neuroinformatics Information Framework

    PubMed Central

    Li, Yuli; Martone, Maryann E.; Sternberg, Paul W.; Shepherd, Gordon M.; Miller, Perry L.

    2009-01-01

    This paper describes a pilot query interface that has been constructed to help us explore a “concept-based” approach for searching the Neuroscience Information Framework (NIF). The query interface is concept-based in the sense that the search terms submitted through the interface are selected from a standardized vocabulary of terms (concepts) that are structured in the form of an ontology. The NIF contains three primary resources: the NIF Resource Registry, the NIF Document Archive, and the NIF Database Mediator. These NIF resources are very different in their nature and therefore pose challenges when designing a single interface from which searches can be automatically launched against all three resources simultaneously. The paper first discusses briefly several background issues involving the use of standardized biomedical vocabularies in biomedical information retrieval, and then presents a detailed example that illustrates how the pilot concept-based query interface operates. The paper concludes by discussing certain lessons learned in the development of the current version of the interface. PMID:18953674

  20. Query Refinement: Negation Detection and Proximity Learning Georgetown at TREC 2014 Clinical Decision Support Track

    DTIC Science & Technology

    2014-11-01

    MeSH-indexed databases such as PubMed. However, since many medical conditions may be expressed in varying terminology, a single representation of a...a snapshot of the Open Access Subset of PubMed Central (PMC), containing 733,138 articles in NXML format. We adopt the Lemur Search Engine to build...from the text. Each topic was queried on the National Library of Medicine’s (NLM) MeSH on Demand, which generates relevant MeSH terms using the

  1. A high performance, ad-hoc, fuzzy query processing system for relational databases

    NASA Technical Reports Server (NTRS)

    Mansfield, William H., Jr.; Fleischman, Robert M.

    1992-01-01

    Database queries involving imprecise or fuzzy predicates are currently an evolving area of academic and industrial research. Such queries place severe stress on the indexing and I/O subsystems of conventional database environments since they involve the search of large numbers of records. The Datacycle architecture and research prototype is a database environment that uses filtering technology to perform an efficient, exhaustive search of an entire database. It has recently been modified to include fuzzy predicates in its query processing. The approach obviates the need for complex index structures, provides unlimited query throughput, permits the use of ad-hoc fuzzy membership functions, and provides a deterministic response time largely independent of query complexity and load. This paper describes the Datacycle prototype implementation of fuzzy queries and some recent performance results.

  2. Improving Concept-Based Web Image Retrieval by Mixing Semantically Similar Greek Queries

    ERIC Educational Resources Information Center

    Lazarinis, Fotis

    2008-01-01

    Purpose: Image searching is a common activity for web users. Search engines offer image retrieval services based on textual queries. Previous studies have shown that web searching is more demanding when the search is not in English and does not use a Latin-based language. The aim of this paper is to explore the behaviour of the major search…

  3. Boolean versus ranked querying for biomedical systematic reviews

    PubMed Central

    2010-01-01

    Background: The process of constructing a systematic review, a document that compiles the published evidence pertaining to a specified medical topic, is intensely time-consuming, often taking a team of researchers over a year, with the identification of relevant published research comprising a substantial portion of the effort. The standard paradigm for this information-seeking task is to use Boolean search; however, this leaves the user(s) the requirement of examining every returned result. Further, our experience is that effective Boolean queries for this specific task are extremely difficult to formulate and typically require multiple iterations of refinement before being finalized. Methods: We explore the effectiveness of using ranked retrieval as compared to Boolean querying for the purpose of constructing a systematic review. We conduct a series of experiments involving ranked retrieval, using queries defined methodologically, in an effort to understand the practicalities of incorporating ranked retrieval into the systematic search task. Results: Our results show that ranked retrieval by itself is not viable for this search task requiring high recall. However, we describe a refinement of the standard Boolean search process and show that ranking within a Boolean result set can improve the overall search performance by providing early indication of the quality of the results, thereby speeding up the iterative query-refinement process. Conclusions: Outcomes of experiments suggest that an interactive query-development process using a hybrid ranked and Boolean retrieval system has the potential for significant time-savings over the current search process in systematic reviewing. PMID:20937152
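
    The hybrid idea in the abstract above, ranking within a Boolean result set, can be sketched as a two-step process: a Boolean AND filter fixes which documents are returned, and a score orders them. The scoring used here (a raw count of ranking-term occurrences) is an assumption for illustration only and is not the ranking function evaluated in the paper; all names and documents are hypothetical.

```python
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lowercase a text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def boolean_and(docs: dict, required_terms: list) -> list:
    """Return ids of documents that contain every required term."""
    required = {t.lower() for t in required_terms}
    return [doc_id for doc_id, text in docs.items()
            if required <= set(tokenize(text))]


def rank_within(docs: dict, candidate_ids: list, ranking_terms: list) -> list:
    """Order an existing Boolean result set by occurrences of the ranking terms."""
    terms = [t.lower() for t in ranking_terms]

    def score(doc_id: str) -> int:
        counts = Counter(tokenize(docs[doc_id]))
        return sum(counts[t] for t in terms)

    # Highest score first; ties are broken by document id for a stable order.
    return sorted(candidate_ids, key=lambda d: (-score(d), d))


if __name__ == "__main__":
    docs = {
        "a": "asthma trial asthma children",
        "b": "asthma review",
        "c": "diabetes trial",
    }
    hits = boolean_and(docs, ["asthma"])
    print(rank_within(docs, hits, ["review"]))
```

    The Boolean step keeps recall under the searcher's control, since every matching document is still returned, while the ordering gives an early indication of result quality during query refinement.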

  4. UMLS-Query: a perl module for querying the UMLS.

    PubMed

    Shah, Nigam H; Musen, Mark A

    2008-11-06

    The Metathesaurus from the Unified Medical Language System (UMLS) is a widely used ontology resource, which is mostly used in a relational database form for terminology research, mapping and information indexing. A significant section of UMLS users use a MySQL installation of the metathesaurus and Perl programming language as their access mechanism. We describe UMLS-Query, a Perl module that provides functions for retrieving concept identifiers, mapping text-phrases to Metathesaurus concepts and graph traversal in the Metathesaurus stored in a MySQL database. UMLS-Query can be used to build applications for semi-automated sample annotation, terminology based browsers for tissue sample databases and for terminology research. We describe the results of such uses of UMLS-Query and present the module for others to use.

  5. A Systematic Search for Short-term Variability of EGRET Sources

    NASA Technical Reports Server (NTRS)

    Wallace, P. M.; Griffis, N. J.; Bertsch, D. L.; Hartman, R. C.; Thompson, D. J.; Kniffen, D. A.; Bloom, S. D.

    2000-01-01

    The 3rd EGRET Catalog of High-energy Gamma-ray Sources contains 170 unidentified sources, and there is great interest in the nature of these sources. One means of determining source class is the study of flux variability on time scales of days; pulsars are believed to be stable on these time scales while blazars are known to be highly variable. In addition, previous work has demonstrated that 3EG J0241-6103 and 3EG J1837-0606 are candidates for a new gamma-ray source class. These sources near the Galactic plane display transient behavior but cannot be associated with any known blazars. Although many instances of flaring AGN have been reported, the EGRET database has not been systematically searched for occurrences of short-timescale (approximately 1 day) variability. These considerations have led us to conduct a systematic search for short-term variability in EGRET data, covering all viewing periods through proposal cycle 4. Six 3EG catalog sources are reported here to display variability on short time scales; four of them are unidentified. In addition, three non-catalog variable sources are discussed.

  6. BEST: Next-Generation Biomedical Entity Search Tool for Knowledge Discovery from Biomedical Literature.

    PubMed

    Lee, Sunwon; Kim, Donghyeon; Lee, Kyubum; Choi, Jaehoon; Kim, Seongsoon; Jeon, Minji; Lim, Sangrak; Choi, Donghee; Kim, Sunkyu; Tan, Aik-Choon; Kang, Jaewoo

    2016-01-01

    As the volume of publications rapidly increases, searching for relevant information from the literature becomes more challenging. To complement standard search engines such as PubMed, it is desirable to have an advanced search tool that directly returns relevant biomedical entities such as targets, drugs, and mutations rather than a long list of articles. Some existing tools submit a query to PubMed and process retrieved abstracts to extract information at query time, resulting in a slow response time and limited coverage of only a fraction of the PubMed corpus. Other tools preprocess the PubMed corpus to speed up the response time; however, they are not constantly updated, and thus produce outdated results. Further, most existing tools cannot process sophisticated queries such as searches for mutations that co-occur with query terms in the literature. To address these problems, we introduce BEST, a biomedical entity search tool. BEST returns, as a result, a list of 10 different types of biomedical entities including genes, diseases, drugs, targets, transcription factors, miRNAs, and mutations that are relevant to a user's query. To the best of our knowledge, BEST is the only system that processes free text queries and returns up-to-date results in real time including mutation information in the results. BEST is freely accessible at http://best.korea.ac.kr.

  7. BEST: Next-Generation Biomedical Entity Search Tool for Knowledge Discovery from Biomedical Literature

    PubMed Central

    Lee, Kyubum; Choi, Jaehoon; Kim, Seongsoon; Jeon, Minji; Lim, Sangrak; Choi, Donghee; Kim, Sunkyu; Tan, Aik-Choon

    2016-01-01

    As the volume of publications rapidly increases, searching for relevant information from the literature becomes more challenging. To complement standard search engines such as PubMed, it is desirable to have an advanced search tool that directly returns relevant biomedical entities such as targets, drugs, and mutations rather than a long list of articles. Some existing tools submit a query to PubMed and process retrieved abstracts to extract information at query time, resulting in a slow response time and limited coverage of only a fraction of the PubMed corpus. Other tools preprocess the PubMed corpus to speed up the response time; however, they are not constantly updated, and thus produce outdated results. Further, most existing tools cannot process sophisticated queries such as searches for mutations that co-occur with query terms in the literature. To address these problems, we introduce BEST, a biomedical entity search tool. BEST returns, as a result, a list of 10 different types of biomedical entities including genes, diseases, drugs, targets, transcription factors, miRNAs, and mutations that are relevant to a user’s query. To the best of our knowledge, BEST is the only system that processes free text queries and returns up-to-date results in real time including mutation information in the results. BEST is freely accessible at http://best.korea.ac.kr. PMID:27760149

  8. Search terms and a validated brief search filter to retrieve publications on health-related values in Medline: a word frequency analysis study.

    PubMed

    Petrova, Mila; Sutcliffe, Paul; Fulford, K W M Bill; Dale, Jeremy

    2012-01-01

    Healthcare debates and policy developments are increasingly concerned with a broad range of values-related areas. These include not only ethical, moral, religious, and other types of values 'proper', but also beliefs, preferences, experiences, choices, satisfaction, quality of life, etc. Research on such issues may be difficult to retrieve. This study used word frequency analysis to generate a broad pool of search terms and a brief filter to facilitate relevant searches in bibliographic databases. Word frequency analysis for 'values terms' was performed on citations on diabetes, obesity, dementia, and schizophrenia (Medline; 2004-2006; 4440 citations; 1,110,291 words). Concordance® and SPSS 14.0 were used. Text words and MeSH terms of high frequency and precision were compiled into a search filter. It was validated on datasets of citations on dentistry and food hypersensitivity. 144 unique text words and 124 unique MeSH terms of moderate and high frequency (≥ 20) and very high precision (≥ 90%) were identified. Of these, 19 text words and seven MeSH terms were compiled into a 'brief values filter'. In the derivation dataset, it had a sensitivity of 76.8% and precision of 86.8%. In the validation datasets, its sensitivity and precision were, respectively, 70.1% and 63.6% (food hypersensitivity) and 47.1% and 82.6% (dentistry). This study provided a varied pool of search terms and a simple and highly effective tool for retrieving publications on health-related values. Further work is required to facilitate access to such research and enhance its chances of being translated into practice, policy, and service improvements.
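
    The sensitivity and precision figures quoted above follow the usual definitions: sensitivity is the share of relevant citations that the filter retrieves, and precision is the share of retrieved citations that are relevant. A minimal sketch with hypothetical citation ids (not data from the study) follows.

```python
def evaluate_filter(retrieved: set, relevant: set) -> tuple:
    """Return (sensitivity, precision) of a search filter.

    retrieved: ids returned by the filter
    relevant:  ids judged relevant in the reference dataset
    """
    true_positives = len(retrieved & relevant)
    sensitivity = true_positives / len(relevant) if relevant else 0.0
    precision = true_positives / len(retrieved) if retrieved else 0.0
    return sensitivity, precision


if __name__ == "__main__":
    # Hypothetical ids for illustration only.
    retrieved = {1, 2, 3, 4}
    relevant = {2, 3, 5}
    sens, prec = evaluate_filter(retrieved, relevant)
    print(f"sensitivity={sens:.3f} precision={prec:.3f}")
```

    In this toy case two of the three relevant citations are retrieved and two of the four retrieved citations are relevant.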

  9. Representation and alignment of sung queries for music information retrieval

    NASA Astrophysics Data System (ADS)

    Adams, Norman H.; Wakefield, Gregory H.

    2005-09-01

    The pursuit of robust and rapid query-by-humming systems, which search melodic databases using sung queries, is a common theme in music information retrieval. The retrieval aspect of this database problem has received considerable attention, whereas the front-end processing of sung queries and the data structure to represent melodies has been based on musical intuition and historical momentum. The present work explores three time series representations for sung queries: a sequence of notes, a "smooth" pitch contour, and a sequence of pitch histograms. The performance of the three representations is compared using a collection of naturally sung queries. It is found that the most robust performance is achieved by the representation with highest dimension, the smooth pitch contour, but that this representation presents a formidable computational burden. For all three representations, it is necessary to align the query and target in order to achieve robust performance. The computational cost of the alignment is quadratic, hence it is necessary to keep the dimension small for rapid retrieval. Accordingly, iterative deepening is employed to achieve both robust performance and rapid retrieval. Finally, the conventional iterative framework is expanded to adapt the alignment constraints based on previous iterations, further expediting retrieval without degrading performance.
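
    The abstract does not name the alignment method. As an assumption, the sketch below uses classic dynamic time warping (DTW), a common choice for aligning a query and a target time series whose cost is quadratic in the sequence lengths, which is the cost behavior the abstract describes. The pitch values are hypothetical note numbers.

```python
def dtw_distance(query: list, target: list) -> float:
    """Dynamic time warping distance between two numeric sequences.

    Runs in O(len(query) * len(target)) time and keeps only two rows of
    the cost table in memory.
    """
    n, m = len(query), len(target)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    inf = float("inf")
    prev = [inf] * (m + 1)
    prev[0] = 0.0  # empty prefix aligned with empty prefix costs nothing
    for i in range(1, n + 1):
        curr = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(query[i - 1] - target[j - 1])
            # Extend the cheapest of: skipping in target, skipping in query, or a match.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]


if __name__ == "__main__":
    sung = [60, 62, 62, 64]   # hypothetical sung query, one note held longer
    melody = [60, 62, 64]     # hypothetical database melody
    print(dtw_distance(sung, melody))
```

    Because the table has one cell per pair of query and target elements, shortening either representation directly reduces the work, which is why the abstract stresses keeping the dimension small.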

  10. Demystifying the Search Button

    PubMed Central

    McKeever, Liam; Nguyen, Van; Peterson, Sarah J.; Gomez-Perez, Sandra

    2015-01-01

    A thorough review of the literature is the basis of all research and evidence-based practice. A gold-standard efficient and exhaustive search strategy is needed to ensure all relevant citations have been captured and that the search performed is reproducible. The PubMed database comprises both the MEDLINE and non-MEDLINE databases. MEDLINE-based search strategies are robust but capture only 89% of the total available citations in PubMed. The remaining 11% include the most recent and possibly relevant citations but are only searchable through less efficient techniques. An effective search strategy must employ both the MEDLINE and the non-MEDLINE portion of PubMed to ensure all studies have been identified. The robust MEDLINE search strategies are used for the MEDLINE portion of the search. Usage of the less robust strategies is then efficiently confined to search only the remaining 11% of PubMed citations that have not been indexed for MEDLINE. The current article offers step-by-step instructions for building such a search exploring methods for the discovery of medical subject heading (MeSH) terms to search MEDLINE, text-based methods for exploring the non-MEDLINE database, information on the limitations of convenience algorithms such as the “related citations feature,” the strengths and pitfalls associated with commonly used filters, the proper usage of Boolean operators to organize a master search strategy, and instructions for automating that search through “MyNCBI” to receive search query updates by email as new citations become available. PMID:26129895

  11. PSS-SQL: protein secondary structure - structured query language.

    PubMed

    Mrozek, Dariusz; Wieczorek, Dominika; Malysiak-Mrozek, Bozena; Kozielski, Stanislaw

    2010-01-01

    Secondary structure representation of proteins provides important information regarding protein general construction and shape. This representation is often used in protein similarity searching. Since existing commercial database management systems do not offer integrated exploration methods for biological data e.g. at the level of the SQL language, the structural similarity searching is usually performed by external tools. In the paper, we present our newly developed PSS-SQL language, which allows searching a database in order to identify proteins having secondary structure similar to the structure specified by the user in a PSS-SQL query. Therefore, we provide a simple and declarative language for protein structure similarity searching.

  12. Decomposition: A Strategy for Query Processing.

    ERIC Educational Resources Information Center

    Wong, Eugene; Youssefi, Karel

    Multivariable queries can be processed in the data base management system INGRES. The general procedure is to decompose the query into a sequence of one-variable queries using two processes. One process is reduction which requires breaking off components of the query which are joined to it by a single variable. The other process,…

  13. Mining Longitudinal Web Queries: Trends and Patterns.

    ERIC Educational Resources Information Center

    Wang, Peiling; Berry, Michael W.; Yang, Yiheng

    2003-01-01

    Analyzed user queries submitted to an academic Web site during a four-year period, using a relational database, to examine users' query behavior, to identify problems they encounter, and to develop techniques for optimizing query analysis and mining. Linguistic analyses focus on query structures, lexicon, and word associations using statistical…

  14. Uncertain spatial data handling: Modeling, indexing and query

    NASA Astrophysics Data System (ADS)

    Li, Rui; Bhanu, Bir; Ravishankar, Chinya; Kurth, Michael; Ni, Jinfeng

    2007-01-01

    Managing and manipulating uncertainty in spatial databases are important problems for various practical applications of geographic information systems. Unlike the traditional fuzzy approaches in relational databases, in this paper a probability-based method to model and index uncertain spatial data is proposed. In this scheme, each object is represented by a probability density function (PDF) and a general measure is proposed for measuring similarity between the objects. To index objects, an optimized Gaussian mixture hierarchy (OGMH) is designed to support both certain/uncertain data and certain/uncertain queries. An uncertain R-tree is designed with two query filtering schemes, UR1 and UR2, for the special case when the query is certain. By performing a comprehensive comparison among OGMH, UR1, UR2 and a standard R-tree on the US Census Bureau TIGER/Line® Southern California landmark point dataset, it is found that UR1 is the best for certain queries. As an example of uncertain query support, OGMH is applied to the Mojave Desert endangered species protection real dataset. It is found that OGMH provides more selective, efficient and flexible search than the results provided by the existing trial and error approach for endangered species habitat search. Details of the experiments are given and discussed.

  15. A Distributed Query Processing Engine

    NASA Astrophysics Data System (ADS)

    Chatterjea, S.; Havinga, P.

    2004-04-01

    Wireless sensor networks (WSNs) are formed of tiny, highly energy-constrained sensor nodes that are equipped with wireless transceivers. They may be mobile and are usually deployed in large numbers in unfamiliar environments. The nodes communicate with one another by autonomously creating ad-hoc networks which are subsequently used to gather sensor data. WSNs also process the data within the network itself and only forward the result to the requesting node. This is referred to as in-network data aggregation and results in a substantial reduction of the amount of data that needs to be transmitted by any single node in the network. In this paper we present a framework for a distributed query processing engine (DQPE) which would allow sensor nodes to examine incoming queries and autonomously perform query optimisation using information available locally. Such qualities make a WSN the perfect tool to carry out environmental monitoring in future planetary exploration missions in a reliable and cost-effective manner.

  16. VISAGE: A Query Interface for Clinical Research.

    PubMed

    Zhang, Guo-Qiang; Siegler, Trish; Saxman, Paul; Sandberg, Neil; Mueller, Remo; Johnson, Nathan; Hunscher, Dale; Arabandi, Sivaram

    2010-03-01

    We present the design and implementation of VISAGE (VISual AGgregator and Explorer), a query interface for clinical research. We follow a user-centered development approach and incorporate visual, ontological, searchable and explorative features in three interrelated components: Query Builder, Query Manager and Query Explorer. The Query Explorer provides novel on-line data mining capabilities for purposes such as hypothesis generation or cohort identification. The VISAGE query interface has been implemented as a significant component of Physio-MIMI, an NCRR-funded, multi-CTSA-site pilot project. Preliminary evaluation results show that VISAGE is more efficient for query construction than the i2b2 web-client.

  17. Approximate Nearest Neighbor Search by Residual Vector Quantization

    PubMed Central

    Chen, Yongjian; Guan, Tao; Wang, Cheng

    2010-01-01

    A recently proposed product quantization method is efficient for large-scale approximate nearest neighbor search; however, its performance on unstructured vectors is limited. This paper introduces residual vector quantization based approaches that are appropriate for unstructured vectors. Database vectors are quantized by a residual vector quantizer. The reproductions are represented by short codes composed of their quantization indices. The Euclidean distance between a query vector and a database vector is approximated by the asymmetric distance, i.e., the distance between the query vector and the reproduction of the database vector. An efficient exhaustive search approach is proposed that computes the asymmetric distance quickly. A straightforward non-exhaustive search approach is proposed for large-scale search. Our approaches are compared to two state-of-the-art methods, spectral hashing and product quantization, on both structured and unstructured datasets. Results show that our approaches obtain the best results in terms of the trade-off between search quality and memory usage. PMID:22163524
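    The encoding and distance steps described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the tiny hand-written codebooks are hypothetical, and the abstract does not say how the codebooks are trained.

```python
from math import dist  # Euclidean distance, Python 3.8+


def nearest(codebook, v):
    """Index of the codeword closest to v."""
    return min(range(len(codebook)), key=lambda i: dist(codebook[i], v))


def rvq_encode(v, codebooks):
    """Quantize v stage by stage; each stage encodes the residual left by the previous one."""
    residual = list(v)
    code = []
    for cb in codebooks:
        i = nearest(cb, residual)
        code.append(i)
        residual = [r - c for r, c in zip(residual, cb[i])]
    return code


def rvq_decode(code, codebooks):
    """Reproduction of a database vector: the sum of the selected codewords."""
    out = [0.0] * len(codebooks[0][0])
    for i, cb in zip(code, codebooks):
        out = [o + c for o, c in zip(out, cb[i])]
    return out


def asymmetric_distance(query, code, codebooks):
    """Distance between the unquantized query and the reproduction of a database vector."""
    return dist(query, rvq_decode(code, codebooks))


# Hypothetical two-stage codebooks for 2-D vectors (illustration only).
CODEBOOKS = [
    [[0.0, 0.0], [4.0, 4.0]],              # coarse stage
    [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]],  # residual stage
]

code = rvq_encode([4.9, 4.1], CODEBOOKS)
print(code, asymmetric_distance([5.0, 1.0], code, CODEBOOKS))
```

    Only the short index list is stored per database vector; the query itself is never quantized, which is why the distance is called asymmetric.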

  18. Consistent Query Answering of Conjunctive Queries under Primary Key Constraints

    ERIC Educational Resources Information Center

    Pema, Enela

    2014-01-01

    An inconsistent database is a database that violates one or more of its integrity constraints. In reality, violations of integrity constraints arise frequently under several different circumstances. Inconsistent databases have long posed the challenge to develop suitable tools for meaningful query answering. A principled approach for querying…

  19. Search terms and a validated brief search filter to retrieve publications on health-related values in Medline: a word frequency analysis study

    PubMed Central

    Sutcliffe, Paul; Fulford, K W M (Bill); Dale, Jeremy

    2011-01-01

    Objective Healthcare debates and policy developments are increasingly concerned with a broad range of values-related areas. These include not only ethical, moral, religious, and other types of values ‘proper’, but also beliefs, preferences, experiences, choices, satisfaction, quality of life, etc. Research on such issues may be difficult to retrieve. This study used word frequency analysis to generate a broad pool of search terms and a brief filter to facilitate relevant searches in bibliographic databases. Methods Word frequency analysis for ‘values terms’ was performed on citations on diabetes, obesity, dementia, and schizophrenia (Medline; 2004–2006; 4440 citations; 1 110 291 words). Concordance® and SPSS 14.0 were used. Text words and MeSH terms of high frequency and precision were compiled into a search filter. It was validated on datasets of citations on dentistry and food hypersensitivity. Results 144 unique text words and 124 unique MeSH terms of moderate and high frequency (≥20) and very high precision (≥90%) were identified. Of these, 19 text words and seven MeSH terms were compiled into a ‘brief values filter’. In the derivation dataset, it had a sensitivity of 76.8% and precision of 86.8%. In the validation datasets, its sensitivity and precision were, respectively, 70.1% and 63.6% (food hypersensitivity) and 47.1% and 82.6% (dentistry). Conclusions This study provided a varied pool of search terms and a simple and highly effective tool for retrieving publications on health-related values. Further work is required to facilitate access to such research and enhance its chances of being translated into practice, policy, and service improvements. PMID:21846778

  20. Bat-Inspired Algorithm Based Query Expansion for Medical Web Information Retrieval.

    PubMed

    Khennak, Ilyes; Drias, Habiba

    2017-02-01

    With the increasing amount of medical data available on the Web, health information has become one of the most widely searched topics on the Internet. Patients and people of several backgrounds are now using Web search engines to acquire medical information, including information about a specific disease, medical treatment or professional advice. Nonetheless, due to a lack of medical knowledge, many laypeople have difficulty forming appropriate queries to articulate their inquiries, which renders their search queries imprecise due to the use of unclear keywords. The use of these ambiguous and vague queries to describe the patients' needs has resulted in a failure of Web search engines to retrieve accurate and relevant information. One of the most natural and promising methods to overcome this drawback is Query Expansion. In this paper, an original approach based on the Bat Algorithm is proposed to improve the retrieval effectiveness of query expansion in the medical field. In contrast to the existing literature, the proposed approach uses the Bat Algorithm to find the best expanded query among a set of expanded query candidates, while maintaining low computational complexity. Moreover, this new approach allows the determination of the length of the expanded query empirically. Numerical results on MEDLINE, the on-line medical information database, show that the proposed approach is more effective and efficient compared to the baseline.

  1. Improving image retrieval effectiveness via query expansion using MeSH hierarchical structure

    PubMed Central

    Crespo Azcárate, Mariano; Mata Vázquez, Jacinto; Maña López, Manuel

    2013-01-01

    Objective We explored two strategies for query expansion utilizing medical subject headings (MeSH) ontology to improve the effectiveness of medical image retrieval systems. In order to achieve greater effectiveness in the expansion, the search text was analyzed to identify which terms were most amenable to being expanded. Design To perform the expansions we utilized the hierarchical structure by which the MeSH descriptors are organized. Two strategies for selecting the terms to be expanded in each query were studied. The first consisted of identifying the medical concepts using the unified medical language system metathesaurus. In the second strategy the text of the query was divided into n-grams, resulting in sequences corresponding to MeSH descriptors. Measurements For the evaluation of the system, we used the collection made available by the ImageCLEF organization in its 2011 medical image retrieval task. The main measure of efficiency employed for evaluating the techniques developed was the mean average precision (MAP). Results Both strategies exceeded the average MAP score in the ImageCLEF 2011 competition (0.1644). The n-gram expansion strategy achieved a MAP of 0.2004, which represents an improvement of 21.89% over the average MAP score in the competition. On the other hand, the medical concepts expansion strategy scored 0.2172 in the MAP, representing a 32.11% improvement. This run won the text-based medical image retrieval task in 2011. Conclusions Query expansion exploiting the hierarchical structure of the MeSH descriptors achieved a significant improvement in image retrieval systems. PMID:22952301
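    As a reading aid for the figures above, the following sketch shows how mean average precision (MAP) is conventionally computed and how a relative improvement over a baseline score is derived. The function names and the toy rankings are hypothetical; the ImageCLEF evaluation itself is not reproduced here.

```python
def average_precision(ranked, relevant):
    """Average of precision@k taken at each rank k where a relevant item appears."""
    hits = 0
    total = 0.0
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """Mean of per-query average precision; runs is a list of (ranked, relevant) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


def relative_improvement(baseline, score):
    """Fractional gain of score over baseline."""
    return (score - baseline) / baseline


# Toy queries (illustration only).
runs = [
    (["a", "b", "c", "d"], {"a", "c"}),  # AP = (1/1 + 2/3) / 2
    (["x", "y"], {"y"}),                 # AP = 1/2
]
print(mean_average_precision(runs))

# Scores quoted in the abstract: competition average 0.1644, n-gram 0.2004, concepts 0.2172.
print(relative_improvement(0.1644, 0.2004), relative_improvement(0.1644, 0.2172))
```

    Computed from the rounded scores, the two gains come out at roughly 21.9% and 32.1%, which agrees with the reported 21.89% and 32.11% up to rounding of the inputs.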

  2. Improving image retrieval effectiveness via query expansion using MeSH hierarchical structure.

    PubMed

    Crespo Azcárate, Mariano; Mata Vázquez, Jacinto; Maña López, Manuel

    2013-01-01

    We explored two strategies for query expansion utilizing medical subject headings (MeSH) ontology to improve the effectiveness of medical image retrieval systems. In order to achieve greater effectiveness in the expansion, the search text was analyzed to identify which terms were most amenable to being expanded. To perform the expansions we utilized the hierarchical structure by which the MeSH descriptors are organized. Two strategies for selecting the terms to be expanded in each query were studied. The first consisted of identifying the medical concepts using the unified medical language system metathesaurus. In the second strategy the text of the query was divided into n-grams, resulting in sequences corresponding to MeSH descriptors. For the evaluation of the system, we used the collection made available by the ImageCLEF organization in its 2011 medical image retrieval task. The main measure of efficiency employed for evaluating the techniques developed was the mean average precision (MAP). Both strategies exceeded the average MAP score in the ImageCLEF 2011 competition (0.1644). The n-gram expansion strategy achieved a MAP of 0.2004, which represents an improvement of 21.89% over the average MAP score in the competition. On the other hand, the medical concepts expansion strategy scored 0.2172 in the MAP, representing a 32.11% improvement. This run won the text-based medical image retrieval task in 2011. Query expansion exploiting the hierarchical structure of the MeSH descriptors achieved a significant improvement in image retrieval systems.

  3. The Database Query Support Processor (QSP)

    NASA Technical Reports Server (NTRS)

    1993-01-01

    The number and diversity of databases available to users continues to increase dramatically. Currently, the trend is towards decentralized, client server architectures that (on the surface) are less expensive to acquire, operate, and maintain than information architectures based on centralized, monolithic mainframes. The database query support processor (QSP) effort evaluates the performance of a network level, heterogeneous database access capability. Air Force Material Command's Rome Laboratory has developed an approach, based on ANSI standard X3.138 - 1988, 'The Information Resource Dictionary System (IRDS)' to seamless access to heterogeneous databases based on extensions to data dictionary technology. To successfully query a decentralized information system, users must know what data are available from which source, or have the knowledge and system privileges necessary to find out this information. Privacy and security considerations prohibit free and open access to every information system in every network. Even in completely open systems, time required to locate relevant data (in systems of any appreciable size) would be better spent analyzing the data, assuming the original question was not forgotten. Extensions to data dictionary technology have the potential to more fully automate the search and retrieval for relevant data in a decentralized environment. Substantial amounts of time and money could be saved by not having to teach users what data resides in which systems and how to access each of those systems. Information describing data and how to get it could be removed from the application and placed in a dedicated repository where it belongs. The result is simplified applications that are less brittle and less expensive to build and maintain. Software technology providing the required functionality is off the shelf. The key difficulty is in defining the metadata required to support the process. The database query support processor effort will provide

  4. Building and Querying RDF/OWL Database of Semantically Annotated Nuclear Medicine Images.

    PubMed

    Hwang, Kyung Hoon; Lee, Haejun; Koh, Geon; Willrett, Debra; Rubin, Daniel L

    2017-02-01

    As the use of positron emission tomography-computed tomography (PET-CT) has increased rapidly, there is a need to retrieve relevant medical images that can assist image interpretation. However, the images themselves lack the explicit information needed for query. We constructed a semantically structured database of nuclear medicine images using the Annotation and Image Markup (AIM) format and evaluated the ability of the AIM annotations to improve image search. We created AIM annotation templates specific to the nuclear medicine domain and used them to annotate 100 nuclear medicine PET-CT studies in AIM format using controlled vocabulary. We evaluated image retrieval from 20 specific clinical queries. As the gold standard, two nuclear medicine physicians manually retrieved the relevant images from the image database using free text search of radiology reports for the same queries. We compared query results with the manually retrieved results obtained by the physicians. The query performance indicated a 98% recall for simple queries and an 89% recall for complex queries. In total, the queries provided 95% (75 of 79 images) recall, 100% precision, and an F1 score of 0.97 for the 20 clinical queries. Three of the four images missed by the queries required reasoning for successful retrieval. Nuclear medicine images augmented using semantic annotations in AIM enabled high recall and precision for simple queries, helping physicians to retrieve the relevant images. Further study using a larger data set and the implementation of an inference engine may improve query results for more complex queries.
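    The overall figures in this abstract can be checked with a short calculation. The sketch below assumes the standard definitions of precision, recall and F1, and reads the reported numbers as 75 true positives, 0 false positives (100% precision) and 4 false negatives (79 relevant images in total); the function name is hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard retrieval metrics from true positives, false positives and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# 75 of 79 relevant images retrieved, with no irrelevant image returned.
precision, recall, f1 = precision_recall_f1(tp=75, fp=0, fn=4)
print(round(precision, 2), round(recall, 2), round(f1, 2))
```

    With these counts the recall rounds to 0.95 and the F1 score to 0.97, matching the values reported in the abstract.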

  5. Querying Large Biological Network Datasets

    ERIC Educational Resources Information Center

    Gulsoy, Gunhan

    2013-01-01

    New experimental methods have resulted in an increasing amount of genetic interaction data being generated every day. Biological networks are used to store the genetic interaction data gathered. The increasing amount of data available requires fast, large-scale analysis methods. Therefore, we address the problem of querying large biological network datasets.…

  6. Querying the public databases for sequences using complex keywords contained in the feature lines

    PubMed Central

    Croce, Olivier; Lamarre, Michaël; Christen, Richard

    2006-01-01

    Background High throughput technologies often require the retrieval of large data sets of sequences. Retrieval of EMBL or GenBank entries using keywords is easy using tools such as ACNUC, Entrez or SRS, but has some limitations, in particular when querying with complex keywords. Results We show that Entrez has severe limitations with respect to retrieving subsequences. SRS works well with simple keywords but not with keywords composed of several terms, and has problems with complex queries. ACNUC works well, but does not allow precise queries in the Feature qualifiers. We developed specific Perl scripts to precisely retrieve subsequences as defined by complex descriptors in the Features qualifiers of the EMBL entries. We improved parts of the bioPerl library to allow parsing of large data files, and we embedded these scripts in a user friendly interface (OS independent) for easy use. Conclusion Although not as fast as the public tools that use prebuilt indexes, parsing the complete entries using a script is often necessary in order to retrieve the exact data searched for. Embedding in a user friendly interface allows biologists to use the scripts, which can easily be modified, if necessary, by bioinformaticians for unforeseen needs. PMID:16441875

  7. Comparing image search behaviour in the ARRS GoldMiner search engine and a clinical PACS/RIS.

    PubMed

    De-Arteaga, Maria; Eggel, Ivan; Do, Bao; Rubin, Daniel; Kahn, Charles E; Müller, Henning

    2015-08-01

    Information search has changed the way we manage knowledge and the ubiquity of information access has made search a frequent activity, whether via Internet search engines or increasingly via mobile devices. Medical information search is in this respect no different and much research has been devoted to analyzing the way in which physicians aim to access information. Medical image search is a much smaller domain but has gained much attention as it has different characteristics than search for text documents. While web search log files have been analysed many times to better understand user behaviour, the log files of hospital internal systems for search in a PACS/RIS (Picture Archival and Communication System, Radiology Information System) have rarely been analysed. Such a comparison between a hospital PACS/RIS search and a web system for searching images of the biomedical literature is the goal of this paper. Objectives are to identify similarities and differences in search behaviour of the two systems, which could then be used to optimize existing systems and build new search engines. Log files of the ARRS GoldMiner medical image search engine (freely accessible on the Internet) containing 222,005 queries, and log files of Stanford's internal PACS/RIS search called radTF containing 18,068 queries were analysed. Each query was preprocessed and all query terms were mapped to the RadLex (Radiology Lexicon) terminology, a comprehensive lexicon of radiology terms created and maintained by the Radiological Society of North America, so the semantic content in the queries and the links between terms could be analysed, and synonyms for the same concept could be detected. RadLex was mainly created for the use in radiology reports, to aid structured reporting and the preparation of educational material (Lanlotz, 2006) [1]. In standard medical vocabularies such as MeSH (Medical Subject Headings) and UMLS (Unified Medical Language System) specific terms of radiology are often

  8. Implementation of the common phrase index method on the phrase query for information retrieval

    NASA Astrophysics Data System (ADS)

    Fatmawati, Triyah; Zaman, Badrus; Werdiningsih, Indah

    2017-08-01

    With the development of technology, finding information in news text has become easy, because news text is distributed not only in print media, such as newspapers, but also in electronic media that can be accessed using a search engine. In the process of finding relevant documents with a search engine, a phrase is often used as a query. The number of words that make up the phrase query and their positions affect the relevance of the documents returned. As a result, the accuracy of the information obtained is affected. Based on the outlined problem, the purpose of this research was to analyze the implementation of the common phrase index method in information retrieval. The research was conducted on English news text and implemented in a prototype to determine the relevance level of the documents produced. The system is built with the stages of pre-processing, indexing, term weighting calculation, and cosine similarity calculation. The system then displays the document search results in order of cosine similarity. System testing was conducted using 100 documents and 20 queries. The results were then used for the evaluation stage: first, the relevant documents were determined using the kappa statistic; second, the system success rate was determined using precision, recall, and F-measure. In this research, the kappa statistic was 0.71, so the relevant documents are eligible for the system evaluation. The evaluation produced a precision of 0.37, a recall of 0.50, and an F-measure of 0.43. From this result it can be said that the success rate of the system in producing relevant documents is low.
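    The final ranking stage named in this abstract, ordering documents by cosine similarity to the query, can be sketched as below. This covers only that stage: it does not implement the common phrase index, and it uses raw term counts as weights because the abstract does not specify the weighting scheme. All names and the sample documents are hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (Counter objects)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = sqrt(sum(x * x for x in a.values()))
    norm_b = sqrt(sum(x * x for x in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, docs):
    """Return document indices ordered by decreasing cosine similarity to the query."""
    q = Counter(query.lower().split())
    scored = [(cosine(q, Counter(d.lower().split())), i) for i, d in enumerate(docs)]
    # Sort by score descending; ties fall back to the original document order.
    return [i for _, i in sorted(scored, key=lambda pair: (-pair[0], pair[1]))]


# Toy collection (illustration only).
docs = ["flood hits city", "city council vote", "sports news today"]
print(rank("city flood", docs))
```

    A phrase-aware index would additionally take word order and adjacency into account, which a bag-of-words cosine score like this one ignores.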

  9. Monitoring seasonal influenza epidemics by using internet search data with an ensemble penalized regression model

    PubMed Central

    Guo, Pi; Zhang, Jianjun; Wang, Li; Yang, Shaoyi; Luo, Ganfeng; Deng, Changyu; Wen, Ye; Zhang, Qingying

    2017-01-01

    Seasonal influenza epidemics cause serious public health problems in China. Search queries-based surveillance was recently proposed to complement traditional monitoring approaches of influenza epidemics. However, developing robust techniques of search query selection and enhancing predictability for influenza epidemics remains a challenge. This study aimed to develop a novel ensemble framework to improve penalized regression models for detecting influenza epidemics by using Baidu search engine query data from China. The ensemble framework applied a combination of bootstrap aggregating (bagging) and rank aggregation method to optimize penalized regression models. Different algorithms including lasso, ridge, elastic net and the algorithms in the proposed ensemble framework were compared by using Baidu search engine queries. Most of the selected search terms captured the peaks and troughs of the time series curves of influenza cases. The predictability of the conventional penalized regression models was improved by the proposed ensemble framework. The elastic net regression model outperformed the compared models, with the minimum prediction errors. We established a Baidu search engine queries-based surveillance model for monitoring influenza epidemics, and the proposed model provides a useful tool to support the public health response to influenza and other infectious diseases. PMID:28422149

  10. Persistent Identifiers for Improved Accessibility for Linked Data Querying

    NASA Astrophysics Data System (ADS)

    Shepherd, A.; Chandler, C. L.; Arko, R. A.; Fils, D.; Jones, M. B.; Krisnadhi, A.; Mecum, B.

    2016-12-01

    The adoption of linked open data principles within the geosciences has increased the amount of accessible information available on the Web. However, this data is difficult to consume for those who are unfamiliar with Semantic Web technologies such as Web Ontology Language (OWL), Resource Description Framework (RDF) and SPARQL - the RDF query language. Consumers would need to understand the structure of the data and how to efficiently query it. Furthermore, understanding how to query doesn't solve problems of poor precision and recall in search results. For consumers unfamiliar with the data, full-text searches are most accessible, but not ideal as they arrest the advantages of data disambiguation and co-reference resolution efforts. Conversely, URI searches across linked data can deliver improved search results, but knowledge of these exact URIs may remain difficult to obtain. The increased adoption of Persistent Identifiers (PIDs) can lead to improved linked data querying by a wide variety of consumers. Because PIDs resolve to a single entity, they are an excellent data point for disambiguating content. At the same time, PIDs are more accessible and prominent than a single data provider's linked data URI. When present in linked open datasets, PIDs provide balance between the technical and social hurdles of linked data querying as evidenced by the NSF EarthCube GeoLink project. The GeoLink project, funded by NSF's EarthCube initiative, has brought together data repositories that include content from field expeditions, laboratory analyses, journal publications, conference presentations, theses/reports, and funding awards that span scientific studies from marine geology to marine ecosystems and biogeochemistry to paleoclimatology.

  11. Framing memories: How the retrieval query format shapes the neural bases of remembering.

    PubMed

    Raposo, Ana; Frade, Sofia; Alves, Mara

    2016-08-01

    The way memory questions are framed influences the information that is searched, retrieved, and monitored during remembering. This fMRI study aimed at clarifying how the format of the retrieval query shapes the neural basis of source recollection. During encoding, participants made semantic (pleasantness) or perceptual (number of letters) judgments about words. Subsequently, in a source memory test, the retrieval query was manipulated such that for half of the items from each encoding task, the retrieval query emphasized the semantic source (i.e., semantic query format: "Is this word from the pleasantness task?"), whereas for the other half the retrieval query emphasized the alternate, perceptual source (i.e., perceptual query format: "Is this word from the letter task?"). The results showed that the semantic query format was associated with higher source recognition than the perceptual query format. This behavioral advantage was accompanied by increased activation in several regions associated with controlled semantic elaboration and monitoring of internally-generated features about the past event. In particular, for items semantically encoded, the semantic query, relative to the perceptual query, induced activation in medial prefrontal cortex (PFC), hippocampal, parahippocampal and middle temporal cortex. Conversely, for items perceptually encoded, the semantic query recruited the lateral PFC and occipital-fusiform areas. Interestingly, the semantic format also influenced the processing of new items, eliciting greater L lateral and medial PFC activation. In contrast, the perceptual query format (versus the semantic format) only prompted greater activation in R orbitofrontal cortex and the R inferior parietal lobe, for items encoded in a perceptual manner and for new items, respectively. The results highlight the role of the retrieval query format in source remembering, showing that the retrieval query that emphasizes the semantic source promotes the use of semantic

  12. Comparative Analysis of Online Health Queries Originating From Personal Computers and Smart Devices on a Consumer Health Information Portal

    PubMed Central

    Jadhav, Ashutosh; Andrews, Donna; Fiksdal, Alexander; Kumbamu, Ashok; McCormick, Jennifer B; Misitano, Andrew; Nelsen, Laurie; Ryu, Euijung; Sheth, Amit; Wu, Stephen

    2014-01-01

    Background The number of people using the Internet and mobile/smart devices for health information seeking is increasing rapidly. Although the user experience for online health information seeking varies with the device used, for example, smart devices (SDs) like smartphones/tablets versus personal computers (PCs) like desktops/laptops, very few studies have investigated how online health information seeking behavior (OHISB) may differ by device. Objective The objective of this study is to examine differences in OHISB between PCs and SDs through a comparative analysis of large-scale health search queries submitted through Web search engines from both types of devices. Methods Using the Web analytics tool, IBM NetInsight OnDemand, and based on the type of devices used (PCs or SDs), we obtained the most frequent health search queries between June 2011 and May 2013 that were submitted on Web search engines and directed users to the Mayo Clinic’s consumer health information website. We performed analyses on “Queries with considering repetition counts (QwR)” and “Queries without considering repetition counts (QwoR)”. The dataset contains (1) 2.74 million and 3.94 million QwoR, respectively for PCs and SDs, and (2) more than 100 million QwR for both PCs and SDs. We analyzed structural properties of the queries (length of the search queries, usage of query operators and special characters in health queries), types of search queries (keyword-based, wh-questions, yes/no questions), categorization of the queries based on health categories and information mentioned in the queries (gender, age-groups, temporal references), misspellings in the health queries, and the linguistic structure of the health queries. Results Query strings used for health information searching via PCs and SDs differ by almost 50%. The most searched health categories are “Symptoms” (1 in 3 search queries), “Causes”, and “Treatments & Drugs”. The distribution of search queries for

  13. Comparative analysis of online health queries originating from personal computers and smart devices on a consumer health information portal.

    PubMed

    Jadhav, Ashutosh; Andrews, Donna; Fiksdal, Alexander; Kumbamu, Ashok; McCormick, Jennifer B; Misitano, Andrew; Nelsen, Laurie; Ryu, Euijung; Sheth, Amit; Wu, Stephen; Pathak, Jyotishman

    2014-07-04

    The number of people using the Internet and mobile/smart devices for health information seeking is increasing rapidly. Although the user experience for online health information seeking varies with the device used, for example, smart devices (SDs) like smartphones/tablets versus personal computers (PCs) like desktops/laptops, very few studies have investigated how online health information seeking behavior (OHISB) may differ by device. The objective of this study is to examine differences in OHISB between PCs and SDs through a comparative analysis of large-scale health search queries submitted through Web search engines from both types of devices. Using the Web analytics tool, IBM NetInsight OnDemand, and based on the type of devices used (PCs or SDs), we obtained the most frequent health search queries between June 2011 and May 2013 that were submitted on Web search engines and directed users to the Mayo Clinic's consumer health information website. We performed analyses on "Queries with considering repetition counts (QwR)" and "Queries without considering repetition counts (QwoR)". The dataset contains (1) 2.74 million and 3.94 million QwoR, respectively for PCs and SDs, and (2) more than 100 million QwR for both PCs and SDs. We analyzed structural properties of the queries (length of the search queries, usage of query operators and special characters in health queries), types of search queries (keyword-based, wh-questions, yes/no questions), categorization of the queries based on health categories and information mentioned in the queries (gender, age-groups, temporal references), misspellings in the health queries, and the linguistic structure of the health queries. Query strings used for health information searching via PCs and SDs differ by almost 50%. The most searched health categories are "Symptoms" (1 in 3 search queries), "Causes", and "Treatments & Drugs". The distribution of search queries for different health categories differs with the device used for

  14. Google search behavior for status epilepticus.

    PubMed

    Brigo, Francesco; Trinka, Eugen

    2015-08-01

    Millions of people surf the Internet every day as a source of health-care information looking for materials about symptoms, diagnosis, treatments and their possible adverse effects, or diagnostic procedures. Google is the most popular search engine and is used by patients and physicians to search for online health-related information. This study aimed to evaluate changes in Google search behavior occurring in English-speaking countries over time for the term "status epilepticus" (SE). Using Google Trends, data on global search queries for the term SE between the 1st of January 2004 and 31st of December 2014 were analyzed. Search volume numbers over time (downloaded as CSV datasets) were analyzed by applying the "health" category filter. The research trends for the term SE remained fairly constant over time. The greatest search volume for the term SE was reported in the United States, followed by India, Australia, the United Kingdom, Canada, the Netherlands, Thailand, and Germany. Most terms associated with the search queries were related to SE definition, symptoms, subtypes, and treatment. The volume of searches for some queries (nonconvulsive, focal, and refractory SE; SE definition; SE guidelines; SE symptoms; SE management; SE treatment) was enormously increased over time (search popularity has exceeded a 5000% growth since 2004). Most people use search engines to look for the term SE to obtain information on its definition, subtypes, and management. The greatest search volume occurred not only in developed countries but also in developing countries where raising awareness about SE still remains a challenging task and where there is reduced public knowledge of epilepsy. Health information seeking (the extent to which people search for health information online) reflects the health-related information needs of Internet users for a specific disease. Google Trends shows that Internet users have a great demand for information concerning some aspects of SE

  15. Crowd-sourced Ontology for Photoleukocoria: Identifying Common Internet Search Terms for a Potentially Important Pediatric Ophthalmic Sign.

    PubMed

    Staffieri, Sandra E; Kearns, Lisa S; Sanfilippo, Paul G; Craig, Jamie E; Mackey, David A; Hewitt, Alex W

    2018-02-01

    Leukocoria is the most common presenting sign for pediatric eye disease including retinoblastoma and cataract, with worse outcomes if diagnosis is delayed. We investigated whether individuals could identify leukocoria in photographs (photoleukocoria) and examined their subsequent Internet search behavior. Using a web-based questionnaire, in this cross-sectional study we invited adults aged over 18 years to view two photographs of a child with photoleukocoria, and then search the Internet to determine a possible diagnosis and action plan. The most commonly used search terms and websites accessed were recorded. The questionnaire was completed by 1639 individuals. Facebook advertisement was the most effective recruitment strategy. The mean age of all respondents was 38.95 ± 14.59 years (range, 18-83), 94% were female, and 59.3% had children. An abnormality in the images presented was identified by 1613 (98.4%) participants. The most commonly used search terms were: "white," "pupil," "photo," and "eye" reaching a variety of appropriate websites or links to print or social media articles. Different words or phrases were used to describe the same observation of photoleukocoria leading to a range of websites. Variations in the description of observed signs and search words influenced the sites reached, information obtained, and subsequent help-seeking intentions. Identifying the most commonly used search terms for photoleukocoria is an important step for search engine optimization. Being directed to the most appropriate websites informing of the significance of photoleukocoria and the appropriate actions to take could improve delays in diagnosis of important pediatric eye disease such as retinoblastoma or cataract.

  16. Characterization of the biomedical query mediation process.

    PubMed

    Hruby, Gregory W; Boland, Mary Regina; Cimino, James J; Gao, Junfeng; Wilcox, Adam B; Hirschberg, Julia; Weng, Chunhua

    2013-01-01

    To most medical researchers, databases are obscure black boxes. Query analysts are often indispensable guides aiding researchers to perform mediated data queries. However, this approach does not scale up and is time-consuming and expensive. We analyzed query mediation dialogues to inform future designs of intelligent query mediation systems. Thirty-one mediated query sessions for 22 research projects were recorded and transcribed. We analyzed 10 of these to develop an annotation schema for dialogue acts through iterative refinement. Three coders independently annotated all 3160 dialogue acts. We assessed the inter-rater agreement and resolved disagreement by group consensus. This study contributes early knowledge of the query negotiation space for medical research. We conclude that research data query formulation is not a straightforward translation from researcher data needs to database queries, but rather iterative, process-oriented needs assessment and refinement.

  17. Using Web-Based Search Data to Study the Public’s Reactions to Societal Events: The Case of the Sandy Hook Shooting

    PubMed Central

    2017-01-01

    Background: Internet search is the most common activity on the World Wide Web and generates a vast amount of user-reported data regarding their information-seeking preferences and behavior. Although this data has been successfully used to examine outbreaks, health care utilization, and outcomes related to quality of care, its value in informing public health policy remains unclear. Objective: The aim of this study was to evaluate the role of Internet search query data in health policy development. To do so, we studied the public’s reaction to a major societal event in the context of the 2012 Sandy Hook School shooting incident. Methods: Query data from the Yahoo! search engine regarding firearm-related searches was analyzed to examine changes in user-selected search terms and subsequent websites visited for a period of 14 days before and after the shooting incident. Results: A total of 5,653,588 firearm-related search queries were analyzed. In the after period, queries increased for search terms related to “guns” (+50.06%), “shooting incident” (+333.71%), “ammunition” (+155.14%), and “gun-related laws” (+535.47%). The highest increase (+1054.37%) in Web traffic was seen by news websites following “shooting incident” queries whereas searches for “guns” (+61.02%) and “ammunition” (+173.15%) resulted in notable increases in visits to retail websites. Firearm-related queries generally returned to baseline levels after approximately 10 days. Conclusions: Search engine queries present a viable infodemiology metric on public reactions and subsequent behaviors to major societal events and could be used by policymakers to inform policy development. PMID:28336508

  18. Using Web-Based Search Data to Study the Public's Reactions to Societal Events: The Case of the Sandy Hook Shooting.

    PubMed

    Menachemi, Nir; Rahurkar, Saurabh; Rahurkar, Mandar

    2017-03-23

    Internet search is the most common activity on the World Wide Web and generates a vast amount of user-reported data regarding their information-seeking preferences and behavior. Although this data has been successfully used to examine outbreaks, health care utilization, and outcomes related to quality of care, its value in informing public health policy remains unclear. The aim of this study was to evaluate the role of Internet search query data in health policy development. To do so, we studied the public's reaction to a major societal event in the context of the 2012 Sandy Hook School shooting incident. Query data from the Yahoo! search engine regarding firearm-related searches was analyzed to examine changes in user-selected search terms and subsequent websites visited for a period of 14 days before and after the shooting incident. A total of 5,653,588 firearm-related search queries were analyzed. In the after period, queries increased for search terms related to "guns" (+50.06%), "shooting incident" (+333.71%), "ammunition" (+155.14%), and "gun-related laws" (+535.47%). The highest increase (+1054.37%) in Web traffic was seen by news websites following "shooting incident" queries whereas searches for "guns" (+61.02%) and "ammunition" (+173.15%) resulted in notable increases in visits to retail websites. Firearm-related queries generally returned to baseline levels after approximately 10 days. Search engine queries present a viable infodemiology metric on public reactions and subsequent behaviors to major societal events and could be used by policymakers to inform policy development.

  19. Ad-Hoc Queries over Document Collections - A Case Study

    NASA Astrophysics Data System (ADS)

    Löser, Alexander; Lutter, Steffen; Düssel, Patrick; Markl, Volker

    We discuss the novel problem of supporting analytical business intelligence queries over web-based textual content, e.g., BI-style reports based on 100.000's of documents from an ad-hoc web search result. Neither conventional search engines nor conventional Business Intelligence and ETL tools address this problem, which lies at the intersection of their capabilities. "Google Squared" or our system GOOLAP.info, are examples of these kinds of systems. They execute information extraction methods over one or several document collections at query time and integrate extracted records into a common view or tabular structure. Frequent extraction and object resolution failures cause incomplete records which could not be joined into a record answering the query. Our focus is the identification of join-reordering heuristics maximizing the size of complete records answering a structured query. With respect to given costs for document extraction we propose two novel join-operations: The multi-way CJ-operator joins records from multiple relationships extracted from a single document. The two-way join-operator DJ ensures data density by removing incomplete records from results. In a preliminary case study we observe that our join-reordering heuristics positively impact result size, record density and lower execution costs.

  20. Short-term absence from industry: III The inference of `proneness' and a search for causes

    PubMed Central

    Froggatt, P.

    1970-01-01

    Froggatt, P. (1970). Brit. J. industr. Med., 27, 297-312. Short-term absence from industry. III. The inference of `proneness' and a search for causes. The abilities of five hypotheses (`chance', `proneness', and three of `true contagion' - as defined in the text) to explain the distributions of one-day and two-day absences among groups of male and female industrial personnel and clerks in government service are examined by curve-fitting and correlation methods. The five hypotheses generate (in order) the Poisson, negative binomial, Neyman type A, Short, and Hermite (two-parameter form) distributions which are fitted to the data using maximum-likelihood estimates. The conclusion is drawn that `proneness', i.e., a stable `liability', compounded from several though unquantifiable factors, and constant for each individual over the period of the study, is markedly successful in explaining the data. It is emphasized that some of the other hypotheses under test cannot be unequivocally rejected; and there is in theory an infinite number, still unformulated or untested, which may be acceptable or even fit the data better. Correlation coefficients for the numbers of one-day (and two-day) absences taken by the same individuals in two equal non-overlapping periods of time are of the order 0·5 to 0·7 (0·3 to 0·5 for two-day absences) and the corresponding regressions fulfil linear requirements. These correlations are higher than any between `personal characteristics' and their overt consequence in contingent fields of human enquiry. For one-day absences the predictive power for the future from the past record could in some circumstances justify executive action. When freely available, overtime was greatest among junior married men and least among junior married women. The validity of the inference of `proneness' and the implications of its acceptance are fully discussed. While interpretation is not unequivocal, one-day absences seemingly have many causes; two-day absences are

  1. Scale-Independent Relational Query Processing

    DTIC Science & Technology

    2013-10-04

    remainder of this process for brevity. It should be noted that for queries containing self-joins, all delta queries for the modified relation must be... Scale-Independent Relational Query Processing. Michael Armbrust, Electrical Engineering and Computer Sciences, University of California at Berkeley...

  2. EarthServer: Information Retrieval and Query Language

    NASA Astrophysics Data System (ADS)

    Perperis, Thanassis; Koltsida, Panagiota; Kakaletris, George

    2013-04-01

    new construct allowing "mixed search" on both OGC coverages and XML-represented metadata and also returning "mixed results" further enabling seamless geospatial and array, combined data and metadata, processing under a familiar syntactic formalism. xWCPS is a superset of WCPS closely following XQuery's syntax and philosophy, further extending it with capabilities to handle coverages, array and multidimensional data, allowing different degrees of compliance to its results and opening new possibilities for data definition, processing and interoperability. Our long term vision for xWCPS is, on the one hand, to enable coverage and corresponding metadata retrieval, irrespective of their actual origin and form and, on the other hand, to offer syntactic constructs for data definition and data manipulation. Thus xWCPS queries could potentially employ distributed services to access diverse, cross-disciplinary and physically distributed data sources, data within them and metadata about them without directly specifying which coverages to employ, which parts come from metadata and which come from data processing (aggregates) functions. Respectively, a Data Definition Language could allow schema definition and a Data Manipulation Language could enable updates, inserts, and deletes of data handled by an xWCPS system. xWCPS's specification is currently in draft form. We intend to initiate the corresponding OGC standardization activity with the finalization of the specification of the language.

  3. An assessment of the visibility of MeSH-indexed medical web catalogs through search engines.

    PubMed

    Zweigenbaum, P; Darmoni, S J; Grabar, N; Douyère, M; Benichou, J

    2002-01-01

    Manually indexed Internet health catalogs such as CliniWeb or CISMeF provide resources for retrieving high-quality health information. Users of these quality-controlled subject gateways are most often referred to them by general search engines such as Google, AltaVista, etc. This raises several questions, among which the following: what is the relative visibility of medical Internet catalogs through search engines? This study addresses this issue by measuring and comparing the visibility of six major, MeSH-indexed health catalogs through four different search engines (AltaVista, Google, Lycos, Northern Light) in two languages (English and French). Over half a million queries were sent to the search engines; for most of these search engines, according to our measures at the time the queries were sent, the most visible catalog for English MeSH terms was CliniWeb and the most visible one for French MeSH terms was CISMeF.

  4. A Priori Analysis of Natural Language Queries.

    ERIC Educational Resources Information Center

    Spiegler, Israel; Elata, Smadar

    1988-01-01

    Presents a model for the a priori analysis of natural language queries which uses an algorithm to transform the query into a logical pattern that is used to determine the answerability of the query. The results of testing by a prototype system implemented in PROLOG are discussed. (20 references) (CLB)

  5. Monitoring moving queries inside a safe region.

    PubMed

    Al-Khalidi, Haidar; Taniar, David; Betts, John; Alamri, Sultan

    2014-01-01

    With mobile moving range queries, there is a need to recalculate the relevant surrounding objects of interest whenever the query moves. Therefore, monitoring the moving query is very costly. The safe region is one method that has been proposed to minimise the communication and computation cost of continuously monitoring a moving range query. Inside the safe region the set of objects of interest to the query does not change; thus there is no need to update the query while it is inside its safe region. However, when the query leaves its safe region the mobile device has to reevaluate the query, necessitating communication with the server. Knowing when and where the mobile device will leave a safe region is widely known as a difficult problem. To solve this problem, we propose a novel method to monitor the position of the query over time using a linear function based on the direction of the query obtained by periodic monitoring of its position. Periodic monitoring ensures that the query is aware of its location all the time. This method reduces the costs associated with communications in client-server architecture. Computational results show that our method is successful in handling moving query patterns.
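    The idea of predicting when a query leaves its safe region from a linear motion model can be illustrated with a minimal sketch. It is not the authors' method: it assumes a circular safe region (the abstract does not state the region's shape), and the function names `estimate_velocity` and `exit_time` are hypothetical. The velocity is estimated from two periodic position samples, and the exit time is the larger root of the quadratic obtained by intersecting the straight-line path with the circle.

```python
import math


def estimate_velocity(p_prev, p_curr, dt):
    """Velocity estimated from two periodic position samples dt time units apart."""
    return ((p_curr[0] - p_prev[0]) / dt, (p_curr[1] - p_prev[1]) / dt)


def exit_time(pos, vel, center, radius):
    """Time until a point moving as pos + vel * t leaves a circular region.

    Assumption: the safe region is a circle. Returns 0.0 if pos is already
    outside the region and math.inf if the query is not moving.
    """
    dx, dy = pos[0] - center[0], pos[1] - center[1]
    a = vel[0] ** 2 + vel[1] ** 2
    c = dx * dx + dy * dy - radius * radius
    if c > 0:
        return 0.0
    if a == 0:
        return math.inf
    b = 2 * (dx * vel[0] + dy * vel[1])
    # Larger root of a*t^2 + b*t + c = 0; c <= 0 guarantees a non-negative discriminant.
    disc = b * b - 4 * a * c
    return (-b + math.sqrt(disc)) / (2 * a)


if __name__ == "__main__":
    v = estimate_velocity((0.0, 0.0), (2.0, 0.0), dt=2.0)
    print(exit_time((3.0, 0.0), v, center=(0.0, 0.0), radius=5.0))
```

    Under these assumptions the device would only need to contact the server around the predicted exit time, rather than after every position update.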

  6. Monitoring Moving Queries inside a Safe Region

    PubMed Central

    Al-Khalidi, Haidar; Taniar, David; Alamri, Sultan

    2014-01-01

    With mobile moving range queries, there is a need to recalculate the relevant surrounding objects of interest whenever the query moves. Therefore, monitoring the moving query is very costly. The safe region is one method that has been proposed to minimise the communication and computation cost of continuously monitoring a moving range query. Inside the safe region the set of objects of interest to the query does not change; thus there is no need to update the query while it is inside its safe region. However, when the query leaves its safe region the mobile device has to reevaluate the query, necessitating communication with the server. Knowing when and where the mobile device will leave a safe region is widely known as a difficult problem. To solve this problem, we propose a novel method to monitor the position of the query over time using a linear function based on the direction of the query obtained by periodic monitoring of its position. Periodic monitoring ensures that the query is aware of its location all the time. This method reduces the costs associated with communications in client-server architecture. Computational results show that our method is successful in handling moving query patterns. PMID:24696652

  7. Development and validation of queries using structured query language (SQL) to determine the utilization of comparison imaging in radiology reports stored on PACS.

    PubMed

    Lakhani, Paras; Menschik, Elliot D; Goldszal, Alberto F; Murray, Joseph P; Weiner, Mark G; Langlotz, Curtis P

    2006-03-01

    The purpose of this research was to develop queries that quantify the utilization of comparison imaging in free-text radiology reports. The queries searched for common phrases that indicate whether comparison imaging was utilized, not available, or not mentioned. The queries were iteratively refined and tested on random samples of 100 reports with human review as a reference standard until the precision and recall of the queries did not improve significantly between iterations. Then, query accuracy was assessed on a new random sample of 200 reports. Overall accuracy of the queries was 95.6%. The queries were then applied to a database of 1.8 million reports. Comparisons were made to prior images in 38.69% of the reports (693,955/1,793,754), were unavailable in 18.79% (337,028/1,793,754), and were not mentioned in 42.52% (762,771/1,793,754). The results show that queries of text reports can achieve greater than 95% accuracy in determining the utilization of prior images.
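    A phrase-matching query of the kind described above might look like the following sketch, run here against an in-memory SQLite database. Everything specific is an assumption for illustration: the `reports(id, body)` table, the example phrases, and the helper names `classify` and `precision_recall` are hypothetical and are not taken from the study.

```python
import sqlite3

# Hypothetical phrase lists; the study's actual phrases are not given in the abstract.
QUERY = """
SELECT id,
       CASE
           WHEN lower(body) LIKE '%no prior%'
             OR lower(body) LIKE '%not available for comparison%' THEN 'unavailable'
           WHEN lower(body) LIKE '%compared to%'
             OR lower(body) LIKE '%comparison is made%' THEN 'utilized'
           ELSE 'not_mentioned'
       END AS category
FROM reports
ORDER BY id
"""


def classify(reports):
    """Classify (id, body) report rows with a single SQL query; returns {id: category}."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE reports (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany("INSERT INTO reports VALUES (?, ?)", reports)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return dict(rows)


def precision_recall(predicted, reference, label):
    """Precision and recall for one label, with human review as the reference standard."""
    tp = sum(1 for k in reference if predicted[k] == label and reference[k] == label)
    fp = sum(1 for k in reference if predicted[k] == label and reference[k] != label)
    fn = sum(1 for k in reference if predicted[k] != label and reference[k] == label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    sample = [
        (1, "Compared to the prior study of 2005, no change."),
        (2, "No prior studies are available."),
        (3, "Lungs are clear."),
    ]
    print(classify(sample))
```

    Iterative refinement, as described in the abstract, would amount to adjusting the phrase lists and re-measuring precision and recall on a fresh human-reviewed sample until the figures stop improving.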

  8. Location hashing: an efficient indexing method for locating object queries in image databases

    NASA Astrophysics Data System (ADS)

    Syeda-Mahmood, Tanveer F.

    1998-12-01

    Queries referring to content embedded within images are an essential component of content-based search, browse, or summarize operations in image databases. Localization of such queries under changes in appearance, occlusions and background clutter is a difficult problem, for which current spatial access structures in databases are not suitable. In this paper, we present a new method of indexing image databases, called location hashing, that uses a special data structure, called the location hash tree, for organizing feature information from images of a database. Location hashing is based on the principle of geometric hashing. It simultaneously determines the relevant images in the database, and the regions within them, which are most likely to contain the 2D pattern query, without incurring a detailed search of either. The location hash tree, being a red-black tree, allows for efficient search for candidate locations using pose-invariant feature information derived from the query.

  9. Internet search volumes in brain aneurysms and subarachnoid hemorrhage: Is there evidence of seasonality?

    PubMed

    Ku, Jerry C; Alotaibi, Naif M; Wang, Justin; Ibrahim, George M; Schweizer, Tom A; Macdonald, R Loch

    2017-07-01

    Results of previous studies examining seasonal variation in the incidence of aneurysmal subarachnoid hemorrhage (SAH) are conflicting. The aim of this brief report is to investigate whether there is a seasonal effect in online search queries for SAH that may reflect an association between meteorological factors and aneurysm rupture. We used the Google Trends data service to analyze the volume of internet queries for SAH on Google's search engine from January 1, 2004 to November 2016. We used comprehensive search terms and collected data from: USA, Canada, and countries known for their high prevalence of SAH (Finland, and Japan), as well as worldwide search volume. Potential seasonal variations in the data were assessed by comparative non-parametric tests and curve-fit regression model. Our analyses revealed that USA had the highest median value in cumulative search scores (115 vs. 86, 46, 46 for Finland, Canada and Japan, respectively). The term "brain aneurysm" was the commonly used search term among countries, followed by "cerebral aneurysm". There was no evidence of seasonality in any of the countries studied on both univariate tests and regression time-adjusted analysis. There are no seasonal variations in internet search query volume for SAH. Further studies are needed to explore whether online search volumes correlate with the actual incidence of SAH. Copyright © 2017 Elsevier B.V. All rights reserved.

  10. Fast batch searching for protein homology based on compression and clustering.

    PubMed

    Ge, Hongwei; Sun, Liang; Yu, Jinghong

    2017-11-21

    In the bioinformatics community, many tasks involve matching a set of protein query sequences against large sequence datasets. To conduct multiple queries in the database, a commonly used method is to run BLAST on each original query or on the concatenated queries. This is inefficient since it doesn't exploit the common subsequences shared by queries. We propose a compression and cluster based BLASTP (C2-BLASTP) algorithm to further exploit the joint information among the query sequences and the database. Firstly, the queries and database are compressed in turn by procedures of redundancy analysis, redundancy removal and distinction record. Secondly, the database is clustered according to Hamming distance among the subsequences. To improve the sensitivity and selectivity of sequence alignments, ten groups of reduced amino acid alphabets are used. Following this, the hits finding operator is implemented on the clustered database. Furthermore, an execution database is constructed based on the found potential hits, with the objective of mitigating the effect of increasing scale of the sequence database. Finally, the homology search is performed in the execution database. Experiments on the NCBI NR database demonstrate the effectiveness of the proposed C2-BLASTP for batch searching of homology in sequence databases. The results are evaluated in terms of homology accuracy, search speed and memory usage. It can be seen that C2-BLASTP achieves competitive results as compared with some state-of-the-art methods.
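    Two of the ingredients named above, a reduced amino acid alphabet and clustering by Hamming distance, can be sketched as follows. This is not the C2-BLASTP implementation: the alphabet grouping below is a single illustrative choice (the paper's ten groupings are not given in the abstract), and the greedy clustering and the names `reduce_seq`, `hamming`, and `greedy_cluster` are assumptions.

```python
# Illustrative reduced alphabet (an assumption, not the paper's grouping):
# every residue is replaced by the first letter of its group.
REDUCED = {}
for group in ("AVLIMC", "FWY", "ST", "NQ", "DE", "KRH", "G", "P"):
    for aa in group:
        REDUCED[aa] = group[0]


def reduce_seq(seq):
    """Map a protein sequence onto the reduced alphabet; unknown residues become 'X'."""
    return "".join(REDUCED.get(aa, "X") for aa in seq)


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def greedy_cluster(words, max_dist):
    """Assign each word to the first cluster whose representative is within max_dist."""
    clusters = []  # list of (representative, members)
    for w in words:
        for rep, members in clusters:
            if hamming(rep, w) <= max_dist:
                members.append(w)
                break
        else:
            clusters.append((w, [w]))
    return clusters


if __name__ == "__main__":
    words = [reduce_seq(s) for s in ("AVKR", "LIKD", "GGPP")]
    print(greedy_cluster(words, max_dist=1))
```

    Reducing the alphabet before clustering makes chemically similar subsequences collide more often, which is one plausible reason such a step would increase the number of candidate hits per cluster.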

  11. A Framework for WWW Query Processing

    NASA Technical Reports Server (NTRS)

    Wu, Binghui Helen; Wharton, Stephen (Technical Monitor)

    2000-01-01

    Query processing is the most common operation in a DBMS. Sophisticated query processing has been mainly targeted at a single enterprise environment providing centralized control over data and metadata. Submitting queries by anonymous users on the web is different in such a way that load balancing or DBMS' accessing control becomes the key issue. This paper provides a solution by introducing a framework for WWW query processing. The success of this framework lies in the utilization of query optimization techniques and the ontological approach. This methodology has proved to be cost effective at the NASA Goddard Space Flight Center Distributed Active Archive Center (GDAAC).

  12. Interactive ontology debugging: Two query strategies for efficient fault localization

    PubMed Central

    Shchekotykhin, Kostyantyn; Friedrich, Gerhard; Fleiss, Philipp; Rodler, Patrick

    2012-01-01

    Effective debugging of ontologies is an important prerequisite for their broad application, especially in areas that rely on everyday users to create and maintain knowledge bases, such as the Semantic Web. In such systems ontologies capture formalized vocabularies of terms shared by its users. However in many cases users have different local views of the domain, i.e. of the context in which a given term is used. Inappropriate usage of terms together with natural complications when formulating and understanding logical descriptions may result in faulty ontologies. Recent ontology debugging approaches use diagnosis methods to identify causes of the faults. In most debugging scenarios these methods return many alternative diagnoses, thus placing the burden of fault localization on the user. This paper demonstrates how the target diagnosis can be identified by performing a sequence of observations, that is, by querying an oracle about entailments of the target ontology. To identify the best query we propose two query selection strategies: a simple “split-in-half” strategy and an entropy-based strategy. The latter allows knowledge about typical user errors to be exploited to minimize the number of queries. Our evaluation showed that the entropy-based method significantly reduces the number of required queries compared to the “split-in-half” approach. We experimented with different probability distributions of user errors and different qualities of the a priori probabilities. Our measurements demonstrated the superiority of entropy-based query selection even in cases where all fault probabilities are equal, i.e. where no information about typical user errors is available. PMID:23543507
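    The contrast between the two query selection strategies can be illustrated with a deliberately simplified sketch. It is not the authors' algorithm: it assumes every candidate diagnosis predicts a definite yes/no answer for every query, represents a query as the set of diagnoses predicting "yes", and scores the entropy-based choice by the binary entropy of the predicted "yes" probability. The names `split_in_half` and `entropy_based` are hypothetical.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a yes/no answer with P(yes) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def split_in_half(queries, diagnoses):
    """Pick the query whose 'yes' set is closest to half of the diagnoses by count."""
    n = len(diagnoses)
    return min(queries, key=lambda q: abs(2 * len(queries[q] & set(diagnoses)) - n))


def entropy_based(queries, priors):
    """Pick the query whose answer is most uncertain under the fault probabilities."""
    def answer_entropy(q):
        p_yes = sum(priors[d] for d in queries[q] if d in priors)
        return binary_entropy(p_yes)
    return max(queries, key=answer_entropy)


if __name__ == "__main__":
    # Hypothetical diagnoses with prior fault probabilities.
    priors = {"D1": 0.7, "D2": 0.1, "D3": 0.1, "D4": 0.1}
    # Each query maps to the diagnoses that predict the oracle answers "yes".
    queries = {"Q1": {"D1"}, "Q2": {"D1", "D2"}, "Q3": {"D2", "D3"}}
    print(split_in_half(queries, list(priors)))
    print(entropy_based(queries, priors))
```

    In this toy setup the count-based strategy prefers a query that splits the four diagnoses two against two, while the probability-weighted strategy prefers the query that isolates the single most likely diagnosis.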

  13. Interactive ontology debugging: Two query strategies for efficient fault localization.

    PubMed

    Shchekotykhin, Kostyantyn; Friedrich, Gerhard; Fleiss, Philipp; Rodler, Patrick

    2012-04-01

    Effective debugging of ontologies is an important prerequisite for their broad application, especially in areas that rely on everyday users to create and maintain knowledge bases, such as the Semantic Web. In such systems ontologies capture formalized vocabularies of terms shared by its users. However in many cases users have different local views of the domain, i.e. of the context in which a given term is used. Inappropriate usage of terms together with natural complications when formulating and understanding logical descriptions may result in faulty ontologies. Recent ontology debugging approaches use diagnosis methods to identify causes of the faults. In most debugging scenarios these methods return many alternative diagnoses, thus placing the burden of fault localization on the user. This paper demonstrates how the target diagnosis can be identified by performing a sequence of observations, that is, by querying an oracle about entailments of the target ontology. To identify the best query we propose two query selection strategies: a simple "split-in-half" strategy and an entropy-based strategy. The latter allows knowledge about typical user errors to be exploited to minimize the number of queries. Our evaluation showed that the entropy-based method significantly reduces the number of required queries compared to the "split-in-half" approach. We experimented with different probability distributions of user errors and different qualities of the a priori probabilities. Our measurements demonstrated the superiority of entropy-based query selection even in cases where all fault probabilities are equal, i.e. where no information about typical user errors is available.

  14. Psychogenic non-epileptic seizures (PNES) on the Internet: Online representation of the disorder and frequency of search terms.

    PubMed

    Myers, Lorna; Jones, Jace; Boesten, Nadine; Lancman, Marcelo

    2016-08-01

    The nature of the symptoms associated with PNES requires a multidisciplinary health team. There are too few professionals with an adequate understanding of PNES and therefore many are not able to provide patients with necessary information. In the age of the internet, it is not surprising that patients or caregivers might look for answers online. The purpose of this project was to investigate the online representation of PNES and search frequency for PNES and its associated terms. To determine online representation, searches of Google®, twitter®, YouTube®, and Instagram® for "PNES" and associated terms were conducted. Websites, tweets, and films were classified by host and exclusivity of information. PNES and associated terms search frequency was determined through Google Trends®. Professional and patient sites exclusively about PNES were outnumbered by sites that only mentioned PNES in fewer than three posts. Patients tended to favor less traditional hosting options than did professionals. On twitter®, different keyword preferences were identified for professionals and patients. On YouTube® there was a substantial selection of videos of which 22 were professionally produced. Google Trends® revealed that the terms most commonly used to search for this topic were, in order: (1) "PNES;" (2) "NEAD;" and (3) "pseudoseizure." A variety of professional and patient internet content about PNES can be found online. Professional sites offered accurate and empirically-validated information on the disorder and tended to use traditional hosting options. Future professional initiatives might consider novel hosting options and higher-frequency terms to reach their audience more effectively. Copyright © 2016 The Author(s). Published by Elsevier Ltd. All rights reserved.

  15. Restricted natural language based querying of clinical databases.

    PubMed

    Safari, Leila; Patrick, Jon D

    2014-12-01

    To elevate the level of care to the community it is essential to provide usable tools for healthcare professionals to extract knowledge from clinical data. In this paper a generic translation algorithm is proposed to translate a restricted natural language query (RNLQ) to a standard query language like SQL (Structured Query Language). A special purpose clinical data analytics language (CliniDAL) has been introduced which provides a scheme of six classes of clinical questioning templates. A translation algorithm is proposed to translate the RNLQ of users to SQL queries based on a similarity-based Top-k algorithm which is used in the mapping process of CliniDAL. Also a two layer rule-based method is used to interpret the temporal expressions of the query, based on the proposed temporal model. The mapping and translation algorithms are generic and thus able to work with clinical databases in three data design models, including Entity-Relationship (ER), Entity-Attribute-Value (EAV) and XML; however, they are only implemented for the ER and EAV design models in the current work. It is easy to compose an RNLQ via CliniDAL's interface, in which query terms are automatically mapped to the underlying data models of a Clinical Information System (CIS) with an accuracy of more than 84%, and the temporal expressions of the query comprising absolute times, relative times or relative events can be automatically mapped to time entities of the underlying CIS and to normalized temporal comparative values. The proposed CliniDAL solution, which uses the generic mapping and translation algorithms and is enhanced by a temporal analyzer component, provides a simple mechanism for composing RNLQs for extracting knowledge from CISs with different data design models for analytics purposes. Copyright © 2014 Elsevier Inc. All rights reserved.
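    A similarity-based top-k mapping from a query term to schema elements can be sketched as below. This is only an illustration of the general idea, not CliniDAL's mapping algorithm: the similarity measure (Python's `difflib.SequenceMatcher` ratio) is a stand-in, and the schema element names and the function `top_k_matches` are hypothetical.

```python
import heapq
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1] (a stand-in measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def top_k_matches(term, schema_elements, k=3):
    """Return the k schema elements most similar to the query term, best first."""
    scored = ((similarity(term, element), element) for element in schema_elements)
    return heapq.nlargest(k, scored)


if __name__ == "__main__":
    # Hypothetical column names of a clinical database.
    schema = ["heart_rate", "blood_pressure", "admission_date"]
    for score, element in top_k_matches("heart rate", schema, k=2):
        print(f"{element}: {score:.2f}")
```

    Keeping the k best candidates rather than only the single best leaves room for a later step, such as a rule or a user choice, to resolve ambiguous terms.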

  16. Analysis of PubMed User Sessions Using a Full-Day PubMed Query Log: A Comparison of Experienced and Nonexperienced PubMed Users

    PubMed Central

    2015-01-01

    Background: PubMed is the largest biomedical bibliographic information source on the Internet. PubMed has been considered one of the most important and reliable sources of up-to-date health care evidence. Previous studies examined the effects of domain expertise/knowledge on search performance using PubMed. However, very little is known about PubMed users’ knowledge of information retrieval (IR) functions and their usage in query formulation. Objective: The purpose of this study was to shed light on how experienced/nonexperienced PubMed users perform their search queries by analyzing a full-day query log. Our hypotheses were that (1) experienced PubMed users who use system functions quickly retrieve relevant documents and (2) nonexperienced PubMed users who do not use them have longer search sessions than experienced users. Methods: To test these hypotheses, we analyzed PubMed query log data containing nearly 3 million queries. User sessions were divided into two categories: experienced and nonexperienced. We compared experienced and nonexperienced users per number of sessions, and experienced and nonexperienced user sessions per session length, with a focus on how fast they completed their sessions. Results: To test our hypotheses, we measured how successful information retrieval was (at retrieving relevant documents), represented as the decrease rates of experienced and nonexperienced users from a session length of 1 to 2, 3, 4, and 5. The decrease rate (from a session length of 1 to 2) of the experienced users was significantly larger than that of the nonexperienced groups. Conclusions: Experienced PubMed users retrieve relevant documents more quickly than nonexperienced PubMed users in terms of session length. PMID:26139516

  17. Analysis of PubMed User Sessions Using a Full-Day PubMed Query Log: A Comparison of Experienced and Nonexperienced PubMed Users.

    PubMed

    Yoo, Illhoi; Mosa, Abu Saleh Mohammad

    2015-07-02

    PubMed is the largest biomedical bibliographic information source on the Internet. PubMed has been considered one of the most important and reliable sources of up-to-date health care evidence. Previous studies examined the effects of domain expertise/knowledge on search performance using PubMed. However, very little is known about PubMed users' knowledge of information retrieval (IR) functions and their usage in query formulation. The purpose of this study was to shed light on how experienced/nonexperienced PubMed users perform their search queries by analyzing a full-day query log. Our hypotheses were that (1) experienced PubMed users who use system functions quickly retrieve relevant documents and (2) nonexperienced PubMed users who do not use them have longer search sessions than experienced users. To test these hypotheses, we analyzed PubMed query log data containing nearly 3 million queries. User sessions were divided into two categories: experienced and nonexperienced. We compared experienced and nonexperienced users per number of sessions, and experienced and nonexperienced user sessions per session length, with a focus on how fast they completed their sessions. To test our hypotheses, we measured how successful information retrieval was (at retrieving relevant documents), represented as the decrease rates of experienced and nonexperienced users from a session length of 1 to 2, 3, 4, and 5. The decrease rate (from a session length of 1 to 2) of the experienced users was significantly larger than that of the nonexperienced groups. Experienced PubMed users retrieve relevant documents more quickly than nonexperienced PubMed users in terms of session length.

  18. Does the quality, accuracy, and readability of information about lateral epicondylitis on the internet vary with the search term used?

    PubMed

    Dy, Christopher J; Taylor, Samuel A; Patel, Ronak M; McCarthy, Moira M; Roberts, Timothy R; Daluiski, Aaron

    2012-12-01

    Concern exists over the quality, accuracy, and accessibility of online information about health care conditions. The goal of this study is to evaluate the quality, accuracy, and readability of information available on the internet about lateral epicondylitis. We used three different search terms ("tennis elbow," "lateral epicondylitis," and "elbow pain") in three search engines (Google, Bing, and Yahoo) to generate a list of 75 unique websites. Three orthopedic surgeons reviewed the content of each website and assessed the quality and accuracy of information. We assessed each website's readability using the Flesch-Kincaid method. Statistical comparisons were made using ANOVA with post hoc pairwise comparisons. The mean reading grade level was 11.1. None of the sites were under the recommended sixth grade reading level for the general public. Higher quality information was found when using the terms "tennis elbow" and "lateral epicondylitis" compared to "elbow pain" (p < 0.001). Specialty society websites had higher quality than all other websites (p < 0.001). The information was more accurate if the website was authored by a health care provider when compared to non-health care providers (p = 0.003). Websites seeking commercial gain and those found after the first five search results had lower quality information. Reliable information about lateral epicondylitis is available online, especially from specialty societies. However, the quality and accuracy of information vary significantly with the search term, website author, and order of search results. This leaves less educated patients at a disadvantage, particularly because the information we encountered is above the reading level recommended for the general public.
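    The readability figure above (mean reading grade level 11.1) comes from the Flesch-Kincaid method. As a minimal sketch, the standard Flesch-Kincaid grade-level formula can be written as below. The function name and the choice to pass pre-counted words, sentences, and syllables are assumptions made for illustration; the abstract does not say which tool the authors used to obtain those counts.

```python
def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid grade level from pre-computed text counts.

    Standard formula:
        0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    Counting syllables is left to the caller (an assumption of this sketch),
    because syllable counters are heuristic and vary between tools.
    """
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


if __name__ == "__main__":
    # Hypothetical page: 100 words, 5 sentences, 150 syllables.
    print(round(flesch_kincaid_grade(100, 5, 150), 2))
```

    A page scoring above 6 on this scale would exceed the sixth grade reading level that the abstract cites as the recommendation for the general public.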

  19. Semantic Annotations and Querying of Web Data Sources

    NASA Astrophysics Data System (ADS)

    Hornung, Thomas; May, Wolfgang

    A large part of the Web, actually holding a significant portion of the useful information throughout the Web, consists of views on hidden databases, provided by numerous heterogeneous interfaces that are partly human-oriented via Web forms ("Deep Web"), and partly based on Web Services (only machine accessible). In this paper we present an approach for annotating these sources in a way that makes them citizens of the Semantic Web. We illustrate how queries can be stated in terms of the ontology, and how the annotations are used to select and access appropriate sources and to answer the queries.

  20. Variability of patient spine education by Internet search engine.

    PubMed

    Ghobrial, George M; Mehdi, Angud; Maltenfort, Mitchell; Sharan, Ashwini D; Harrop, James S

    2014-03-01

    Patients are increasingly reliant upon the Internet as a primary source of medical information. The educational experience varies by search engine, search term, and changes daily. There are no tools for critical evaluation of spinal surgery websites. To highlight the variability between common search engines for the same search terms. To detect bias, by prevalence of specific kinds of websites for certain spinal disorders. Demonstrate a simple scoring system of spinal disorder website for patient use, to maximize the quality of information exposed to the patient. Ten common search terms were used to query three of the most common search engines. The top fifty results of each query were tabulated. A negative binomial regression was performed to highlight the variation across each search engine. Google was more likely than Bing and Yahoo search engines to return hospital ads (P=0.002) and more likely to return scholarly sites of peer-reviewed lite (P=0.003). Educational web sites, surgical group sites, and online web communities had a significantly higher likelihood of returning on any search, regardless of search engine, or search string (P=0.007). Likewise, professional websites, including hospital run, industry sponsored, legal, and peer-reviewed web pages were less likely to be found on a search overall, regardless of engine and search string (P=0.078). The Internet is a rapidly growing body of medical information which can serve as a useful tool for patient education. High quality information is readily available, provided that the patient uses a consistent, focused metric for evaluating online spine surgery information, as there is a clear variability in the way search engines present information to the patient. Published by Elsevier B.V.

  1. Performance evaluation of Unified Medical Language System®'s synonyms expansion to query PubMed.

    PubMed

    Griffon, Nicolas; Chebil, Wiem; Rollin, Laetitia; Kerdelhue, Gaetan; Thirion, Benoit; Gehanno, Jean-François; Darmoni, Stéfan Jacques

    2012-02-29

    PubMed is the main access to medical literature on the Internet. In order to enhance the performance of its information retrieval tools, primarily for non-indexed citations, the authors propose a method: expanding users' queries using Unified Medical Language System (UMLS) synonyms, i.e. all the terms gathered under one unique Concept Unique Identifier. This method was evaluated using queries constructed to emphasize the differences between this new method and the current PubMed automatic term mapping. Four experts assessed citation relevance. Using UMLS, we were able to retrieve new citations in 45.5% of queries, which implies a small increase in recall. The new strategy led to a heterogeneous 23.7% mean increase in non-indexed citations retrieved. Of these, 82% had been published less than 4 months earlier. The overall mean precision was 48.4% but differed according to the evaluators, ranging from 36.7% to 88.1% (inter-rater agreement was poor: kappa = 0.34). This study highlights the need for specific search tools for each type of user and use case. The proposed strategy may be useful to retrieve recent scientific advancements.
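    The synonym-expansion idea in this record can be illustrated with a small sketch: each query term is replaced by an OR-group of the term and its synonyms, and the groups are joined with AND. Everything here is an assumption for illustration: the function name, the hand-written synonym table (it is not real UMLS data), and the generic Boolean output format. It is not the authors' implementation and does not reproduce PubMed's automatic term mapping.

```python
def expand_query(terms, synonyms):
    """Build a Boolean query string with synonym OR-groups.

    terms    -- list of query terms typed by the user
    synonyms -- hypothetical mapping: term -> list of synonyms that share
                one concept (a stand-in for a real UMLS lookup)
    """
    groups = []
    for term in terms:
        # Keep the original term first, then add synonyms that differ from it.
        variants = [term] + [s for s in synonyms.get(term, []) if s != term]
        quoted = " OR ".join(f'"{v}"' for v in variants)
        groups.append(f"({quoted})")
    return " AND ".join(groups)


if __name__ == "__main__":
    # Hand-written illustrative synonym table, not taken from UMLS.
    table = {"heart attack": ["myocardial infarction"]}
    print(expand_query(["heart attack", "aspirin"], table))
```

    Widening each term into an OR-group can only add matches, which is consistent with the small recall gain and the rater-dependent precision reported in the abstract.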

  2. SEARCH: Study of Environmental Arctic Change--A System-scale, Cross-disciplinary, Long-term Arctic Research Program

    NASA Astrophysics Data System (ADS)

    Wiggins, H. V.; Schlosser, P.; Fox, S. E.

    2009-12-01

    The Study of Environmental Arctic Change (SEARCH) is a multi-agency effort to observe, understand, and guide responses to changes in the changing arctic system. Under the SEARCH program, guided by the Science Steering Committee (SSC), the Observing, Understanding, and Responding to Change panels, and the Interagency Program Management Committee (IPMC), scientists with a variety of expertise work together to achieve goals of the program. Over 150 projects and activities contribute to SEARCH implementation. The Observing Change component is underway through the NSF’s Arctic Observing Network (AON), NOAA-sponsored atmospheric and sea ice observations, and other relevant national and international efforts, including the EU-sponsored Developing Arctic Modeling and Observing Capabilities for Long-term Environmental Studies (DAMOCLES) Program. The Understanding Change component of SEARCH consists of modeling and analysis efforts, including the Sea Ice Outlook project, an international effort to provide a community-wide summary of the expected September arctic sea ice minimum. The Understanding Change component also has strong linkages to programs such as the NSF Arctic System Science (ARCSS) Program. The Responding to Change element will be launched through stakeholder-focused research and applications addressing social and economic concerns. As a national program under the International Study of Arctic Change (ISAC), SEARCH is working to expand international connections. The State of the Arctic Conference (soa.arcus.org), to be held 16-19 March 2010 in Miami, will be a milestone activity of SEARCH and will provide an international forum for discussion of future research directions aimed toward a better understanding of the arctic system and its trajectory. SEARCH is sponsored by eight U.S. agencies that comprise the IPMC, including: the National Science Foundation (NSF), the National Oceanic and Atmospheric Administration (NOAA), the National Aeronautics and Space

  3. Comparing the effectiveness of using generic and specific search terms in electronic databases to identify health outcomes for a systematic review: a prospective comparative study of literature search methods

    PubMed Central

    MacLean, Alice; Sweeting, Helen; Hunt, Kate

    2012-01-01

    OBJECTIVE: To compare the effectiveness of systematic review literature searches that use either generic or specific terms for health outcomes. DESIGN: Prospective comparative study of two electronic literature search strategies. The ‘generic’ search included general terms for health such as ‘adolescent health’, ‘health status’, ‘morbidity’, etc. The ‘specific’ search focused on terms for a range of specific illnesses, such as ‘headache’, ‘epilepsy’, ‘diabetes mellitus’, etc. DATA SOURCES: The authors searched Medline, Embase, the Cumulative Index to Nursing and Allied Health Literature, PsycINFO and the Education Resources Information Center for studies published in English between 1992 and April 2010. MAIN OUTCOME MEASURES: Number and proportion of studies included in the systematic review that were identified from each search. RESULTS: The two searches tended to identify different studies. Of 41 studies included in the final review, only three (7%) were identified by both search strategies, 21 (51%) were identified by the generic search only and 17 (41%) were identified by the specific search only. Five of the 41 studies were also identified through manual searching methods. Studies identified by the two electronic literature searches (ELS) differed in terms of reported health outcomes, while each ELS uniquely identified some of the review's higher quality studies. CONCLUSIONS: ELS are a vital stage in conducting systematic reviews and therefore have an important role in attempts to inform and improve policy and practice with the best available evidence. While the use of both generic and specific health terms is conventional for many reviewers and information scientists, there are also reviews that rely solely on either generic or specific terms. Based on the findings, reliance on only the generic or specific approach could increase the risk of systematic reviews missing important evidence and, consequently, misinforming decision makers
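    The overlap figures reported in this record (3, 21 and 17 of 41 studies) can be checked with simple set arithmetic. The sketch below uses made-up integer study IDs, chosen only so that the set sizes match the counts in the abstract; the function name is likewise an assumption.

```python
def overlap_summary(generic, specific):
    """Count studies found by both searches, or by only one of them.

    Returns a dict of (count, rounded percentage of all included studies).
    """
    total = len(generic | specific)
    parts = {
        "both": generic & specific,
        "generic_only": generic - specific,
        "specific_only": specific - generic,
    }
    return {name: (len(ids), round(100 * len(ids) / total))
            for name, ids in parts.items()}


if __name__ == "__main__":
    # Hypothetical IDs: 24 studies from the generic search, 20 from the
    # specific search, with 3 in common, giving 41 distinct studies.
    generic = set(range(0, 24))
    specific = set(range(21, 41))
    print(overlap_summary(generic, specific))
```

    With these inputs the percentages round to 7, 51 and 41, matching the abstract; the three rounded values sum to 99 rather than 100 because each one is rounded independently.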

  4. Evolutionary Multiobjective Query Workload Optimization of Cloud Data Warehouses

    PubMed Central

    Dokeroglu, Tansel; Sert, Seyyit Alper; Cinar, Muhammet Serkan

    2014-01-01

    With the advent of Cloud databases, query optimizers need to find Pareto-optimal solutions in terms of response time and monetary cost. Our novel approach minimizes both objectives by deploying alternative virtual resources and query plans making use of the virtual resource elasticity of the Cloud. We propose an exact multiobjective branch-and-bound and a robust multiobjective genetic algorithm for the optimization of distributed data warehouse query workloads on the Cloud. In order to investigate the effectiveness of our approach, we incorporate the devised algorithms into a prototype system. Finally, through several experiments that we have conducted with different workloads and virtual resource configurations, we report notable findings on alternative deployments as well as the advantages and disadvantages of the multiobjective algorithms we propose. PMID:24892048

  5. Evolutionary multiobjective query workload optimization of Cloud data warehouses.

    PubMed

    Dokeroglu, Tansel; Sert, Seyyit Alper; Cinar, Muhammet Serkan

    2014-01-01

    With the advent of Cloud databases, query optimizers need to find Pareto-optimal solutions in terms of response time and monetary cost. Our novel approach minimizes both objectives by deploying alternative virtual resources and query plans making use of the virtual resource elasticity of the Cloud. We propose an exact multiobjective branch-and-bound and a robust multiobjective genetic algorithm for the optimization of distributed data warehouse query workloads on the Cloud. In order to investigate the effectiveness of our approach, we incorporate the devised algorithms into a prototype system. Finally, through several experiments that we have conducted with different workloads and virtual resource configurations, we report notable findings on alternative deployments as well as the advantages and disadvantages of the multiobjective algorithms we propose.

  6. Evaluation of Content-Matched Range Monitoring Queries over Moving Objects in Mobile Computing Environments.

    PubMed

    Jung, HaRim; Song, MoonBae; Youn, Hee Yong; Kim, Ung Mo

    2015-09-18

    A content-matched (CM) range monitoring query over moving objects continually retrieves the moving objects (i) whose non-spatial attribute values are matched to given non-spatial query values; and (ii) that are currently located within a given spatial query range. In this paper, we propose a new query indexing structure, called the group-aware query region tree (GQR-tree) for efficient evaluation of CM range monitoring queries. The primary role of the GQR-tree is to help the server leverage the computational capabilities of moving objects in order to improve the system performance in terms of the wireless communication cost and server workload. Through a series of comprehensive simulations, we verify the superiority of the GQR-tree method over the existing methods.

  7. Evaluation of Content-Matched Range Monitoring Queries over Moving Objects in Mobile Computing Environments

    PubMed Central

    Jung, HaRim; Song, MoonBae; Youn, Hee Yong; Kim, Ung Mo

    2015-01-01

    A content-matched (CM) range monitoring query over moving objects continually retrieves the moving objects (i) whose non-spatial attribute values are matched to given non-spatial query values; and (ii) that are currently located within a given spatial query range. In this paper, we propose a new query indexing structure, called the group-aware query region tree (GQR-tree) for efficient evaluation of CM range monitoring queries. The primary role of the GQR-tree is to help the server leverage the computational capabilities of moving objects in order to improve the system performance in terms of the wireless communication cost and server workload. Through a series of comprehensive simulations, we verify the superiority of the GQR-tree method over the existing methods. PMID:26393613
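    The two conditions that define a content-matched range query, (i) matching non-spatial attribute values and (ii) a current location inside the query range, can be sketched as a naive one-shot filter. This is a baseline illustration only: it is not the GQR-tree, it does no continuous monitoring, and the class name, field names, and rectangular query range are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class MovingObject:
    oid: str                                     # object identifier
    x: float                                     # current x coordinate
    y: float                                     # current y coordinate
    attrs: dict = field(default_factory=dict)    # non-spatial attributes


def cm_range_query(objects, query_attrs, x_min, y_min, x_max, y_max):
    """Return ids of objects that satisfy both CM query conditions.

    (i)  every queried non-spatial attribute equals the object's value;
    (ii) the object's current position lies in the axis-aligned rectangle.
    """
    hits = []
    for obj in objects:
        content_ok = all(obj.attrs.get(k) == v for k, v in query_attrs.items())
        in_range = x_min <= obj.x <= x_max and y_min <= obj.y <= y_max
        if content_ok and in_range:
            hits.append(obj.oid)
    return hits


if __name__ == "__main__":
    fleet = [
        MovingObject("a", 1.0, 1.0, {"type": "taxi"}),
        MovingObject("b", 9.0, 9.0, {"type": "taxi"}),
        MovingObject("c", 2.0, 2.0, {"type": "bus"}),
    ]
    print(cm_range_query(fleet, {"type": "taxi"}, 0, 0, 5, 5))
```

    A server that re-ran this scan on every location update would carry the full workload itself, which is the cost the abstract says the GQR-tree reduces by leveraging the moving objects' own computation.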

  8. The Use of ERIC Tapes in Scandinavia, Searching With Thesaurus Terms in Natural Language.

    ERIC Educational Resources Information Center

    Tell, Bjorn V.; And Others

    Since February 1971 the Royal Institute of Technology, Stockholm, has been running the ERIC data base mainly for SDI purposes. The implementation of the data base into the generalized search system, ABACUS, is described. One hundred and fifty-eight users received SDI service at present, 99 from governmental and educational institutions, 23 from…

  9. The Contemporary Thesaurus of Social Science Terms and Synonyms: A Guide for Natural Language Computer Searching.

    ERIC Educational Resources Information Center

    Knapp, Sara D., Comp.

    This book is designed primarily to help users find meaningful words for natural language, or free-text, computer searching of bibliographic and textual databases in the social and behavioral sciences. Additionally, it covers many socially relevant and technical topics not covered by the usual literary thesaurus, therefore it may also be useful for…

  10. Access to data: comparing AccessMed with Query by Review.

    PubMed Central

    Hripcsak, G; Allen, B; Cimino, J J; Lee, R

    1996-01-01

    OBJECTIVE: To evaluate the performance of tools for authoring patient database queries. DESIGN: Query by Review, a tool that exploits the training that users have undergone to master a result review system, was compared with AccessMed, a vocabulary browser that supports lexical matching and the traversal of hierarchical and semantic links. Seven subjects (Medical Logic Module authors) were asked to use both tools to gather the vocabulary terms necessary to perform each of eight laboratory queries. MEASUREMENTS: The proportion of queries that were correct; intersubject agreement. RESULTS: Query by Review had better performance than AccessMed (38% correct queries versus 18%, p = 0.002), but both figures were low. Poor intersubject agreement (28% for Query by Review and 21% for AccessMed) corroborated the relatively low performance. Subjects appeared to have trouble distinguishing laboratory tests from laboratory batteries, picking terms relevant to the particular data type required, and using classes in the vocabulary's hierarchy. CONCLUSION: Query by Review, with its more constrained user interface, performed somewhat better than AccessMed, a more general tool. Neither tool achieved adequate performance, however, which points to the difficulty of formulating a query for a clinical database and the need for further work. PMID:8816352

  11. Abyss or Shelter? On the Relevance of Web Search Engines' Search Results When People Google for Suicide.

    PubMed

    Haim, Mario; Arendt, Florian; Scherr, Sebastian

    2017-02-01

    Despite evidence that suicide rates can increase after suicides are widely reported in the media, appropriate depictions of suicide in the media can help people to overcome suicidal crises and can thus elicit preventive effects. We argue on the level of individual media users that a similar ambivalence can be postulated for search results on online suicide-related search queries. Importantly, the filter bubble hypothesis (Pariser, 2011) states that search results are biased by algorithms based on a person's previous search behavior. In this study, we investigated whether suicide-related search queries, including either potentially suicide-preventive or -facilitative terms, influence subsequent search results. This might thus protect or harm suicidal Internet users. We utilized a 3 (search history: suicide-related harmful, suicide-related helpful, and suicide-unrelated) × 2 (reactive: clicking the top-most result link and no clicking) experimental design applying agent-based testing. While findings show no influences either of search histories or of reactivity on search results in a subsequent situation, the presentation of a helpline offer raises concerns about possible detrimental algorithmic decision-making: Algorithms "decided" whether or not to present a helpline, and this automated decision, then, followed the agent throughout the rest of the observation period. Implications for policy-making and search providers are discussed.

  12. Pre filtered Dynamic Time Warping for Posteriorgram Based Keyword Search

    DTIC Science & Technology

    2017-02-09

    given text queries. The DTW algorithm is used to determine the optimal alignment between the posteriorgrams of the audio data and the queries. Since DTW...DTW algorithm is carried out on the full resolution cost matrix without any reduction of the time series. The size of the search space is decreased by...the given text queries need to be generated. For in-vocabulary (IV) queries, a pronunciation lexicon is used to convert the query into a phone
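    For readers unfamiliar with the DTW alignment mentioned in this excerpt, the following is a minimal sketch of classic dynamic time warping over a full-resolution cost matrix. It works on scalar sequences with an absolute-difference local cost, which is an assumption for brevity; the report aligns posteriorgrams (vector sequences), and the pre-filtering step it describes is not shown here.

```python
def dtw_distance(a, b):
    """Classic DTW cost between two numeric sequences.

    acc[i][j] holds the cheapest cumulative cost of aligning the first i
    items of `a` with the first j items of `b`, using the usual three
    moves: insertion, deletion, and match.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])   # local distance (assumed)
            acc[i][j] = cost + min(acc[i - 1][j],      # advance in a only
                                   acc[i][j - 1],      # advance in b only
                                   acc[i - 1][j - 1])  # advance in both
    return acc[n][m]


if __name__ == "__main__":
    # The second sequence repeats one value; warping absorbs the repeat.
    print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

    The nested loops visit every cell of the n-by-m matrix, which is why shrinking the search space before running DTW matters for keyword search over long audio.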

  13. Clean Air Markets - Facility Attributes and Contacts Query Wizard

    EPA Pesticide Factsheets

    The Facility Attributes and Contacts Query Wizard is part of a suite of Clean Air Markets-related tools that are accessible at http://camddataandmaps.epa.gov/gdm/index.cfm. The Facility Attributes and Contact module gives the user access to current and historical facility, owner, and representative data using custom queries, via the Facility Attributes Query Wizard, or Quick Reports. In addition, data regarding EPA, State, and local agency staff are also available. The Query Wizard can be used to search for data about a facility or facilities by identifying characteristics such as associated programs, owners, representatives, locations, and unit characteristics, facility inventories, and classifications. EPA's Clean Air Markets Division (CAMD) includes several market-based regulatory programs designed to improve air quality and ecosystems. The most well-known of these programs are EPA's Acid Rain Program and the NOx Programs, which reduce emissions of sulfur dioxide (SO2) and nitrogen oxides (NOx), compounds that adversely affect air quality, the environment, and public health. CAMD also plays an integral role in the development and implementation of the Clean Air Interstate Rule (CAIR).

  14. Federated query services provided by the Seamless SAR Archive project

    NASA Astrophysics Data System (ADS)

    Baker, S.; Bryson, G.; Buechler, B.; Meertens, C. M.; Crosby, C. J.; Fielding, E. J.; Nicoll, J.; Youn, C.; Baru, C.

    2013-12-01

    The NASA Advancing Collaborative Connections for Earth System Science (ACCESS) seamless synthetic aperture radar (SAR) archive (SSARA) project is a 2-year collaboration between UNAVCO, the Alaska Satellite Facility (ASF), the Jet Propulsion Laboratory (JPL), and OpenTopography at the San Diego Supercomputer Center (SDSC) to design and implement a seamless distributed access system for SAR data and derived data products (i.e. interferograms). A major milestone for the first year of the SSARA project was a unified application programming interface (API) for SAR data search and results at ASF and UNAVCO (WInSAR and EarthScope data archives) through the use of simple web services. A federated query service was developed using the unified APIs, providing users a single search interface for both archives (http://www.unavco.org/ws/brokered/ssara/sar/search). A command line client that utilizes this new service is provided as an open source utility for the community on GitHub (https://github.com/bakerunavco/SSARA). Further API development and enhancements added more InSAR specific keywords and quality control parameters (Doppler centroid, faraday rotation, InSAR stack size, and perpendicular baselines). To facilitate InSAR processing, the federated query service incorporated URLs for DEM (from OpenTopography) and tropospheric corrections (from the JPL OSCAR service) in addition to the URLs for SAR data. This federated query service will provide relevant QC metadata for selecting pairs of SAR data for InSAR processing and all the URLs necessary for interferogram generation. Interest from the international community has prompted an effort to incorporate other SAR data archives (the ESA Virtual Archive 4 and the DLR TerraSAR-X_SSC Geohazard Supersites and Natural Laboratories collections) into the federated query service which provide data for researchers outside the US and North America.

  15. Fixing Dataset Search

    NASA Technical Reports Server (NTRS)

    Lynnes, Chris

    2014-01-01

    Three current search engines are queried for ozone data at the GES DISC. The results range from sub-optimal to counter-intuitive. We propose a method to fix dataset search by implementing a robust relevancy ranking scheme. The relevancy ranking scheme is based on several heuristics culled from more than 20 years of helping users select datasets.

  16. Querying Semi-Structured Data

    NASA Technical Reports Server (NTRS)

    Abiteboul, Serge

    1997-01-01

    The amount of data of all kinds available electronically has increased dramatically in recent years. The data resides in different forms, ranging from unstructured data in file systems to highly structured in relational database systems. Data is accessible through a variety of interfaces including Web browsers, database query languages, application-specific interfaces, or data exchange formats. Some of this data is raw data, e.g., images or sound. Some of it has structure even if the structure is often implicit, and not as rigid or regular as that found in standard database systems. Sometimes the structure exists but has to be extracted from the data. Sometimes also it exists but we prefer to ignore it for certain purposes such as browsing. We call here semi-structured data this data that is (from a particular viewpoint) neither raw data nor strictly typed, i.e., not table-oriented as in a relational model or sorted-graph as in object databases. As will be seen later when the notion of semi-structured data is more precisely defined, the need for semi-structured data arises naturally in the context of data integration, even when the data sources are themselves well-structured. Although data integration is an old topic, the need to integrate a wider variety of data formats (e.g., SGML or ASN.1 data) and data found on the Web has brought the topic of semi-structured data to the forefront of research. The main purpose of the paper is to isolate the essential aspects of semi-structured data. We also survey some proposals of models and query languages for semi-structured data. In particular, we consider recent works at Stanford U. and U. Penn on semi-structured data. In both cases, the motivation is found in the integration of heterogeneous data.

  17. Discovering biomedical semantic relations in PubMed queries for information retrieval and database curation

    PubMed Central

    Huang, Chung-Chi; Lu, Zhiyong

    2016-01-01

    Identifying relevant papers from the literature is a common task in biocuration. Most current biomedical literature search systems primarily rely on matching user keywords. Semantic search, on the other hand, seeks to improve search accuracy by understanding the entities and contextual relations in user keywords. However, past research has mostly focused on semantically identifying biological entities (e.g. chemicals, diseases and genes) with little effort on discovering semantic relations. In this work, we aim to discover biomedical semantic relations in PubMed queries in an automated and unsupervised fashion. Specifically, we focus on extracting and understanding the contextual information (or context patterns) that is used by PubMed users to represent semantic relations between entities such as ‘CHEMICAL-1 compared to CHEMICAL-2.’ With the advances in automatic named entity recognition, we first tag entities in PubMed queries and then use tagged entities as knowledge to recognize pattern semantics. More specifically, we transform PubMed queries into context patterns involving participating entities, which are subsequently projected to latent topics via latent semantic analysis (LSA) to avoid the data sparseness and specificity issues. Finally, we mine semantically similar contextual patterns or semantic relations based on LSA topic distributions. Our two separate evaluation experiments of chemical-chemical (CC) and chemical–disease (CD) relations show that the proposed approach significantly outperforms a baseline method, which simply measures pattern semantics by similarity in participating entities. The highest performance achieved by our approach is nearly 0.9 and 0.85 respectively for the CC and CD task when compared against the ground truth in terms of normalized discounted cumulative gain (nDCG), a standard measure of ranking quality. These results suggest that our approach can effectively identify and return related semantic patterns in a ranked

  18. Discovering biomedical semantic relations in PubMed queries for information retrieval and database curation.

    PubMed

    Huang, Chung-Chi; Lu, Zhiyong

    2016-01-01

    Identifying relevant papers from the literature is a common task in biocuration. Most current biomedical literature search systems primarily rely on matching user keywords. Semantic search, on the other hand, seeks to improve search accuracy by understanding the entities and contextual relations in user keywords. However, past research has mostly focused on semantically identifying biological entities (e.g. chemicals, diseases and genes) with little effort on discovering semantic relations. In this work, we aim to discover biomedical semantic relations in PubMed queries in an automated and unsupervised fashion. Specifically, we focus on extracting and understanding the contextual information (or context patterns) that is used by PubMed users to represent semantic relations between entities such as 'CHEMICAL-1 compared to CHEMICAL-2.' With the advances in automatic named entity recognition, we first tag entities in PubMed queries and then use tagged entities as knowledge to recognize pattern semantics. More specifically, we transform PubMed queries into context patterns involving participating entities, which are subsequently projected to latent topics via latent semantic analysis (LSA) to avoid the data sparseness and specificity issues. Finally, we mine semantically similar contextual patterns or semantic relations based on LSA topic distributions. Our two separate evaluation experiments of chemical-chemical (CC) and chemical-disease (CD) relations show that the proposed approach significantly outperforms a baseline method, which simply measures pattern semantics by similarity in participating entities. The highest performance achieved by our approach is nearly 0.9 and 0.85 respectively for the CC and CD task when compared against the ground truth in terms of normalized discounted cumulative gain (nDCG), a standard measure of ranking quality. These results suggest that our approach can effectively identify and return related semantic patterns in a ranked order
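    The evaluation in this record is reported as normalized discounted cumulative gain (nDCG). A minimal sketch of one common formulation is given below: graded relevance scores are discounted by the base-2 logarithm of rank position and normalized by the score of the ideal ordering. The abstract does not state which nDCG variant or cutoff the authors used, so the linear gain and the absence of a cutoff are assumptions.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: sum of rel_i / log2(i + 1), ranks from 1."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances):
    """DCG of the given ranking divided by DCG of the ideal (sorted) ranking."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical graded relevance of patterns in the order a system returned them.
    print(round(ndcg([3, 2, 3, 0, 1]), 3))
```

    A score of 1.0 means the returned order already matches the ideal order, so values near 0.9 and 0.85 indicate rankings close to the ground-truth ordering.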

  19. Spatial aggregation query in dynamic geosensor networks

    NASA Astrophysics Data System (ADS)

    Yi, Baolin; Feng, Dayang; Xiao, Shisong; Zhao, Erdun

    2007-11-01

    Wireless sensor networks have been widely used for civilian and military applications, such as environmental monitoring and vehicle tracking. In many of these applications, research mainly aims at building sensor-network-based systems that deliver the sensed data to applications. However, existing work has seldom exploited spatial aggregation queries that take the dynamic characteristics of sensor networks into account. In this paper, we investigate how to process spatial aggregation queries over dynamic geosensor networks where both the sink node and sensor nodes are mobile, and we propose several novel improvements on enabling techniques. The mobility of sensors makes existing routing protocols, which are based on information about a fixed framework or the neighborhood, infeasible. We present an improved location-based stateless implicit geographic forwarding (IGF) protocol for routing a query toward the area specified by the query window, and a diameter-based window aggregation query (DWAQ) algorithm for query propagation and data aggregation in the query window. Finally, considering the changing location of the sink node, we present two schemes to forward the result to the sink node. Simulation results show that the proposed algorithms can improve query latency and query accuracy.

  20. A Search for Long-term Slow Slip Along the Cascadia Subduction Zone

    NASA Astrophysics Data System (ADS)

    Nuyen, C.; Schmidt, D. A.

    2016-12-01

    Japan's Nankai Trough and the Cascadia Subduction Zone are often compared as analogous systems due to their striking similarities, which include relatively high thermal gradients, young incoming plates, and the occurrence of short-term slow slip episodes (SSEs). However, a lack of long-term SSEs in Cascadia sets it apart from Nankai, which experiences both short- and long-term SSEs. This disparity between Cascadia and Nankai begs the question of whether long-term SSEs are in fact absent in Cascadia. We examine GPS data from the PBO and PANGA networks to determine whether or not Cascadia has hosted a long-term SSE in the past 20 years. A preliminary review of the time series does not reveal any large-scale multi-year transients, such as have been documented in Japan and Alaska where over 5 cm of surface displacement is seen over multiple years. In order to more clearly recognize possible small amplitude long-term SSEs in Cascadia, the GPS data are reduced as follows: time series are cleaned by removing (1) continental water loading terms, (2) transient displacements of known short-term SSEs, and (3) common mode signals that span the network. After cleaning, the GPS data are manually inspected for coherent trends between stations. To further identify small amplitude slip events that persist for months-to-years, we invert the cleaned time series in Cascadia for fault slip using a principal component analysis-based inversion method. We also perform a suite of synthetic forward models to better understand how a long-term slow slip event might appear in the time series. Results from this research have direct implications for Cascadia in terms of moment release, stress redistributions, and seismic cycles. In a broader sense, these results also influence the global knowledge of SSEs by giving a better understanding of the full range of slip modes in Cascadia.

  1. Query-biased preview over outsourced and encrypted data.

    PubMed

    Peng, Ningduo; Luo, Guangchun; Qin, Ke; Chen, Aiguo

    2013-01-01

    For both convenience and security, more and more users encrypt their sensitive data before outsourcing it to a third party such as a cloud storage service. However, searching for the desired documents becomes problematic since it is costly to download and decrypt each possibly needed document to check if it contains the desired content. An informative query-biased preview feature, as applied in modern search engines, could help the users to learn about the content without downloading the entire document. However, when the data are encrypted, securely extracting a keyword-in-context snippet from the data as a preview becomes a challenge. Based on private information retrieval protocol and the core concept of searchable encryption, we propose a single-server and two-round solution to securely obtain a query-biased snippet over the encrypted data from the server. We achieve this novel result by making a document (plaintext) previewable under any cryptosystem and constructing a secure index to support dynamic computation for a best matched snippet when queried by some keywords. For each document, the scheme has O(d) storage complexity and O(log(d/s) + s + d/s) communication complexity, where d is the document size and s is the snippet length.

  2. Secure quantum private information retrieval using phase-encoded queries

    SciTech Connect

    Olejnik, Lukasz

    2011-08-15

    We propose a quantum solution to the classical private information retrieval (PIR) problem, which allows one to query a database in a private manner. The protocol offers privacy thresholds and allows the user to obtain information from a database in a way that offers the potential adversary, in this model the database owner, no possibility of deterministically establishing the query contents. This protocol may also be viewed as a solution to the symmetrically private information retrieval problem in that it can offer database security (inability for a querying user to steal its contents). Compared to classical solutions, the protocol offers substantial improvement in terms of communication complexity. In comparison with the recent quantum private queries [Phys. Rev. Lett. 100, 230502 (2008)] protocol, it is more efficient in terms of communication complexity and the number of rounds, while offering a clear privacy parameter. We discuss the security of the protocol and analyze its strengths and conclude that using this technique makes it challenging to obtain the unconditional (in the information-theoretic sense) privacy degree; nevertheless, in addition to being simple, the protocol still offers a privacy level. The oracle used in the protocol is inspired both by the classical computational PIR solutions as well as the Deutsch-Jozsa oracle.

  3. Accessing Suicide-Related Information on the Internet: A Retrospective Observational Study of Search Behavior

    PubMed Central

    2013-01-01

    Background The Internet’s potential impact on suicide is of major public health interest as easy online access to pro-suicide information or specific suicide methods may increase suicide risk among vulnerable Internet users. Little is known, however, about users’ actual searching and browsing behaviors of online suicide-related information. Objective To investigate what webpages people actually clicked on after searching with suicide-related queries on a search engine and to examine what queries people used to get access to pro-suicide websites. Methods A retrospective observational study was done. We used a web search dataset released by America Online (AOL). The dataset was randomly sampled from all AOL subscribers’ web queries between March and May 2006 and generated by 657,000 service subscribers. Results We found 5526 search queries (0.026%, 5526/21,000,000) that included the keyword "suicide". The 5526 search queries included 1586 different search terms and were generated by 1625 unique subscribers (0.25%, 1625/657,000). Of these queries, 61.38% (3392/5526) were followed by users clicking on a search result. Of these 3392 queries, 1344 (39.62%) webpages were clicked on by 930 unique users but only 1314 of those webpages were accessible during the study period. Each clicked-through webpage was classified into 11 categories. The categories of the most visited webpages were: entertainment (30.13%; 396/1314), scientific information (18.31%; 240/1314), and community resources (14.53%; 191/1314). Among the 1314 accessed webpages, we could identify only two pro-suicide websites. We found that the search terms used to access these sites included “commiting suicide with a gas oven”, “hairless goat”, “pictures of murder by strangulation”, and “photo of a severe burn”. A limitation of our study is that the database may be dated and confined to mainly English webpages. Conclusions Searching or browsing suicide-related or pro-suicide webpages was

  4. Accessing suicide-related information on the internet: a retrospective observational study of search behavior.

    PubMed

    Wong, Paul Wai-Ching; Fu, King-Wa; Yau, Rickey Sai-Pong; Ma, Helen Hei-Man; Law, Yik-Wa; Chang, Shu-Sen; Yip, Paul Siu-Fai

    2013-01-11

    The Internet's potential impact on suicide is of major public health interest as easy online access to pro-suicide information or specific suicide methods may increase suicide risk among vulnerable Internet users. Little is known, however, about users' actual searching and browsing behaviors of online suicide-related information. To investigate what webpages people actually clicked on after searching with suicide-related queries on a search engine and to examine what queries people used to get access to pro-suicide websites. A retrospective observational study was done. We used a web search dataset released by America Online (AOL). The dataset was randomly sampled from all AOL subscribers' web queries between March and May 2006 and generated by 657,000 service subscribers. We found 5526 search queries (0.026%, 5526/21,000,000) that included the keyword "suicide". The 5526 search queries included 1586 different search terms and were generated by 1625 unique subscribers (0.25%, 1625/657,000). Of these queries, 61.38% (3392/5526) were followed by users clicking on a search result. Of these 3392 queries, 1344 (39.62%) webpages were clicked on by 930 unique users but only 1314 of those webpages were accessible during the study period. Each clicked-through webpage was classified into 11 categories. The categories of the most visited webpages were: entertainment (30.13%; 396/1314), scientific information (18.31%; 240/1314), and community resources (14.53%; 191/1314). Among the 1314 accessed webpages, we could identify only two pro-suicide websites. We found that the search terms used to access these sites included "commiting suicide with a gas oven", "hairless goat", "pictures of murder by strangulation", and "photo of a severe burn". A limitation of our study is that the database may be dated and confined to mainly English webpages. Searching or browsing suicide-related or pro-suicide webpages was uncommon, although a small group of users did access websites that contain

  5. Relevance Feedback Based Query Expansion Model Using Borda Count and Semantic Similarity Approach

    PubMed Central

    Singh, Jagendra; Sharan, Aditi

    2015-01-01

    Pseudo-Relevance Feedback (PRF) is a well-known method of query expansion for improving the performance of information retrieval systems. Not all terms in PRF documents are important for expanding the user query; therefore, selecting proper expansion terms is very important for improving system performance. Individual methods for selecting query expansion terms have been widely investigated. Every individual expansion term selection method has its own weaknesses and strengths. To overcome the weaknesses and to utilize the strengths of the individual methods, we used multiple term selection methods together. In this paper, first, the possibility of improving the overall performance using individual query expansion term selection methods is explored. Second, the Borda count rank aggregation approach is used to combine multiple query expansion term selection methods. Third, a semantic similarity approach is used to select terms semantically similar to the query after applying the Borda count rank combination. Our experimental results demonstrate that the proposed approaches achieve a significant improvement over individual term selection methods and related state-of-the-art methods. PMID:26770189
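
    To make the Borda count step mentioned above concrete, here is a minimal sketch of textbook Borda rank aggregation over several ranked lists of candidate expansion terms. It is not the authors' implementation: the function name, the tie-breaking rule, and the example terms are assumptions made for illustration.

```python
from collections import defaultdict


def borda_count(rankings):
    """Merge several ranked term lists into a single consensus ranking.

    With n distinct candidate terms overall, a ranking gives n-1 points to
    its top term, n-2 to the next, and so on; a term that a ranking does not
    list gets nothing from it. Ties are broken alphabetically (an assumption).
    """
    candidates = {term for ranking in rankings for term in ranking}
    n = len(candidates)
    scores = defaultdict(int)
    for ranking in rankings:
        for position, term in enumerate(ranking):
            scores[term] += n - 1 - position
    return sorted(candidates, key=lambda term: (-scores[term], term))


# Hypothetical outputs of three term selection methods for one query.
method_rankings = [
    ["retrieval", "index", "ranking"],
    ["index", "retrieval", "ranking"],
    ["index", "ranking", "retrieval"],
]
print(borda_count(method_rankings))
```

    In the pipeline the abstract describes, the merged list would then be filtered by semantic similarity to the query; that step is not shown here.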

  6. Relevance Feedback Based Query Expansion Model Using Borda Count and Semantic Similarity Approach.

    PubMed

    Singh, Jagendra; Sharan, Aditi

    2015-01-01

    Pseudo-Relevance Feedback (PRF) is a well-known method of query expansion for improving the performance of information retrieval systems. Not all terms in PRF documents are important for expanding the user query; therefore, selecting proper expansion terms is very important for improving system performance. Individual methods for selecting query expansion terms have been widely investigated. Every individual expansion term selection method has its own weaknesses and strengths. To overcome the weaknesses and to utilize the strengths of the individual methods, we used multiple term selection methods together. In this paper, first, the possibility of improving the overall performance using individual query expansion term selection methods is explored. Second, the Borda count rank aggregation approach is used to combine multiple query expansion term selection methods. Third, a semantic similarity approach is used to select terms semantically similar to the query after applying the Borda count rank combination. Our experimental results demonstrate that the proposed approaches achieve a significant improvement over individual term selection methods and related state-of-the-art methods.

  7. Aggregating Queries Against Large Inventories of Remotely Accessible Data

    NASA Astrophysics Data System (ADS)

    Gallagher, J. H. R.; Fulker, D. W.

    2016-12-01

    Those seeking to discover data for a specific purpose often encounter search results that are so large as to be useless without computing assistance. This situation arises, with increasing frequency, in part because repositories contain ever greater numbers of granules, and their granularities may well be poorly aligned or even orthogonal to the data-selection needs of the user. This presentation describes a recently developed service for simultaneously querying large lists of OPeNDAP-accessible granules to extract specified data. The specifications include a richly expressive set of data-selection criteria—applicable to content as well as metadata—and the service has been tested successfully against lists naming hundreds of thousands of granules. Querying such numbers of local files (i.e., granules) on a desktop or laptop computer is practical (by using a scripting language, e.g.), but this practicality is diminished when the data are remote and thus best accessed through a Web-services interface. In these cases, which are increasingly common, scripted queries can take many hours because of inherent network latencies. Furthermore, communication dropouts can add fragility to such scripts, yielding gaps in the acquired results. In contrast, OPeNDAP's new aggregated-query services enable data discovery in the context of very large inventory sizes. These capabilities have been developed for use with OPeNDAP's Hyrax server, which is an open-source realization of DAP (for "Data Access Protocol," a specification widely used in NASA, NOAA and other data-intensive contexts). These aggregated-query services exhibit good response times (on the order of seconds, not hours) even for inventories that list hundreds of thousands of source granules.

  8. Visual graph query formulation and exploration: a new perspective on information retrieval at the edge

    NASA Astrophysics Data System (ADS)

    Kase, Sue E.; Vanni, Michelle; Knight, Joanne A.; Su, Yu; Yan, Xifeng

    2016-05-01

    Within operational environments decisions must be made quickly based on the information available. Identifying an appropriate knowledge base and accurately formulating a search query are critical tasks for decision-making effectiveness in dynamic situations. The spreading of graph data management tools to access large graph databases is a rapidly emerging research area of potential benefit to the intelligence community. A graph representation provides a natural way of modeling data in a wide variety of domains. Graph structures use nodes, edges, and properties to represent and store data. This research investigates the advantages of information search by graph query initiated by the analyst and interactively refined within the contextual dimensions of the answer space toward a solution. The paper introduces SLQ, a user-friendly graph querying system enabling the visual formulation of schemaless and structureless graph queries. SLQ is demonstrated with an intelligence analyst information search scenario focused on identifying individuals responsible for manufacturing a mosquito-hosted deadly virus. The scenario highlights the interactive construction of graph queries without prior training in complex query languages or graph databases, intuitive navigation through the problem space, and visualization of results in graphical format.

  9. Visual search for changes in scenes creates long-term, incidental memory traces.

    PubMed

    Utochkin, Igor S; Wolfe, Jeremy M

    2018-02-09

    Humans are very good at remembering large numbers of scenes over substantial periods of time. But how good are they at remembering changes to scenes? In this study, we tested scene memory and change detection two weeks after initial scene learning. In Experiments 1-3, scenes were learned incidentally during visual search for change. In Experiment 4, observers explicitly memorized scenes. At test, after two weeks observers were asked to discriminate old from new scenes, to recall a change that they had detected in the study phase, or to detect a newly introduced change in the memorization experiment. Next, they performed a change detection task, usually looking for the same change as in the study period. Scene recognition memory was found to be similar in all experiments, regardless of the study task. In Experiment 1, more difficult change detection produced better scene memory. Experiments 2 and 3 supported a "depth-of-processing" account for the effects of initial search and change detection on incidental memory for scenes. Of most interest, change detection was faster during the test phase than during the study phase, even when the observer had no explicit memory of having found that change previously. This result was replicated in two of our three change detection experiments. We conclude that scenes can be encoded incidentally as well as explicitly and that changes in those scenes can leave measurable traces even if they are not explicitly recalled.

  10. Fast source camera identification using matching signs between query and reference fingerprints.

    PubMed

    Hu, Yongjian; Li, Chang-Tsun; Lai, Zhimao

    Fast camera fingerprint search is an important issue for source camera identification in real-world applications. So far there has been little work done in this area. In this paper, we propose a novel fast search algorithm. We use global information derived from the relationship between the query fingerprint/digest and the reference fingerprints/digests in the database to guide fast search. This information can provide more accurate and robust clues for the selection of candidate matching database fingerprints. Because the quality of query fingerprints may degrade or vary in realistic applications, the construction of robust search clues is significant. To speed up the search process, we adopt a lookup table that is built on a separate-chaining hash table. The proposed algorithm has been tested using query images from real-world photos. Experiments demonstrate that our algorithm adapts well to query fingerprints of varying quality. It can achieve higher detection rates with lower computational cost than the traditional brute-force search algorithm and a pioneering fast search algorithm in the literature.

  11. The development of automaticity in short-term memory search: Item-response learning and category learning.

    PubMed

    Cao, Rui; Nosofsky, Robert M; Shiffrin, Richard M

    2017-05-01

    In short-term-memory (STM)-search tasks, observers judge whether a test probe was present in a short list of study items. Here we investigated the long-term learning mechanisms that lead to the highly efficient STM-search performance observed under conditions of consistent-mapping (CM) training, in which targets and foils never switch roles across trials. In item-response learning, subjects learn long-term mappings between individual items and target versus foil responses. In category learning, subjects learn high-level codes corresponding to separate sets of items and learn to attach old versus new responses to these category codes. To distinguish between these 2 forms of learning, we tested subjects in categorized varied mapping (CV) conditions: There were 2 distinct categories of items, but the assignment of categories to target versus foil responses varied across trials. In cases involving arbitrary categories, CV performance closely resembled standard varied-mapping performance without categories and departed dramatically from CM performance, supporting the item-response-learning hypothesis. In cases involving prelearned categories, CV performance resembled CM performance, as long as there was sufficient practice or steps taken to reduce trial-to-trial category-switching costs. This pattern of results supports the category-coding hypothesis for sufficiently well-learned categories. Thus, item-response learning occurs rapidly and is used early in CM training; category learning is much slower but is eventually adopted and is used to increase the efficiency of search beyond that available from item-response learning. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  12. Federated ontology-based queries over cancer data

    PubMed Central

    2012-01-01

    Background Personalised medicine provides patients with treatments that are specific to their genetic profiles. It requires efficient data sharing of disparate data types across a variety of scientific disciplines, such as molecular biology, pathology, radiology and clinical practice. Personalised medicine aims to offer the safest and most effective therapeutic strategy based on the gene variations of each subject. In particular, this is valid in oncology, where knowledge about genetic mutations has already led to new therapies. Current molecular biology techniques (microarrays, proteomics, epigenetic technology and improved DNA sequencing technology) enable better characterisation of cancer tumours. The vast amounts of data, however, coupled with the use of different terms - or semantic heterogeneity - in each discipline makes the retrieval and integration of information difficult. Results Existing software infrastructures for data-sharing in the cancer domain, such as caGrid, support access to distributed information. caGrid follows a service-oriented model-driven architecture. Each data source in caGrid is associated with metadata at increasing levels of abstraction, including syntactic, structural, reference and domain metadata. The domain metadata consists of ontology-based annotations associated with the structural information of each data source. However, caGrid's current querying functionality is given at the structural metadata level, without capitalising on the ontology-based annotations. This paper presents the design of and theoretical foundations for distributed ontology-based queries over cancer research data. Concept-based queries are reformulated to the target query language, where join conditions between multiple data sources are found by exploiting the semantic annotations. The system has been implemented, as a proof of concept, over the caGrid infrastructure. The approach is applicable to other model-driven architectures. A graphical user

  13. Federated ontology-based queries over cancer data.

    PubMed

    González-Beltrán, Alejandra; Tagger, Ben; Finkelstein, Anthony

    2012-01-25

    Personalised medicine provides patients with treatments that are specific to their genetic profiles. It requires efficient data sharing of disparate data types across a variety of scientific disciplines, such as molecular biology, pathology, radiology and clinical practice. Personalised medicine aims to offer the safest and most effective therapeutic strategy based on the gene variations of each subject. In particular, this is valid in oncology, where knowledge about genetic mutations has already led to new therapies. Current molecular biology techniques (microarrays, proteomics, epigenetic technology and improved DNA sequencing technology) enable better characterisation of cancer tumours. The vast amounts of data, however, coupled with the use of different terms - or semantic heterogeneity - in each discipline makes the retrieval and integration of information difficult. Existing software infrastructures for data-sharing in the cancer domain, such as caGrid, support access to distributed information. caGrid follows a service-oriented model-driven architecture. Each data source in caGrid is associated with metadata at increasing levels of abstraction, including syntactic, structural, reference and domain metadata. The domain metadata consists of ontology-based annotations associated with the structural information of each data source. However, caGrid's current querying functionality is given at the structural metadata level, without capitalising on the ontology-based annotations. This paper presents the design of and theoretical foundations for distributed ontology-based queries over cancer research data. Concept-based queries are reformulated to the target query language, where join conditions between multiple data sources are found by exploiting the semantic annotations. The system has been implemented, as a proof of concept, over the caGrid infrastructure. The approach is applicable to other model-driven architectures. A graphical user interface has been

  14. Flexible Querying of Lifelong Learner Metadata

    ERIC Educational Resources Information Center

    Poulovassilis, A.; Selmer, P.; Wood, P. T.

    2012-01-01

    This paper discusses the provision of flexible querying facilities over heterogeneous data arising from lifelong learners' educational and work experiences. A key aim of such querying facilities is to allow learners to identify possible choices for their future learning and professional development by seeing what others have done. We motivate and…

  15. An Efficient Implementation of Query/Advertise

    DTIC Science & Technology

    2003-03-31

    apparently still an open question [2] if Freenet can be extended to support the general queries provided by query/advertise. Astrolabe [20] provides yet...http://www.rv.tibco.com/rvwhitepaper.html. [20] M. van Renesse, K. Birman, and W. Vogels. Scalable Management, and Data Mining Using Astrolabe . In

  16. Development of Query Strategies to Identify a Histologic Lymphoma Subtype in a Large Linked Database System

    PubMed Central

    Graiser, Michael; Moore, Susan G.; Victor, Rochelle; Hilliard, Ashley; Hill, Leroy; Keehan, Michael S.; Flowers, Christopher R.

    2007-01-01

    Background: Large linked databases (LLDB) represent a novel resource for cancer outcomes research. However, accurate means of identifying a patient population of interest within these LLDBs can be challenging. Our research group developed a fully integrated platform that provides a means of combining independent legacy databases into a single cancer-focused LLDB system. We compared the sensitivity and specificity of several SQL-based query strategies for identifying a histologic lymphoma subtype in this LLDB to determine the most accurate legacy data source for identifying a specific cancer patient population. Methods: Query strategies were developed to identify patients with follicular lymphoma from a LLDB of cancer registry data, electronic medical records (EMR), laboratory, administrative, pharmacy, and other clinical data. Queries were performed using common diagnostic codes (ICD-9), cancer registry histology codes (ICD-O), and text searches of EMRs. We reviewed medical records and pathology reports to confirm each diagnosis and calculated the sensitivity and specificity for each query strategy. Results: Together the queries identified 1538 potential cases of follicular lymphoma. Review of pathology and other medical reports confirmed 415 cases of follicular lymphoma, 300 pathology-verified and 115 verified from other medical reports. The query using ICD-O codes was highly specific (96%). Queries using text strings varied in sensitivity (range 7–92%) and specificity (range 86–99%). Queries using ICD-9 codes were both less sensitive (34–44%) and specific (35–87%). Conclusions: Queries of linked-cancer databases that include cancer registry data should utilize ICD-O codes or employ structured free-text searches to identify patient populations with a precise histologic diagnosis. PMID:19455241
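
    The sensitivity and specificity figures above follow the standard definitions, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The sketch below shows that arithmetic for one query strategy; the function name and the patient identifiers are hypothetical and the numbers are not taken from the study.

```python
def sensitivity_specificity(flagged, confirmed, population):
    """Return (sensitivity, specificity) for one query strategy.

    flagged    -- set of patient ids returned by the query
    confirmed  -- set of patient ids with a chart-verified diagnosis
    population -- set of all patient ids that were reviewed
    """
    true_pos = len(flagged & confirmed)
    false_neg = len(confirmed - flagged)
    false_pos = len(flagged - confirmed)
    true_neg = len(population - flagged - confirmed)
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Hypothetical toy cohort of ten reviewed patients.
population = set(range(10))
confirmed = {0, 1, 2, 3}
flagged = {0, 1, 2, 4}
print(sensitivity_specificity(flagged, confirmed, population))
```

    The sketch assumes the reviewed population contains at least one confirmed case and at least one non-case; otherwise a denominator is zero.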

  17. AnchorQuery: Rapid online virtual screening for small-molecule protein-protein interaction inhibitors.

    PubMed

    Koes, David R; Dömling, Alexander; Camacho, Carlos J

    2018-01-01

    AnchorQuery (http://anchorquery.csb.pitt.edu) is a web application for rational structure-based design of protein-protein interaction (PPI) inhibitors. A specialized variant of pharmacophore search is used to rapidly screen libraries consisting of more than 31 million synthesizable compounds biased by design to preferentially target PPIs. Every library compound is accessible through one-step multi-component reaction (MCR) chemistry and contains an anchor motif that is bioisosteric to an amino acid residue. The inclusion of this anchor not only biases the compounds to interact with proteins, it also enables a rapid, sublinear time pharmacophore search algorithm. AnchorQuery provides all the tools necessary for users to perform online interactive virtual screens of millions of compounds, including pharmacophore elucidation and search, and enrichment analysis. Accessibility: AnchorQuery is freely accessible at http://anchorquery.csb.pitt.edu. © 2017 The Protein Society.

  18. Worked Examples in Teaching Queries for Searching Academic Databases

    ERIC Educational Resources Information Center

    Kickham-Samy, Mary

    2013-01-01

    The worked-example effect, an application of cognitive load theory, is a well-supported method of instruction for well-structured problems (Chandler and Sweller, 1991; Cooper and Sweller, 1987; Sweller and Cooper, 1985; Tuovinen & Sweller, 1999; Ward and Sweller, 1990). One limitation is expertise-reversal effect, where advanced students…

  19. Evaluation of strategies for multiple sphere queries with local image descriptors

    NASA Astrophysics Data System (ADS)

    Bouteldja, Nouha; Gouet-Brunet, Valérie; Scholl, Michel

    2006-01-01

    In this paper, we are interested in the fast retrieval, in a large collection of points in high-dimensional space, of points close to a set of m query points (a multiple query): we want to efficiently find the sequence $\{A_i\}_{i \in \{1,\dots,m\}}$, where $A_i$ is the set of points within a sphere of center query point $p_i$, $i \in \{1,\dots,m\}$, and radius $\varepsilon$ (a sphere query). It has been argued that beyond a rather small dimension ($d \geq 10$) for such sphere queries as well as for other similarity queries, sequentially scanning the collection of points is faster than crossing a tree structure indexing the collection (the so-called curse of dimensionality phenomenon). Our first contribution is to experimentally assess whether the curse of dimensionality is reached with various points distributions. We compare the performance of a single sphere query when the collection is indexed by a tree structure (an SR-tree in our experiments) to that of a sequential scan. The second objective of this paper is to propose and evaluate several algorithms for multiple queries in a collection of points indexed by a tree structure. We compare the performance of these algorithms to that of a naive one consisting in sequentially running the m queries. This study is applied to content-based image retrieval where images are described by local descriptors based on points of interest. Such descriptors involve a relatively small dimension (8 to 30) justifying that the collection of points be indexed by a tree structure; similarity search with local descriptors implies multiple sphere queries that are usually time expensive, justifying the proposal of new strategies.
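
    As a point of reference for the baselines named above, the following sketch answers a single sphere query by sequential scan and a multiple query by running the m sphere queries one after another. It deliberately uses no index (no SR-tree), and the function names and sample points are assumptions for illustration only.

```python
import math


def sphere_query(points, center, radius):
    """Sequential scan: keep every point within `radius` of `center`."""
    return [p for p in points if math.dist(p, center) <= radius]


def multiple_sphere_query(points, centers, radius):
    """Naive multiple query: run one sphere query per center, in order."""
    return [sphere_query(points, center, radius) for center in centers]


# Hypothetical 2-D descriptors; real local descriptors would have 8 to 30 dimensions.
collection = [(0.0, 0.0), (1.0, 0.0), (3.0, 4.0)]
queries = [(0.0, 0.0), (3.0, 3.0)]
print(multiple_sphere_query(collection, queries, radius=1.0))
```

    This baseline costs one pass over the collection per query point, which is the cost the paper's multiple-query strategies aim to reduce.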

  20. Surgical and conservative treatment of patients with congenital scoliosis: α search for long-term results

    PubMed Central

    2011-01-01

    Background In view of the limited data available on the conservative treatment of patients with congenital scoliosis (CS), early surgery is suggested in mild cases with formation failures. Patients with segmentation failures will not benefit from conservative treatment. The purpose of this review is to identify the mid- or long-term results of spinal fusion surgery in patients with congenital scoliosis. Methods Retrospective and prospective studies were included, reporting on the outcome of surgery in patients with congenital scoliosis. Studies concerning a small number of cases treated conservatively were included too. We analyzed mid-term (5 to 7 years) and long-term results (7 years or more), both as regards the maintenance of the correction of scoliosis and the safety of instrumentation, the early and late complications of surgery and their effect on quality of life. Results A small number of studies of surgically treated patients were found; these had follow-up periods of 4-6 years, by which time skeletal maturity had in most cases not yet been reached, and a few had follow-up of 36-44 years. The results of bracing in children with congenital scoliosis, mainly in cases with failure of formation, were also studied. Discussion Spinal surgery in patients with congenital scoliosis is regarded in short as a safe procedure and should be performed. On the other hand, early and late complications are also described, concerning not only intraoperative and immediate postoperative problems, but also the safety and efficacy of the spinal instrumentation and the possibility of developing neurological disorders and the long-term effect these may have on both lung function and the quality of life of children. Conclusions Few cases indicate the long-term results of surgical techniques, in the natural progression of scoliosis. Similarly, few cases have been reported on the influence of conservative treatment. In conclusion, patients with segmentation failures should be treated

  1. Long-term Doppler Shift and Line Profile Studies of Planetary Search Target Stars

    NASA Technical Reports Server (NTRS)

    McMillan, Robert S.

    2002-01-01

    This grant supported attempts to develop a method for measuring the Doppler shifts of solar-type stars more accurately. The expense of future space borne telescopes to search for solar systems like our own makes it worth trying to improve the relatively inexpensive pre-flight reconnaissance by ground-based telescopes. The concepts developed under this grant contributed to the groundwork for such improvements. They were focused on how to distinguish between extrasolar planets and stellar activity (convection) cycles. To measure the Doppler shift (radial velocity; RV) of the center of mass of a star in the presence of changing convection in the star's photosphere, one can either measure the effect of convection separately from that of the star's motion and subtract its contribution to the apparent RV, or measure the RV in a way that is insensitive to convection. This grant supported investigations into both of these approaches. We explored the use of a Fabry-Perot Etalon HE interferometer and a multichannel Fourier Transform Spectrometer (mFTS), and finished making a 1.8-m telescope operational and potentially available for this work.

  2. How to improve your PubMed/MEDLINE searches: 3. advanced searching, MeSH and My NCBI.

    PubMed

    Fatehi, Farhad; Gray, Leonard C; Wootton, Richard

    2014-03-01

    Although the basic PubMed search is often helpful, the results may sometimes be non-specific. For more control over the search process you can use the Advanced Search Builder interface. This allows a targeted search in specific fields, with the convenience of being able to select the intended search field from a list. It also provides a history of your previous searches. The search history is useful to develop a complex search query by combining several previous searches using Boolean operators. For indexing the articles in MEDLINE, the NLM uses a controlled vocabulary system called MeSH. This standardised vocabulary solves the problem of authors, researchers and librarians who may use different terms for the same concept. To be efficient in a PubMed search, you should start by identifying the most appropriate MeSH terms and use them in your search where possible. My NCBI is a personal workspace facility available through PubMed and makes it possible to customise the PubMed interface. It provides various capabilities that can enhance your search performance.
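
    The field-targeted, Boolean-combined searches described above can be sketched as a small query-string builder. This is an illustrative sketch only, not part of the article: the helper names (`field`, `any_of`, `all_of`) and the example terms are hypothetical, and it assumes the `[MeSH Terms]` and `[Title/Abstract]` field tags of PubMed's search syntax.

```python
def field(term: str, tag: str) -> str:
    """Quote a term and restrict it to one search field, e.g. "Scoliosis"[MeSH Terms]."""
    return f'"{term}"[{tag}]'


def any_of(*clauses: str) -> str:
    """Combine clauses with OR (broadens the search)."""
    return "(" + " OR ".join(clauses) + ")"


def all_of(*clauses: str) -> str:
    """Combine clauses with AND (narrows the search)."""
    return "(" + " AND ".join(clauses) + ")"


# Hypothetical example: a MeSH heading narrowed by two free-text alternatives.
query = all_of(
    field("Scoliosis", "MeSH Terms"),
    any_of(
        field("bracing", "Title/Abstract"),
        field("spinal fusion", "Title/Abstract"),
    ),
)
print(query)
```

    Building the string from small pieces mirrors the way a search history combines earlier searches: each clause can be checked on its own before it is joined with AND or OR.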

  3. BIOMedical Search Engine Framework: Lightweight and customized implementation of domain-specific biomedical search engines.

    PubMed

    Jácome, Alberto G; Fdez-Riverola, Florentino; Lourenço, Anália

    2016-07-01

    Text mining and semantic analysis approaches can be applied to the construction of biomedical domain-specific search engines and provide an attractive alternative to create personalized and enhanced search experiences. Therefore, this work introduces the new open-source BIOMedical Search Engine Framework for the fast and lightweight development of domain-specific search engines. The rationale behind this framework is to incorporate core features typically available in search engine frameworks with flexible and extensible technologies to retrieve biomedical documents, annotate meaningful domain concepts, and develop highly customized Web search interfaces. The BIOMedical Search Engine Framework integrates taggers for major biomedical concepts, such as diseases, drugs, genes, proteins, compounds and organisms, and enables the use of domain-specific controlled vocabulary. Technologies from the Typesafe Reactive Platform, the AngularJS JavaScript framework and the Bootstrap HTML/CSS framework support the customization of the domain-oriented search application. Moreover, the RESTful API of the BIOMedical Search Engine Framework allows the integration of the search engine into existing systems or a complete web interface personalization. The construction of the Smart Drug Search is described as proof-of-concept of the BIOMedical Search Engine Framework. This public search engine catalogs scientific literature about antimicrobial resistance, microbial virulence and similar topics. The keyword-based queries of the users are transformed into concepts and search results are presented and ranked accordingly. The semantic graph view portrays all the concepts found in the results, and the researcher may look into the relevance of different concepts, the strength of direct relations, and non-trivial, indirect relations. The number of occurrences of the concept shows its importance to the query, and the frequency of concept co-occurrence is indicative of biological relations

  4. Investigating the Semantic Gap through Query Log Analysis

    NASA Astrophysics Data System (ADS)

    Mika, Peter; Meij, Edgar; Zaragoza, Hugo

    Significant efforts have focused in the past years on bringing large amounts of metadata online and the success of these efforts can be seen by the impressive number of web sites exposing data in RDFa or RDF/XML. However, little is known about the extent to which this data fits the needs of ordinary web users with everyday information needs. In this paper we study what we perceive as the semantic gap between the supply of data on the Semantic Web and the needs of web users as expressed in the queries submitted to a major Web search engine. We perform our analysis on both the level of instances and ontologies. First, we look at how much data is actually relevant to Web queries and what kind of data it is. Second, we provide a generic method to extract the attributes that Web users are searching for regarding particular classes of entities. This method allows us to contrast class definitions found in Semantic Web vocabularies with the attributes of objects that users are interested in. Our findings are crucial to measuring the potential of semantic search, but also speak to the state of the Semantic Web in general.

  5. Using a search engine-based mutually reinforcing approach to assess the semantic relatedness of biomedical terms.

    PubMed

    Hsu, Yi-Yu; Chen, Hung-Yu; Kao, Hung-Yu

    2013-01-01

    Determining the semantic relatedness of two biomedical terms is an important task for many text-mining applications in the biomedical field. Previous studies, such as those using ontology-based and corpus-based approaches, measured semantic relatedness by using information from the structure of biomedical literature, but these methods are limited by the small size of training resources. To increase the size of training datasets, the outputs of search engines have been used extensively to analyze the lexical patterns of biomedical terms. In this work, we propose the Mutually Reinforcing Lexical Pattern Ranking (ReLPR) algorithm for learning and exploring the lexical patterns of synonym pairs in biomedical text. ReLPR employs lexical patterns and their pattern containers to assess the semantic relatedness of biomedical terms. By combining sentence structures and the linking activities between containers and lexical patterns, our algorithm can explore the correlation between two biomedical terms. The average correlation coefficient of the ReLPR algorithm was 0.82 for various datasets. The results of the ReLPR algorithm were significantly superior to those of previous methods.

  6. Using a Search Engine-Based Mutually Reinforcing Approach to Assess the Semantic Relatedness of Biomedical Terms

    PubMed Central

    Hsu, Yi-Yu; Chen, Hung-Yu; Kao, Hung-Yu

    2013-01-01

    Background Determining the semantic relatedness of two biomedical terms is an important task for many text-mining applications in the biomedical field. Previous studies, such as those using ontology-based and corpus-based approaches, measured semantic relatedness by using information from the structure of biomedical literature, but these methods are limited by the small size of training resources. To increase the size of training datasets, the outputs of search engines have been used extensively to analyze the lexical patterns of biomedical terms. Methodology/Principal Findings In this work, we propose the Mutually Reinforcing Lexical Pattern Ranking (ReLPR) algorithm for learning and exploring the lexical patterns of synonym pairs in biomedical text. ReLPR employs lexical patterns and their pattern containers to assess the semantic relatedness of biomedical terms. By combining sentence structures and the linking activities between containers and lexical patterns, our algorithm can explore the correlation between two biomedical terms. Conclusions/Significance The average correlation coefficient of the ReLPR algorithm was 0.82 for various datasets. The results of the ReLPR algorithm were significantly superior to those of previous methods. PMID:24348899

  7. Searching for the elusive neural substrates of body part terms: a neuropsychological study.

    PubMed

    Kemmerer, David; Tranel, Daniel

    2008-06-01

    Previous neuropsychological studies suggest that, compared to other categories of concrete entities, lexical and conceptual aspects of body part knowledge are frequently spared in brain-damaged patients. To further investigate this issue, we administered a battery of 12 tests assessing lexical and conceptual aspects of body part knowledge to 104 brain-damaged patients with lesions distributed throughout the telencephalon. There were two main outcomes. First, impaired oral naming of body parts, attributable to a disturbance of the mapping between lexical-semantic and lexical-phonological structures, was most reliably and specifically associated with lesions in the left frontal opercular and anterior/inferior parietal opercular cortices and in the white matter underlying these regions (8 patients). Also, 1 patient with body part anomia had a left occipital lesion that included the "extrastriate body area" (EBA). Second, knowledge of the meanings of body part terms was remarkably resistant to impairment, regardless of lesion site; in fact, we did not uncover a single patient who exhibited significantly impaired understanding of the meanings of these terms. In the 9 patients with body part anomia, oral naming of concrete entities was evaluated, and this revealed that 4 patients had disproportionately worse naming of body parts relative to other types of concrete entities. Taken together, these findings extend previous neuropsychological and functional neuroimaging studies of body part knowledge and add to our growing understanding of the nuances of how different linguistic and conceptual categories are operated by left frontal and parietal structures.

  8. Searching for the Elusive Neural Substrates of Body Part Terms: A Neuropsychological Study

    PubMed Central

    Kemmerer, David; Tranel, Daniel

    2010-01-01

    Previous neuropsychological studies suggest that, compared to other categories of concrete entities, lexical and conceptual aspects of body part knowledge are frequently spared in brain-damaged patients. To further investigate this issue, we administered a battery of 12 tests assessing lexical and conceptual aspects of body part knowledge to 104 brain-damaged patients with lesions distributed throughout the telencephalon. There were two main outcomes. First, impaired oral naming of body parts, attributable to a disturbance of the mapping between lexical-semantic and lexical-phonological structures, was most reliably and specifically associated with lesions in the left frontal opercular and anterior/inferior parietal opercular cortices, and in the white matter underlying these regions (8 patients). Also, one patient with body part anomia had a left occipital lesion that included the “extrastriate body area” (EBA). Second, knowledge of the meanings of body part terms was remarkably resistant to impairment, regardless of lesion site; in fact, we did not uncover a single patient who exhibited significantly impaired understanding of the meanings of these terms. In the 9 patients with body part anomia, oral naming of concrete entities was evaluated, and this revealed that 4 patients had disproportionately worse naming of body parts relative to other types of concrete entities. Taken together, these findings extend previous neuropsychological and functional neuroimaging studies of body part knowledge, and add to our growing understanding of the nuances of how different linguistic and conceptual categories are operated by left frontal and parietal structures. PMID:18608319

  9. DISPAQ: Distributed Profitable-Area Query from Big Taxi Trip Data.

    PubMed

    Putri, Fadhilah Kurnia; Song, Giltae; Kwon, Joonho; Rao, Praveen

    2017-09-25

    One of the crucial problems for taxi drivers is to efficiently locate passengers in order to increase profits. The rapid advancement and ubiquitous penetration of Internet of Things (IoT) technology into transportation industries enables us to provide taxi drivers with locations that have more potential passengers (more profitable areas) by analyzing and querying taxi trip data. In this paper, we propose a query processing system, called Distributed Profitable-Area Query (DISPAQ) which efficiently identifies profitable areas by exploiting the Apache Software Foundation's Spark framework and a MongoDB database. DISPAQ first maintains a profitable-area query index (PQ-index) by extracting area summaries and route summaries from raw taxi trip data. It then identifies candidate profitable areas by searching the PQ-index during query processing. Then, it exploits a Z-Skyline algorithm, which is an extension of skyline processing with a Z-order space filling curve, to quickly refine the candidate profitable areas. To improve the performance of distributed query processing, we also propose local Z-Skyline optimization, which reduces the number of dominant tests by distributing killer profitable areas to each cluster node. Through extensive evaluation with real datasets, we demonstrate that our DISPAQ system provides a scalable and efficient solution for processing profitable-area queries from huge amounts of big taxi trip data.
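
    The idea of combining skyline processing with a Z-order curve can be illustrated with a minimal sketch. It is not the DISPAQ implementation: it assumes two non-negative integer attributes where lower values are better, and the function names are hypothetical. It relies on one property of the Z-order (Morton) address: a point that dominates another always has a smaller address, so after sorting by address each point only needs to be tested against the skyline found so far.

```python
def z_address(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of x and y into a single Z-order (Morton) address."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)       # x bits go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)   # y bits go to odd positions
    return z


def dominates(p, q) -> bool:
    """p dominates q if p is no worse in every attribute and better in at least one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def z_skyline(points):
    """Return the non-dominated points, scanning candidates in Z-order."""
    skyline = []
    for p in sorted(set(points), key=lambda pt: z_address(*pt)):
        # Any point that could dominate p has a smaller Z-address,
        # so it has already been visited: one pass is enough.
        if not any(dominates(s, p) for s in skyline):
            skyline.append(p)
    return skyline


# Hypothetical candidate areas scored on two "lower is better" attributes.
candidates = [(1, 5), (2, 2), (3, 3), (5, 1), (4, 4), (2, 6)]
print(z_skyline(candidates))
```

    In this toy input, (3, 3) and (4, 4) are dominated by (2, 2), and (2, 6) is dominated by (1, 5), so only three candidates remain in the skyline.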

  10. DISPAQ: Distributed Profitable-Area Query from Big Taxi Trip Data

    PubMed Central

    Putri, Fadhilah Kurnia; Song, Giltae; Rao, Praveen

    2017-01-01

    One of the crucial problems for taxi drivers is to efficiently locate passengers in order to increase profits. The rapid advancement and ubiquitous penetration of Internet of Things (IoT) technology into transportation industries enables us to provide taxi drivers with locations that have more potential passengers (more profitable areas) by analyzing and querying taxi trip data. In this paper, we propose a query processing system, called Distributed Profitable-Area Query (DISPAQ) which efficiently identifies profitable areas by exploiting the Apache Software Foundation’s Spark framework and a MongoDB database. DISPAQ first maintains a profitable-area query index (PQ-index) by extracting area summaries and route summaries from raw taxi trip data. It then identifies candidate profitable areas by searching the PQ-index during query processing. Then, it exploits a Z-Skyline algorithm, which is an extension of skyline processing with a Z-order space filling curve, to quickly refine the candidate profitable areas. To improve the performance of distributed query processing, we also propose local Z-Skyline optimization, which reduces the number of dominant tests by distributing killer profitable areas to each cluster node. Through extensive evaluation with real datasets, we demonstrate that our DISPAQ system provides a scalable and efficient solution for processing profitable-area queries from huge amounts of big taxi trip data. PMID:28946679

  11. Application of discriminative models for interactive query refinement in video retrieval

    NASA Astrophysics Data System (ADS)

    Srivastava, Amit; Khanwalkar, Saurabh; Kumar, Anoop

    2013-12-01

    The ability to quickly search large volumes of video for specific actions or events can provide a dramatic new capability to intelligence agencies. Example-based queries from video are a form of content-based information retrieval (CBIR) where the objective is to retrieve clips from a video corpus, or stream, using a representative query sample to find more like it. Often, the accuracy of video retrieval is largely limited by the gap between the available video descriptors and the underlying query concept, and such exemplar queries return many irrelevant results along with relevant ones. In this paper, we present an Interactive Query Refinement (IQR) system which acts as a powerful tool to leverage human feedback and allow intelligence analysts to iteratively refine search queries for improved precision in the retrieved results. In our approach to IQR, we leverage discriminative models that operate on high dimensional features derived from low-level video descriptors in an iterative framework. Our IQR model solicits relevance feedback on examples selected from the region of uncertainty and updates the discriminating boundary to produce a relevance-ranked results list. We achieved a 358% relative improvement in Mean Average Precision (MAP) over the initial retrieval list at a rank cutoff of 100 over 4 iterations. We compare our discriminative IQR model approach to a naïve IQR and show that our model-based approach yields a 49% relative improvement over the no-model naïve system.
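
    The evaluation measure named above can be sketched as follows. This is a generic illustration, not the authors' evaluation code: average precision has several normalization variants, and this sketch assumes the one that divides by the number of relevant items found within the cutoff. The function names and the toy relevance lists are hypothetical.

```python
def average_precision(ranked_relevance, cutoff):
    """Average of precision@rank over the relevant items found within the cutoff."""
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance[:cutoff], start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank   # precision at this rank
    return precision_sum / hits if hits else 0.0


def relative_improvement(baseline, improved):
    """Relative improvement of `improved` over `baseline`, in percent."""
    return (improved - baseline) / baseline * 100.0


# Toy ranked lists (True = relevant clip) before and after one refinement round.
before = [False, True, False, True]
after = [True, False, True, False]

ap_before = average_precision(before, cutoff=4)
ap_after = average_precision(after, cutoff=4)
print(ap_before, ap_after, relative_improvement(ap_before, ap_after))
```

    Mean Average Precision is the mean of this value over all queries. Read this way, a 358% relative improvement means the refined MAP is 4.58 times the initial MAP.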

  12. Single mild traumatic brain injury results in transiently impaired spatial long-term memory and altered search strategies.

    PubMed

    Marschner, Linda; Schreurs, An; Lechat, Benoit; Mogensen, Jesper; Roebroek, Anton; Ahmed, Tariq; Balschun, Detlef

    2018-02-27

    Mild traumatic brain injury (mTBI) can lead to diffuse neurophysical damage as well as cognitive and affective alterations. The nature and extent of behavioral changes after mTBI are still poorly understood and how strong an impact force has to be to cause long-term behavioral changes is not yet known. Here, we examined spatial learning acquisition, retention and reversal in a Morris water maze, and assessed search strategies during task performance after a single, mild, closed-skull traumatic impact referred to as "minimal" TBI. Additionally, we investigated changes in conditioned learning in a contextual fear-conditioning paradigm. Results show transient deficits in spatial memory retention, which, although limited, are indicative of deficits in long-term memory reconsolidation. Interestingly, minimal TBI causes animals to relapse to less effective search strategies, affecting performance after a retention pause. Apart from cognitive deficits, results yielded a sub-acute, transient increase in freezing response after fear conditioning, with no increase in baseline behavior, an indication of a stronger affective reaction to aversive stimuli after minimal TBI or greater susceptibility to stress. Furthermore, western blot analysis showed a short-term increase in hippocampal GFAP expression, most likely indicating astrogliosis, which is typically related to injuries of the central nervous system. Our findings provide evidence that even a very mild impact to the skull can have detectable consequences on the molecular, cognitive and affective-like level. However, these effects seemed to be very transient and reversible. Copyright © 2018. Published by Elsevier B.V.

  13. Semantic querying of relational data for clinical intelligence: a semantic web services-based approach

    PubMed Central

    2013-01-01

    Background Clinical Intelligence, as a research and engineering discipline, is dedicated to the development of tools for data analysis for the purposes of clinical research, surveillance, and effective health care management. Self-service ad hoc querying of clinical data is one desirable type of functionality. Since most of the data are currently stored in relational or similar form, ad hoc querying is problematic as it requires specialised technical skills and the knowledge of particular data schemas. Results A possible solution is semantic querying where the user formulates queries in terms of domain ontologies that are much easier to navigate and comprehend than data schemas. In this article, we are exploring the possibility of using SADI Semantic Web services for semantic querying of clinical data. We have developed a prototype of a semantic querying infrastructure for the surveillance of, and research on, hospital-acquired infections. Conclusions Our results suggest that SADI can support ad-hoc, self-service, semantic queries of relational data in a Clinical Intelligence context. The use of SADI compares favourably with approaches based on declarative semantic mappings from data schemas to ontologies, such as query rewriting and RDFizing by materialisation, because it can easily cope with situations when (i) some computation is required to turn relational data into RDF or OWL, e.g., to implement temporal reasoning, or (ii) integration with external data sources is necessary. PMID:23497556

  14. Query-Driven Visualization and Analysis

    SciTech Connect

    Ruebel, Oliver; Bethel, E. Wes; Prabhat, Mr.; Wu, Kesheng

    2012-11-01

    This report focuses on an approach to high performance visualization and analysis, termed query-driven visualization and analysis (QDV). QDV aims to reduce the amount of data that needs to be processed by the visualization, analysis, and rendering pipelines. The goal of the data reduction process is to separate out data that is "scientifically interesting" and to focus visualization, analysis, and rendering on that interesting subset. The premise is that for any given visualization or analysis task, the data subset of interest is much smaller than the larger, complete data set. This strategy (extracting smaller data subsets of interest and focusing the visualization processing on these subsets) is complementary to the approach of increasing the capacity of the visualization, analysis, and rendering pipelines through parallelism. This report discusses the fundamental concepts in QDV, their relationship to different stages in the visualization and analysis pipelines, and presents QDV's application to problems in diverse areas, ranging from forensic cybersecurity to high energy physics.
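
    The core idea, selecting the subset of interest with a query before any further processing, can be sketched in a few lines. This is a toy illustration under stated assumptions rather than the report's software: the `select` helper, the record layout, and the numeric thresholds are all hypothetical, and a real system would use indexes instead of a linear scan.

```python
def select(records, **ranges):
    """Keep only records whose named fields fall inside the given closed ranges."""
    return [
        rec for rec in records
        if all(low <= rec[name] <= high for name, (low, high) in ranges.items())
    ]


# Hypothetical simulation output: one dict per particle.
particles = [
    {"id": 0, "energy": 0.2, "x": 1.0},
    {"id": 1, "energy": 7.5, "x": 4.0},
    {"id": 2, "energy": 9.1, "x": 12.0},
    {"id": 3, "energy": 8.3, "x": 5.5},
]

# The query defines what counts as "interesting"; later stages see only this subset.
interesting = select(particles, energy=(5.0, 10.0), x=(0.0, 10.0))
interesting_ids = [rec["id"] for rec in interesting]
print(interesting_ids)
```

    Downstream analysis or rendering then operates on `interesting` alone, which is the data reduction the abstract describes.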

  15. B.E.A.R. GeneInfo: A tool for identifying gene-related biomedical publications through user modifiable queries

    PubMed Central

    Zhou, Guohui; Wen, Xinyu; Liu, Hang; Schlicht, Michael J; Hessner, Martin J; Tonellato, Peter J; Datta, Milton W

    2004-01-01

    Background Once specific genes are identified through high throughput genomics technologies there is a need to sort the final gene list to a manageable size for validation studies. The triaging and sorting of genes often relies on the use of supplemental information related to gene structure, metabolic pathways, and chromosomal location. Yet in disease states where the genes may not have identifiable structural elements, poorly defined metabolic pathways, or limited chromosomal data, flexible systems for obtaining additional data are necessary. In these situations having a tool for searching the biomedical literature using the list of identified genes while simultaneously defining additional search terms would be useful. Results We have built a tool, BEAR GeneInfo, that allows flexible searches based on the investigator's knowledge of the biological process, thus allowing for data mining that is specific to the scientist's strengths and interests. This tool allows a user to upload a series of GenBank accession numbers, Unigene Ids, Locuslink Ids, or gene names. BEAR GeneInfo takes these IDs and identifies the associated gene names, and uses the lists of gene names to query PubMed. The investigator can add additional modifying search terms to the query. The subsequent output provides a list of publications, along with the associated reference hyperlinks, for reviewing the identified articles for relevance and interest. An example of the use of this tool in the study of human prostate cancer cells treated with Selenium is presented. Conclusions This tool can be used to further define a list of genes that have been identified through genomic or genetic studies. Through the use of targeted searches with additional search terms the investigator can limit the list to genes that match their specific research interests or needs. The tool is freely available on the web at [1], and the authors will provide scripts and database components if requested mdatta@mcw.edu PMID

  16. A Frequency-based Technique to Improve the Spelling Suggestion Rank in Medical Queries

    PubMed Central

    Crowell, Jonathan; Zeng, Qing; Ngo, Long; Lacroix, Eve-Marie

    2004-01-01

    Objective: There is an abundance of health-related information online, and millions of consumers search for such information. Spell checking is of crucial importance in returning pertinent results, so the authors propose a technique for increasing the effectiveness of spell-checking tools used for health-related information retrieval. Design: A sample of incorrectly spelled medical terms was submitted to two different spell-checking tools, and the resulting suggestions, derived under two different dictionary configurations, were re-sorted according to how frequently each term appeared in log data from a medical search engine. Measurements: Univariable analysis was carried out to assess the effect of each factor (spell-checking tool, dictionary type, re-sort, or no re-sort) on the probability of success. The factors that were statistically significant in the univariable analysis were then used in multivariable analysis to evaluate the independent effect of each of the factors. Results: The re-sorted suggestions proved to be significantly more accurate than the original list returned by the spell-checking tool. The odds of finding the correct suggestion in the number one rank were increased by 63% after re-sorting using the authors' method. This effect was independent of both the dictionary and the spell-checking tools that were used. Conclusion: Using knowledge about the frequency of a given word's occurrence in the medical domain can significantly improve spelling correction for medical queries. PMID:14764616
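
    The re-sorting step described in the Design section can be sketched as below. This is a minimal illustration, not the authors' code: the function name, the example suggestions, and the log counts are hypothetical. Suggestions from a spell checker are reordered by how often each term appears in a search log, and terms missing from the log keep their original relative order at the end.

```python
def resort_suggestions(suggestions, log_frequency):
    """Reorder spell-checker suggestions by descending frequency in query logs.

    Python's sort is stable, so suggestions with equal frequency (including
    terms absent from the log, counted as 0) keep the spell checker's order.
    """
    return sorted(suggestions, key=lambda term: -log_frequency.get(term, 0))


# Hypothetical output of a spell checker for the misspelling "diabetis".
suggestions = ["diabetic", "diabetes", "diabolist"]

# Hypothetical term counts taken from a medical search engine's query log.
log_frequency = {"diabetes": 950, "diabetic": 410}

print(resort_suggestions(suggestions, log_frequency))
```

    Using the original rank as the tie-breaker keeps the spell checker's own judgment whenever the log data carries no preference.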

  17. A frequency-based technique to improve the spelling suggestion rank in medical queries.

    PubMed

    Crowell, Jonathan; Zeng, Qing; Ngo, Long; Lacroix, Eve-Marie

    2004-01-01

    There is an abundance of health-related information online, and millions of consumers search for such information. Spell checking is of crucial importance in returning pertinent results, so the authors propose a technique for increasing the effectiveness of spell-checking tools used for health-related information retrieval. A sample of incorrectly spelled medical terms was submitted to two different spell-checking tools, and the resulting suggestions, derived under two different dictionary configurations, were re-sorted according to how frequently each term appeared in log data from a medical search engine. Univariable analysis was carried out to assess the effect of each factor (spell-checking tool, dictionary type, re-sort, or no re-sort) on the probability of success. The factors that were statistically significant in the univariable analysis were then used in multivariable analysis to evaluate the independent effect of each of the factors. The re-sorted suggestions proved to be significantly more accurate than the original list returned by the spell-checking tool. The odds of finding the correct suggestion in the number one rank were increased by 63% after re-sorting using the authors' method. This effect was independent of both the dictionary and the spell-checking tools that were used. Using knowledge about the frequency of a given word's occurrence in the medical domain can significantly improve spelling correction for medical queries.

  18. Query-answering algorithms for information agents

    SciTech Connect

    Levy, A.Y.; Rajaraman, A.; Ordille, J.J.

    1996-12-31

    We describe the architecture and query-answering algorithms used in the Information Manifold, an implemented information gathering system that provides uniform access to structured information sources on the World-Wide Web. Our architecture provides an expressive language for describing information sources, which makes it easy to add new sources and to model the fine-grained distinctions between their contents. The query-answering algorithm guarantees that the descriptions of the sources are exploited to access only sources that are relevant to a given query. Accessing only relevant sources is crucial to scale up such a system to large numbers of sources. In addition, our algorithm can exploit run-time information to further prune information sources and to reduce the cost of query planning.

  19. Superfund Chemical Data Matrix (SCDM) Query

    EPA Pesticide Factsheets

    This site allows you to easily query the Superfund Chemical Data Matrix (SCDM) and generate a list of the corresponding Hazard Ranking System (HRS) factor values, benchmarks, and data elements that you need.

  20. Superfund Chemical Data Matrix (SCDM) Query - Popup

    EPA Pesticide Factsheets

    This site allows you to easily query the Superfund Chemical Data Matrix (SCDM) and generate a list of the corresponding Hazard Ranking System (HRS) factor values, benchmarks, and data elements that you need.

  1. An Ensemble Approach for Expanding Queries

    DTIC Science & Technology

    2012-11-01

    vincristine; thalidomide; painful; cisplatin; oxaliplatin; charcot-marie-tooth disease; drugs; neuropathy Ensemble expansion child of, asthma, kids...disorders; peripheral nervous system disorders; bortezomib; vincristine; thalidomide; peripheral nerve diseases We reformulate the query by appending

  2. A Query System Implementation Case Study.

    ERIC Educational Resources Information Center

    Hiser, Judith N.; Neil, M. Elizabeth

    1985-01-01

    The Department of Administrative Programming Services of Clemson University investigated products available in user-friendly retrieval systems. The test of INTELLECT, a natural language query system written by Artificial Intelligence Corporation, is described. (Author/MLW)

  3. Producing approximate answers to database queries

    NASA Technical Reports Server (NTRS)

    Vrbsky, Susan V.; Liu, Jane W. S.

    1993-01-01

    We have designed and implemented a query processor, called APPROXIMATE, that makes approximate answers available if part of the database is unavailable or if there is not enough time to produce an exact answer. The accuracy of the approximate answers produced improves monotonically with the amount of data retrieved to produce the result. The exact answer is produced if all of the needed data are available and query processing is allowed to continue until completion. The monotone query processing algorithm of APPROXIMATE works within the standard relational algebra framework and can be implemented on a relational database system with little change to the relational architecture. We describe here the approximation semantics of APPROXIMATE that serves as the basis for meaningful approximations of both set-valued and single-valued queries. We show how APPROXIMATE is implemented to make effective use of semantic information, provided by an object-oriented view of the database, and describe the additional overhead required by APPROXIMATE.

  4. The StarView intelligent query mechanism

    NASA Technical Reports Server (NTRS)

    Semmel, R. D.; Silberberg, D. P.

    1993-01-01

    The StarView interface is being developed to facilitate the retrieval of scientific and engineering data produced by the Hubble Space Telescope. While predefined screens in the interface can be used to specify many common requests, ad hoc requests require a dynamic query formulation capability. Unfortunately, logical level knowledge is too sparse to support this capability. In particular, essential formulation knowledge is lost when the domain of interest is mapped to a set of database relation schemas. Thus, a system known as QUICK has been developed that uses conceptual design knowledge to facilitate query formulation. By heuristically determining strongly associated objects at the conceptual level, QUICK is able to formulate semantically reasonable queries in response to high-level requests that specify only attributes of interest. Moreover, by exploiting constraint knowledge in the conceptual design, QUICK assures that queries are formulated quickly and will execute efficiently.

  5. Distributed Queries of Large Numerical Data Sets

    NASA Technical Reports Server (NTRS)

    Nemes, Richard M.

    1998-01-01

    We have extended a previously developed high-level data model, which combines numerical quantities and meta-data into a unified hybrid model, to distributed data. An elegant query language based on SQL is extended further to allow queries against such a distributed hybrid data base. The extension is realized by allowing statements in a non-SQL programming language to be embedded in SQL view definitions.

  6. The Study of Environment on Aboriginal Resilience and Child Health (SEARCH): a long-term platform for closing the gap.

    PubMed

    Wright, Darryl; Gordon, Raylene; Carr, Darren; Craig, Jonathan C; Banks, Emily; Muthayya, Sumithra; Wutzke, Sonia; Eades, Sandra J; Redman, Sally

    2016-07-15

    The full potential for research to improve Aboriginal health has not yet been realised. This paper describes an established long-term action partnership between Aboriginal Community Controlled Health Services (ACCHSs), the Aboriginal Health and Medical Research Council of New South Wales (AH&MRC), researchers and the Sax Institute, which is committed to using high-quality data to bring about health improvements through better services, policies and programs. The ACCHSs, in particular, have ensured that the driving purpose of the research conducted is to stimulate action to improve health for urban Aboriginal children and their families. This partnership established a cohort study of 1600 urban Aboriginal children and their caregivers, known as SEARCH (the Study of Environment on Aboriginal Resilience and Child Health), which is now having significant impacts on health, services and programs for urban Aboriginal children and their families. This paper describes some examples of the impacts of SEARCH, and reflects on the ways of working that have enabled these changes to occur, such as strong governance, a focus on improved health, AH&MRC and ACCHS leadership, and strategies to support the ACCHS use of data and to build Aboriginal capacity.

  7. Managing and Querying Whole Slide Images.

    PubMed

    Wang, Fusheng; Oh, Tae W; Vergara-Niedermayr, Cristobal; Kurc, Tahsin; Saltz, Joel

    2012-02-16

    High-resolution pathology images provide rich information about the morphological and functional characteristics of biological systems, and are transforming the field of pathology into a new era. To facilitate the use of digital pathology imaging for biomedical research and clinical diagnosis, it is essential to manage and query both whole slide images (WSI) and analytical results generated from images, such as annotations made by humans and computed features and classifications made by computer algorithms. There are unique requirements on modeling, managing and querying whole slide images, including compatibility with standards, scalability, support of image queries at multiple granularities, and support of integrated queries between images and derived results from the images. In this paper, we present our work on developing the Pathology Image Database System (PIDB), which is a standard oriented image database to support retrieval of images, tiles, regions and analytical results, image visualization and experiment management through a unified interface and architecture. The system is deployed for managing and querying whole slide images for In Silico brain tumor studies at Emory University. PIDB is generic and open source, and can be easily used to support other biomedical research projects. It has the potential to be integrated into a Picture Archiving and Communications System (PACS) with powerful query capabilities to support pathology imaging.

  8. Managing and Querying Image Annotation and Markup in XML

    PubMed Central

    Wang, Fusheng; Pan, Tony; Sharma, Ashish; Saltz, Joel

    2010-01-01

    Proprietary approaches for representing annotations and image markup are serious barriers for researchers to share image data and knowledge. The Annotation and Image Markup (AIM) project is developing a standard based information model for image annotation and markup in health care and clinical trial environments. The complex hierarchical structures of AIM data model pose new challenges for managing such data in terms of performance and support of complex queries. In this paper, we present our work on managing AIM data through a native XML approach, and supporting complex image and annotation queries through native extension of XQuery language. Through integration with xService, AIM databases can now be conveniently shared through caGrid. PMID:21218167

  9. Managing and Querying Image Annotation and Markup in XML.

    PubMed

    Wang, Fusheng; Pan, Tony; Sharma, Ashish; Saltz, Joel

    2010-01-01

    Proprietary approaches for representing annotations and image markup are serious barriers for researchers to share image data and knowledge. The Annotation and Image Markup (AIM) project is developing a standard based information model for image annotation and markup in health care and clinical trial environments. The complex hierarchical structures of AIM data model pose new challenges for managing such data in terms of performance and support of complex queries. In this paper, we present our work on managing AIM data through a native XML approach, and supporting complex image and annotation queries through native extension of XQuery language. Through integration with xService, AIM databases can now be conveniently shared through caGrid.

  10. A Query Cache Tool for Optimizing Repeatable and Parallel OLAP Queries

    NASA Astrophysics Data System (ADS)

    Santos, Ricardo Jorge; Bernardino, Jorge

    On-line analytical processing against data warehouse databases is a common way of obtaining decision-making information in almost every business field. Decision support information often concerns periodic values based on regular attributes, such as sales amounts, percentages, most frequently transacted items, etc. This means that many similar OLAP instructions are repeated periodically, and simultaneously, by several decision makers. Our Query Cache Tool takes advantage of previously executed queries, storing their results and the current state of the data which was accessed. Future queries only need to execute against the new data, inserted since the queries were last executed, and join these results with the previous ones. This makes query execution much faster, because we only need to process the most recent data. Our tool also minimizes the execution time and resource consumption for similar queries simultaneously executed by different users, putting the most recent ones on hold until the first finishes and returns the results for all of them. The stored query results are held until they are considered outdated, then automatically erased. We present an experimental evaluation of our tool using a data warehouse based on a real-world business dataset and use a set of typical decision support queries to discuss the results, showing a very high gain in query execution time.
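    The idea of executing a repeated query only against data inserted since its last run can be sketched as follows. This is a minimal illustration, not the authors' tool: it assumes an append-only fact table with monotonically increasing row ids and an additive aggregate (a grouped sum), and the names `CachedSum` and `refresh` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CachedSum:
    """Cached result of a grouped SUM plus the highest row id it already covers."""
    totals: dict = field(default_factory=dict)
    last_row_id: int = 0


def refresh(cache, fact_rows):
    """Fold only rows newer than the cached watermark into the cached totals.

    fact_rows is an iterable of (row_id, group_key, amount) tuples.
    Returns the up-to-date totals and the number of rows actually processed.
    """
    new_rows = [row for row in fact_rows if row[0] > cache.last_row_id]
    for _row_id, key, amount in new_rows:
        cache.totals[key] = cache.totals.get(key, 0) + amount
    if new_rows:
        cache.last_row_id = max(row[0] for row in new_rows)
    return dict(cache.totals), len(new_rows)


# Hypothetical sales rows: (row_id, region, amount).
rows = [(1, "north", 10), (2, "south", 5)]
cache = CachedSum()
first_totals, first_processed = refresh(cache, rows)

rows.append((3, "north", 7))  # new data arrives after the first execution
second_totals, second_processed = refresh(cache, rows)
```

    On the second call only the one new row is aggregated and merged with the stored result. A real tool would also need the invalidation step the abstract mentions (discarding results considered outdated), which this sketch leaves out.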

  11. System, method and apparatus for conducting a keyterm search

    NASA Technical Reports Server (NTRS)

    McGreevy, Michael W. (Inventor)

    2004-01-01

    A keyterm search is a method of searching a database for subsets of the database that are relevant to an input query. First, a number of relational models of subsets of a database are provided. A query is then input. The query can include one or more keyterms. Next, a gleaning model of the query is created. The gleaning model of the query is then compared to each one of the relational models of subsets of the database. The identifiers of the relevant subsets are then output.

  12. System, Method and Apparatus for Conducting a Keyterm Search

    NASA Technical Reports Server (NTRS)

    McGreevy, Michael W. (Inventor)

    2004-01-01

    A keyterm search is a method of searching a database for subsets of the database that are relevant to an input query. First, a number of relational models of subsets of a database are provided. A query is then input. The query can include one or more keyterms. Next, a gleaning model of the query is created. The gleaning model of the query is then compared to each one of the relational models of subsets of the database. The identifiers of the relevant subsets are then output.

  13. Symbolic representation and visual querying of left ventricular image sequences.

    PubMed

    Baroni, M; Del Bimbo, A; Evangelist, A; Vicario, E

    1999-01-01

    In the evaluation of regional left ventricular function, relevant cardiac disorders manifest themselves not only in static features, such as shape descriptors and motion excursion in end-diastolic and end-systolic frames, but also in their temporal evolution. In common diagnostic practice, such dynamic patterns are analysed by direct inspection of frame sequences through the use of a moviola. This permits only a subjective and poorly defined evaluation of functional parameters, and definitely prevents a systematic and reproducible analysis of large sets of reports. Retrieval by contents techniques may overcome this limitation by permitting the automatic comparison of the reports in a database against queries expressing descriptive properties related to significant pathological conditions. A system is presented which is aimed at investigating the potential of this approach by supporting retrieval by contents from a database of cineangiographic or echocardiographic images. The system relies on a symbolic description of both geometrical and temporal properties of left ventricular contours. This is derived automatically by an image processing and interpretation module and associated with the report at its storage time. In the retrieval stage, queries are expressed by means of an iconic visual language which describes searched content properties over a computer screen. The system automatically interprets iconic statements and compares them against concrete descriptions in the database. This enables medical users to interact with the system to search for motion and shape abnormalities on a regional basis, in single or homogeneous groups of reports, so as to enable both prospective and retrospective diagnosis.

  14. An SQL query generator for CLIPS

    NASA Technical Reports Server (NTRS)

    Snyder, James; Chirica, Laurian

    1990-01-01

    As expert systems become more widely used, their access to large amounts of external information becomes increasingly important. This information exists in several forms such as statistical, tabular data, knowledge gained by experts and large databases of information maintained by companies. Because many expert systems, including CLIPS, do not provide access to this external information, much of the usefulness of expert systems is left untapped. The scope of this paper is to describe a database extension for the CLIPS expert system shell. The current industry standard database language is SQL. Due to SQL standardization, large amounts of information stored on various computers, potentially at different locations, will be more easily accessible. Expert systems should be able to directly access these existing databases rather than requiring information to be re-entered into the expert system environment. The ORACLE relational database management system (RDBMS) was used to provide a database connection within the CLIPS environment. To facilitate relational database access a query generation system was developed as a CLIPS user function. The queries are entered in a CLIPS-like syntax and are passed to the query generator, which constructs an SQL query and submits it to the ORACLE RDBMS for execution. The query results are asserted as CLIPS facts. The query generator was developed primarily for use within the ICADS project (Intelligent Computer Aided Design System) currently being developed by the CAD Research Unit in the California Polytechnic State University (Cal Poly). In ICADS, there are several parallel or distributed expert systems accessing a common knowledge base of facts. Each expert system has a narrow domain of interest and therefore needs only certain portions of the information. The query generator provides a common method of accessing this information and allows the expert system to specify what data is needed without specifying how to retrieve it.

  15. Spatial-symbolic Query Engine in Anatomy.

    PubMed

    Puget, A; Mejino, J L V; Detwiler, L T; Franklin, J D; Brinkley, J F

    2012-01-01

    Currently, the primary means for answering anatomical questions such as 'what vital organs would potentially be impacted by a bullet wound to the abdomen?' is to look them up in textbooks or to browse online sources. In this work we describe a semantic web service and spatial query processor that permits a user to graphically pose such questions as joined queries over separately defined spatial and symbolic knowledge sources. Spatial relations (e.g. anterior) were defined by two anatomy experts, and based on a 3-D volume of labeled images of the thorax, all the labeled anatomical structures were queried to retrieve the target structures for every query structure and every spatial relation. A web user interface and a web service were designed to relate existing symbolic information from the Foundational Model of Anatomy ontology (FMA) with spatial information provided by the spatial query processor, and to permit users to select anatomical structures and define queries. We evaluated the accuracy of results returned by the queries, and since there is no independent gold standard, we used two anatomy experts' opinions as the gold standard for comparison. We asked the same experts to define the gold standard and to define the spatial relations. The F-measure for the overall evaluation is 0.90 for rater 1 and 0.56 for rater 2. The percentage of observed agreement is 99% and Cohen's kappa coefficient reaches 0.51. The main source of disagreement relates to issues with the labels used in the dataset, and not with the tool itself. In its current state the system can be used as an end-user application but it is likely to be of most use as a framework for building end-user applications such as displaying the results as a 3-D anatomical scene. The system promises potential practical utility for obtaining and navigating spatial and symbolic data.
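    The combination of 99% observed agreement with a Cohen's kappa of only about 0.5 can look surprising; it typically arises when one category (here, "not related") dominates. The sketch below computes kappa and the F-measure from counts. The counts in the example are hypothetical, chosen only to reproduce that pattern, and are not the study's data.

```python
def cohen_kappa(both_yes, only_r1, only_r2, both_no):
    """Return (observed agreement, Cohen's kappa) for two raters and a yes/no label."""
    n = both_yes + only_r1 + only_r2 + both_no
    observed = (both_yes + both_no) / n
    r1_yes = (both_yes + only_r1) / n
    r2_yes = (both_yes + only_r2) / n
    # Agreement expected by chance if the raters labelled independently.
    expected = r1_yes * r2_yes + (1 - r1_yes) * (1 - r2_yes)
    return observed, (observed - expected) / (1 - expected)


def f_measure(true_pos, false_pos, false_neg):
    """Harmonic mean of precision and recall (F1)."""
    return 2 * true_pos / (2 * true_pos + false_pos + false_neg)


# Hypothetical counts: 1000 structure pairs, almost all judged "not related" by both raters.
observed, kappa = cohen_kappa(both_yes=5, only_r1=5, only_r2=5, both_no=985)
```

    With these made-up counts the raters agree on 99% of pairs, yet chance agreement is already about 98%, so kappa comes out just under 0.5.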

  16. An intelligent method for geographic Web search

    NASA Astrophysics Data System (ADS)

    Mei, Kun; Yuan, Ying

    2008-10-01

    While the electronically available information on the World-Wide Web is growing explosively, the difficulty of finding relevant information is also increasing for search engine users. In this paper we discuss how to constrain web queries geographically. A number of search queries are associated with geographical locations, either explicitly or implicitly. Accurately and effectively detecting the locations that search queries are truly about has a huge potential impact on increasing search relevance, bringing better targeted search results, and improving search user satisfaction. Our approach focuses both on the way geographic information is extracted from the web and, as far as we can tell, on the way it is integrated into query processing. This paper gives an overview of a spatially aware search engine for semantic querying of web documents. It also illustrates algorithms for extracting locations from web documents and query requests, using location ontologies to encode and reason about the formal semantics of geographic web search. Based on a real-world scenario of tourism guide search, the application of our approach shows that geographic information retrieval can be efficiently supported.

  17. Exploring personalized searches using tag-based user profiles and resource profiles in folksonomy.

    PubMed

    Cai, Yi; Li, Qing; Xie, Haoran; Min, Huaqin

    2014-10-01

    With the increase in resource-sharing websites such as YouTube and Flickr, many shared resources have arisen on the Web. Personalized searches have become more important and challenging since users demand higher retrieval quality. To achieve this goal, personalized searches need to take users' personalized profiles and information needs into consideration. Collaborative tagging (also known as folksonomy) systems allow users to annotate resources with their own tags, which provides a simple but powerful way for organizing, retrieving and sharing different types of social resources. In this article, we examine the limitations of previous tag-based personalized searches. To handle these limitations, we propose a new method to model user profiles and resource profiles in collaborative tagging systems. We use a normalized term frequency to indicate the preference degree of a user on a tag. A novel search method using such profiles of users and resources is proposed to facilitate the desired personalization in resource searches. In our framework, instead of the keyword matching or similarity measurement used in previous works, the relevance measurement between a resource and a user query (termed the query relevance) is treated as a fuzzy satisfaction problem of a user's query requirements. We implement a prototype system called the Folksonomy-based Multimedia Retrieval System (FMRS). Experiments using the FMRS data set and the MovieLens data set show that our proposed method outperforms baseline methods. Copyright © 2014 Elsevier Ltd. All rights reserved.

  18. Spinning a Web Search.

    ERIC Educational Resources Information Center

    Lager, Mark A.

    This paper focuses on techniques for retrieval used in information sciences and in World Wide Web search engines. The purpose of reference service and information science is to provide useful information in response to a query. The two metrics of recall and precision serve to express information retrieval performance. There are two major…

  19. Is 'self-medication' a useful term to retrieve related publications in the literature? A systematic exploration of related terms.

    PubMed

    Mansouri, Ava; Sarayani, Amir; Ashouri, Asieh; Sherafatmand, Mona; Hadjibabaie, Molouk; Gholami, Kheirollah

    2015-01-01

    Self-Medication (SM), i.e. using medications to treat oneself, is a major concern for health researchers and policy makers. The terms "self medication" or "self-medication" (SM terms) have been used to explain various concepts while several terms have also been employed to define this practice. Hence, retrieving relevant publications would require exhaustive literature screening. So, we assessed the current situation of SM terms in the literature to improve the relevancy of search outcomes. In this systematic exploration, SM terms were searched in the following 6 databases and publishers' portals until April 2012: Web of Science, Scopus, PubMed, Google Scholar, ScienceDirect, and Wiley. A simple search query was used to include only publications with SM terms. We used Relative-Risk (RR) to estimate the probability of SM terms use in related compared to unrelated publications. Sensitivity and specificity of SM terms as keywords in search query were also calculated. Relevant terms to SM practice were extracted and their Likelihood Ratio positive and negative (LR+/-) were calculated to assess their effect on the probability of search outcomes relevancy in addition to previous search queries. We also evaluated the content of unrelated publications. All mentioned steps were performed in title (TI) and title or abstract (TIAB) of publications. 1999 related and 1917 unrelated publications were found. SM terms RR was 4.5 in TI and 2.1 in TIAB. SM terms sensitivity and specificity respectively were 55.4% and 87.7% in TI and 84.0% and 59.5% in TIAB. "OTC" and "Over-The-Counter Medication", with LR+ 16.78 and 16.30 respectively, provided the most conclusive increase in the probability of the relevancy of publications. The most common unrelated SM themes were self-medication hypothesis, drug abuse and Zoopharmacognosy. Due to relatively low specificity or sensitivity of SM terms, relevant terms should be employed in search queries and clear definitions of SM applications should
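    For readers unfamiliar with the measures used above, the standard definitions of the positive and negative likelihood ratios in terms of sensitivity and specificity can be worked through with the reported percentages. This is only an arithmetic sketch using the textbook formulas; the abstract does not state how the authors computed their figures, and the function name is hypothetical.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity given as fractions."""
    lr_pos = sensitivity / (1.0 - specificity)   # how much a hit raises the odds of relevance
    lr_neg = (1.0 - sensitivity) / specificity   # how much a miss lowers them
    return lr_pos, lr_neg


# Sensitivity and specificity of the SM terms as reported in the abstract.
ti_pos, ti_neg = likelihood_ratios(0.554, 0.877)      # title only
tiab_pos, tiab_neg = likelihood_ratios(0.840, 0.595)  # title or abstract
```

    Under these formulas the SM terms give an LR+ of roughly 4.5 in TI and 2.1 in TIAB, which round to the same values the abstract reports as RR, and well below the 16.78 and 16.30 reported for "OTC" and "Over-The-Counter Medication".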

  20. The I4 Online Query Tool for Earth Observations Data

    NASA Technical Reports Server (NTRS)

    Stefanov, William L.; Vanderbloemen, Lisa A.; Lawrence, Samuel J.

    2015-01-01

    The NASA Earth Observation System Data and Information System (EOSDIS) delivers an average of 22 terabytes per day of data collected by orbital and airborne sensor systems to end users through an integrated online search environment (the Reverb/ECHO system). Earth observations data collected by sensors on the International Space Station (ISS) are not currently included in the EOSDIS system, and are only accessible through various individual online locations. This increases the effort required by end users to query multiple datasets, and limits the opportunity for data discovery and innovations in analysis. The Earth Science and Remote Sensing Unit of the Exploration Integration and Science Directorate at NASA Johnson Space Center has collaborated with the School of Earth and Space Exploration at Arizona State University (ASU) to develop the ISS Instrument Integration Implementation (I4) data query tool to provide end users a clean, simple online interface for querying both current and historical ISS Earth Observations data. The I4 interface is based on the Lunaserv and Lunaserv Global Explorer (LGE) open-source software packages developed at ASU for query of lunar datasets. In order to avoid mirroring existing databases - and the need to continually sync/update those mirrors - our design philosophy is for the I4 tool to be a pure query engine only. Once an end user identifies a specific scene or scenes of interest, I4 transparently takes the user to the appropriate online location to download the data. The tool consists of two public-facing web interfaces. The Map Tool provides a graphic geobrowser environment where the end user can navigate to an area of interest and select single or multiple datasets to query. The Map Tool displays active image footprints for the selected datasets (Figure 1). Selecting a footprint will open a pop-up window that includes a browse image and a link to available image metadata, along with a link to the online location to order or

  1. Optimizing healthcare research data warehouse design through past COSTAR query analysis.

    PubMed

    Murphy, S N; Morgan, M M; Barnett, G O; Chueh, H C

    1999-01-01

    Over the past two years we have reviewed and implemented the specifications for a large relational database (a data warehouse) to find research cohorts from data similar to that contained within the clinical COSTAR database at the Massachusetts General Hospital. A review of 16 years of COSTAR research queries was conducted to determine the most common search strategies. These search strategies are relevant to the general research community, because they use the Medical Query Language (MQL) developed for the COSTAR M database which is extremely flexible (much more so than SQL) and allows searches by coded fields, text reports, and laboratory values in a completely ad hoc fashion. By reviewing these search strategies, we were able to obtain user specifications for a research oriented healthcare data warehouse that could support 90% of the queries. The data warehouse was implemented in a relational database using the star schema, allowing for highly optimized analytical processing. This allowed queries that performed slowly in the M database to be performed very rapidly in the relational database. It also allowed the data warehouse to scale effectively.

  2. Predictive and Personalized Drug Query System.

    PubMed

    Khemmarat, Samamon; Gao, Lixin

    2017-07-01

    Several factors need to be carefully considered in using pharmaceutical drugs, such as drug interactions, side effects, and contraindications. To further complicate the matter, the presence of some drug properties, such as side effects, depends on patient characteristics, such as age, gender, and genetic profiles. Our goal is to provide a tool to assist medical professionals and drug consumers in choosing and finding drugs that suit their needs. We develop an approach that allows querying for drugs that satisfy a set of conditions. The approach can tailor the answers based on given patient profiles. Considering the noisiness and incompleteness of publicly available drug data, in contrast to traditional query systems, our approach considers both the answers that exactly match the query and those that closely match the query. We represent drug information as a heterogeneous graph and model answering a query as a subgraph matching problem. To rank answers, our approach leverages the structure and the heterogeneity of the drug graph to quantify the likelihood of edges and score the answers. Our evaluation shows that for quantifying the edge likelihood, our network-based approach can improve the area under receiver operating characteristic by up to 18%, compared to a baseline approach. We develop a prototype of our system and demonstrate its benefits through several examples.

  3. Merging Ontology Navigation with Query Construction for Web-based Medicare Data Exploration.

    PubMed

    Zhang, Guo-Qiang; Cui, Licong; Teagno, Joe; Kaebler, David; Koroukian, Siran; Xu, Rong

    2013-01-01

    To enhance web-based exploration of Medicare data, we present a unique query interface merging ontology navigation with query construction, for cohort discovery based on demographics, disease classification codes, medication and other types of clinical data. Our interface seamlessly blends query construction with functions for hierarchical browsing and rendering of terms and associated codes from vocabulary systems and ontologies, such as International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM). By unifying ontology navigation activities with query widget generation, a user can perform fine-tuned full boolean queries based on the substructure of the ontology, with flexibility to enable or disable subsumption-based queries. Query performance was evaluated on top disease subtypes of Centers for Medicare and Medicaid Services data, consisting of 5% of 2009 Limited Data Set files (inpatient and outpatient). Such interfaces will help move the data access paradigm from a hypothesis-driven style to a data-driven one, while improving efficiency as a collective "secondary-use user community."

  4. EHR query language (EQL)--a query language for archetype-based health records.

    PubMed

    Ma, Chunlan; Frankel, Heath; Beale, Thomas; Heard, Sam

    2007-01-01

    OpenEHR specifications have been developed to standardise the representation of an international electronic health record (EHR). The language used for querying EHR data is not as yet part of the specification. To fill in this gap, Ocean Informatics has developed a query language currently known as EHR Query Language (EQL), a declarative language supporting queries on EHR data. EQL is neutral to EHR systems, programming languages and system environments and depends only on the openEHR archetype model and semantics. Thus, in principle, EQL can be used in any archetype-based computational context. In the EHR context described here, particular queries mention concepts from the openEHR EHR Reference Model (RM). EQL can be used as a common query language for disparate archetype-based applications. With the use of a common RM, archetypes, and a companion query language such as EQL, semantic interoperability of EHR information is much closer. This paper introduces the EQL syntax and provides example clinical queries to illustrate the syntax. Finally, current implementations and future directions are outlined.

  5. Achieve Location Privacy-Preserving Range Query in Vehicular Sensing

    PubMed Central

    Lu, Rongxing; Ma, Maode; Bao, Haiyong

    2017-01-01

    Modern vehicles are equipped with a plethora of on-board sensors and large on-board storage, which enables them to gather and store various local-relevant data. However, the wide application of vehicular sensing has its own challenges, among which location-privacy preservation and data query accuracy are two critical problems. In this paper, we propose a novel range query scheme, which helps the data requester to accurately retrieve the sensed data from the distributive on-board storage in vehicular ad hoc networks (VANETs) with location privacy preservation. The proposed scheme exploits structured scalars to denote the locations of data requesters and vehicles, and achieves the privacy-preserving location matching with the homomorphic Paillier cryptosystem technique. Detailed security analysis shows that the proposed range query scheme can successfully preserve the location privacy of the involved data requesters and vehicles, and protect the confidentiality of the sensed data. In addition, performance evaluations are conducted to show the efficiency of the proposed scheme, in terms of computation delay and communication overhead. Specifically, the computation delay and communication overhead are not dependent on the length of the scalar, and they are only proportional to the number of vehicles. PMID:28786943

  6. Achieve Location Privacy-Preserving Range Query in Vehicular Sensing.

    PubMed

    Kong, Qinglei; Lu, Rongxing; Ma, Maode; Bao, Haiyong

    2017-08-08

    Modern vehicles are equipped with a plethora of on-board sensors and large on-board storage, which enables them to gather and store various local-relevant data. However, the wide application of vehicular sensing has its own challenges, among which location-privacy preservation and data query accuracy are two critical problems. In this paper, we propose a novel range query scheme, which helps the data requester to accurately retrieve the sensed data from the distributive on-board storage in vehicular ad hoc networks (VANETs) with location privacy preservation. The proposed scheme exploits structured scalars to denote the locations of data requesters and vehicles, and achieves the privacy-preserving location matching with the homomorphic Paillier cryptosystem technique. Detailed security analysis shows that the proposed range query scheme can successfully preserve the location privacy of the involved data requesters and vehicles, and protect the confidentiality of the sensed data. In addition, performance evaluations are conducted to show the efficiency of the proposed scheme, in terms of computation delay and communication overhead. Specifically, the computation delay and communication overhead are not dependent on the length of the scalar, and they are only proportional to the number of vehicles.
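    The additive homomorphism of the Paillier cryptosystem, on which the scheme above relies, can be illustrated with a toy implementation. This sketch is not the paper's range query protocol: it uses tiny demonstration primes that offer no security, the common simplification g = n + 1, and hypothetical helper names. It only shows that ciphertexts can be combined so that their plaintexts add, and that two encrypted values can be tested for equality by the key holder without either value being decrypted on its own.

```python
import math
import secrets


def keygen(p, q):
    """Build a toy Paillier key from two distinct primes; returns (n, (lam, mu))."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse; this shortcut is valid because g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) under public modulus n with fresh randomness."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, priv, c):
    """Recover the plaintext from ciphertext c using the private pair (lam, mu)."""
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


def add(n, c1, c2):
    """Ciphertext of (m1 + m2) mod n, computed without decrypting either input."""
    return (c1 * c2) % (n * n)


def negate(n, c):
    """Ciphertext of (-m) mod n, via the modular inverse of the ciphertext."""
    return pow(c, -1, n * n)


n, priv = keygen(293, 433)  # demonstration primes only
total = decrypt(n, priv, add(n, encrypt(n, 17), encrypt(n, 25)))
# Equality check: the difference of two encrypted values decrypts to 0 only if they match.
same_cell = decrypt(n, priv, add(n, encrypt(n, 1200), negate(n, encrypt(n, 1200)))) == 0
```

    The three-argument `pow` with a negative exponent requires Python 3.8 or later.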

  7. Bio-TDS: bioscience query tool discovery system.

    PubMed

    Gnimpieba, Etienne Z; VanDiermen, Menno S; Gustafson, Shayla M; Conn, Bill; Lushbough, Carol M

    2017-01-04

    Bioinformatics and computational biology play a critical role in bioscience and biomedical research. As researchers design their experimental projects, one major challenge is to find the most relevant bioinformatics toolkits that will lead to new knowledge discovery from their data. The Bio-TDS (Bioscience Query Tool Discovery Systems, http://biotds.org/) has been developed to assist researchers in retrieving the most applicable analytic tools by allowing them to formulate their questions as free text. The Bio-TDS is a flexible retrieval system that affords users from multiple bioscience domains (e.g. genomic, proteomic, bio-imaging) the ability to query over 12 000 analytic tool descriptions integrated from well-established, community repositories. One of the primary components of the Bio-TDS is the ontology and natural language processing workflow for annotation, curation, query processing, and evaluation. The Bio-TDS's scientific impact was evaluated using sample questions posed by researchers retrieved from Biostars, a site focusing on biological data analysis. The Bio-TDS was compared to five similar bioscience analytic tool retrieval systems with the Bio-TDS outperforming the others in terms of relevance and completeness. The Bio-TDS offers researchers the capacity to associate their bioscience question with the most relevant computational toolsets required for the data analysis in their knowledge discovery process. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  8. MICA: desktop software for comprehensive searching of DNA databases

    PubMed Central

    Stokes, William A; Glick, Benjamin S

    2006-01-01

    Background: Molecular biologists work with DNA databases that often include entire genomes. A common requirement is to search a DNA database to find exact matches for a nondegenerate or partially degenerate query. The software programs available for such purposes are normally designed to run on remote servers, but an appealing alternative is to work with DNA databases stored on local computers. We describe a desktop software program termed MICA (K-Mer Indexing with Compact Arrays) that allows large DNA databases to be searched efficiently using very little memory. Results: MICA rapidly indexes a DNA database. On a Macintosh G5 computer, the complete human genome could be indexed in about 5 minutes. The indexing algorithm recognizes all 15 characters of the DNA alphabet and fully captures the information in any DNA sequence, yet for a typical sequence of length L, the index occupies only about 2L bytes. The index can be searched to return a complete list of exact matches for a nondegenerate or partially degenerate query of any length. A typical search of a long DNA sequence involves reading only a small fraction of the index into memory. As a result, searches are fast even when the available RAM is limited. Conclusion: MICA is suitable as a search engine for desktop DNA analysis software. PMID:17018144
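
    The general idea of k-mer indexing with support for partially degenerate queries can be sketched as follows. This is a plain dictionary-based illustration, not MICA's compact-array format; the function names are hypothetical, and the sketch assumes the database sequence itself contains only A, C, G and T, while the query may use any of the 15 IUPAC nucleotide codes.

```python
from collections import defaultdict

# The 15-character IUPAC DNA alphabet: each code maps to the bases it allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def build_index(seq, k):
    """Map every k-mer in seq to the sorted list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index

def search(seq, index, k, query):
    """Return all start positions where query matches seq exactly."""
    # Use the first fully nondegenerate k-length window of the query as a seed.
    seed_off = next(
        (j for j in range(len(query) - k + 1)
         if all(c in "ACGT" for c in query[j:j + k])),
        None,
    )
    if seed_off is None:
        # No usable seed: fall back to checking every position.
        candidates = range(len(seq) - len(query) + 1)
    else:
        seed = query[seed_off:seed_off + k]
        candidates = (p - seed_off for p in index.get(seed, []))
    hits = []
    for start in candidates:
        if start < 0 or start + len(query) > len(seq):
            continue
        # Verify the whole query, honouring degenerate codes.
        if all(seq[start + j] in IUPAC[c] for j, c in enumerate(query)):
            hits.append(start)
    return hits

database = "ACGTACGTTAGC"
kmer_index = build_index(database, 3)
exact_hits = search(database, kmer_index, 3, "ACGT")
degenerate_hits = search(database, kmer_index, 3, "TNGC")
```

    Only the candidate positions listed under the seed k-mer are verified, which is why a search touches a small part of the index; the fallback scan is used when the query has no nondegenerate window of length k.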

  9. MICA: desktop software for comprehensive searching of DNA databases.

    PubMed

    Stokes, William A; Glick, Benjamin S

    2006-10-03

    Molecular biologists work with DNA databases that often include entire genomes. A common requirement is to search a DNA database to find exact matches for a nondegenerate or partially degenerate query. The software programs available for such purposes are normally designed to run on remote servers, but an appealing alternative is to work with DNA databases stored on local computers. We describe a desktop software program termed MICA (K-Mer Indexing with Compact Arrays) that allows large DNA databases to be searched efficiently using very little memory. MICA rapidly indexes a DNA database. On a Macintosh G5 computer, the complete human genome could be indexed in about 5 minutes. The indexing algorithm recognizes all 15 characters of the DNA alphabet and fully captures the information in any DNA sequence, yet for a typical sequence of length L, the index occupies only about 2L bytes. The index can be searched to return a complete list of exact matches for a nondegenerate or partially degenerate query of any length. A typical search of a long DNA sequence involves reading only a small fraction of the index into memory. As a result, searches are fast even when the available RAM is limited. MICA is suitable as a search engine for desktop DNA analysis software.

  10. Distributed search engine architecture based on topic specific searches

    NASA Astrophysics Data System (ADS)

    Abudaqqa, Yousra; Patel, Ahmed

    2015-05-01

    Indisputably, search engines (SEs) abound. The monumental growth in the number of users performing online searches on the Web is a pressing issue today. For example, tens of billions of searches are performed every day, and they typically return many irrelevant results, which is time consuming and costly to the user. Because of this problem, it has become a herculean task for existing Web SEs to provide complete, relevant and up-to-date responses to users' search queries. To overcome this problem, we developed the Distributed Search Engine Architecture (DSEA), which is a new means of smart information query and retrieval for the World Wide Web (WWW). In DSEA, multiple autonomous search engines, owned by different organizations or individuals, cooperate and act as a single search engine. This paper reports research on the development of DSEA based on topic-specific specialised search engines. In DSEA, the results for a specific query could be provided by any of the participating search engines, without the user being aware of which one. The important design goal of using topic-specific search engines in the research is to build systems that can be used effectively by a larger number of users simultaneously. Efficient and effective usage with good response is important, because it involves leveraging the vast amount of searched data from the World Wide Web by categorising it into condensed, focused, topic-specific results that meet the user's queries. The design model and the development of the DSEA adopt a Service Directory (SD) to route queries towards topic-specific document-hosting SEs. It displays the most acceptable performance which is consistent with the requirements of the users. The evaluation results of the model return a very high priority score which is associated with each frequency of a keyword.
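
    Routing a query through a service directory to topic-specific engines might look roughly like the sketch below. The directory contents, the keyword-overlap scoring, the broadcast fallback and the `*.example` host names are all hypothetical; the abstract does not describe how DSEA's Service Directory actually scores or selects engines.

```python
# Hypothetical service directory: topic -> keywords and participating engines.
SERVICE_DIRECTORY = {
    "health": {
        "keywords": {"flu", "vaccine", "diabetes"},
        "engines": ["health-se.example"],
    },
    "sports": {
        "keywords": {"football", "league", "score"},
        "engines": ["sports-se.example"],
    },
    "travel": {
        "keywords": {"flight", "hotel", "visa"},
        "engines": ["travel-se.example"],
    },
}

def route(query):
    """Return the engines whose topic shares the most keywords with the query."""
    words = set(query.lower().split())
    scores = {
        topic: len(words & entry["keywords"])
        for topic, entry in SERVICE_DIRECTORY.items()
    }
    best = max(scores.values())
    if best == 0:
        # No topic matched: fall back to asking every engine.
        topics = list(SERVICE_DIRECTORY)
    else:
        topics = [t for t, s in scores.items() if s == best]
    return [e for t in topics for e in SERVICE_DIRECTORY[t]["engines"]]

health_route = route("flu vaccine schedule")
fallback_route = route("weather tomorrow")
```

    The user submits one query and never sees which engine answered, which matches the single-search-engine behaviour described above.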

  11. Effective Filtering of Query Results on Updated User Behavioral Profiles in Web Mining.

    PubMed

    Sadesh, S; Suganthe, R C

    2015-01-01

    The Web, with its tremendous volume of information, retrieves results for user queries. Despite the rapid growth of web page recommendation, results retrieved with data mining techniques have not offered a high filtering rate, because the relationships between user profiles and queries were not analyzed extensively. At the same time, existing user-profile-based prediction in web data mining is not exhaustive in producing personalized results. To improve the query result rate given the dynamics of user behavior over time, the Hamilton Filtered Regime Switching User Query Probability (HFRS-UQP) framework is proposed. The HFRS-UQP framework is split into two processes, filtering and switching. The data-mining-based filtering uses the Hamilton Filtering framework to filter user results based on personalized information from automatically updated profiles obtained through the search engine. The maximized result is fetched, that is, filtered with respect to user behavior profiles. The switching step performs accurate filtering on the updated profiles using regime switching. Updating the profile-change (i.e., switching) regime in the HFRS-UQP framework identifies the second- and higher-order associations of query results on the updated profiles. Experiments are conducted on factors such as personalized information search retrieval rate, filtering efficiency, and precision ratio.

  12. Effective Filtering of Query Results on Updated User Behavioral Profiles in Web Mining

    PubMed Central

    Sadesh, S.; Suganthe, R. C.

    2015-01-01

    The Web, with its tremendous volume of information, retrieves results for user queries. Despite the rapid growth of web page recommendation, results retrieved with data mining techniques have not offered a high filtering rate, because the relationships between user profiles and queries were not analyzed extensively. At the same time, existing user-profile-based prediction in web data mining is not exhaustive in producing personalized results. To improve the query result rate given the dynamics of user behavior over time, the Hamilton Filtered Regime Switching User Query Probability (HFRS-UQP) framework is proposed. The HFRS-UQP framework is split into two processes, filtering and switching. The data-mining-based filtering uses the Hamilton Filtering framework to filter user results based on personalized information from automatically updated profiles obtained through the search engine. The maximized result is fetched, that is, filtered with respect to user behavior profiles. The switching step performs accurate filtering on the updated profiles using regime switching. Updating the profile-change (i.e., switching) regime in the HFRS-UQP framework identifies the second- and higher-order associations of query results on the updated profiles. Experiments are conducted on factors such as personalized information search retrieval rate, filtering efficiency, and precision ratio. PMID:26221626

  13. ChronoQuery: visual modelling of temporal queries for real-time decision support.

    PubMed

    Majeed, Raphael W; Stöhr, Mark R; Brenner, Thorsten; Röhrig, Rainer

    2014-01-01

    Clinical decision support systems are an important aspect of medical informatics. The increasing amount of available patient data requires physicians to rely on information technology for research and during their day-to-day work. In intensive care medicine, fast actions are especially important. One major step towards enabling direct interaction of medical staff with patient data was the development of clinical data repositories with easy query frontends. While clinical data repositories can be extended for the use of real-time data, the corresponding query frontends do not support the time concepts necessary for real-time queries and decision support. The aim of this project is the development of a user interface that gives physicians a visual understanding of propositional logic combined with time concepts. Thus, physicians should be able to formulate simple time-based queries on their own, and to validate and quality-check complex queries created by medical informatics experts.

  14. Process query systems for network security monitoring

    NASA Astrophysics Data System (ADS)

    Berk, Vincent; Fox, Naomi

    2005-05-01

    In this paper we present the architecture of our network security monitoring infrastructure based on a Process Query System (PQS). PQS offers a new and powerful way of efficiently processing data streams, based on process descriptions that are submitted as queries. In this case the data streams are familiar network sensors, such as Snort, Netfilter, and Tripwire. The process queries describe the dynamics of network attacks and failures, such as worms, multistage attacks, and router failures. Using PQS the task of monitoring enterprise class networks is simplified, offering a priority-based GUI to the security administrator that clearly outlines events that require immediate attention. The PQS-Net system is deployed on an unsecured production network; the system has successfully detected many diverse attacks and failures.

  15. Automatic building information model query generation

    SciTech Connect

    Jiang, Yufei; Yu, Nan; Ming, Jiang; Lee, Sanghoon; DeGraw, Jason; Yen, John; Messner, John I.; Wu, Dinghao

    2015-12-01

    Energy efficient building design and construction calls for extensive collaboration between different subfields of the Architecture, Engineering and Construction (AEC) community. Performing building design and construction engineering raises challenges in data integration and software interoperability. Using a Building Information Modeling (BIM) data hub to host and integrate building models is a promising solution to address those challenges, and it can ease building design information management. However, the partial model query mechanism of the current BIM data hub collaboration model has several limitations, which prevent designers and engineers from taking advantage of BIM. To address this problem, we propose a general and effective approach to generate query code based on a Model View Definition (MVD). This approach is demonstrated through a software prototype called QueryGenerator. In conclusion, by demonstrating a case study using multi-zone air flow analysis, we show how our approach and tool can help domain experts to use BIM to drive building design with less labour and lower overhead cost.

  16. Inductive Querying with Virtual Mining Views

    NASA Astrophysics Data System (ADS)

    Blockeel, Hendrik; Calders, Toon; Fromont, Élisa; Prado, Adriana; Goethals, Bart; Robardet, Céline

    In an inductive database, one can not only query the data stored in the database, but also the patterns that are implicitly present in these data. In this chapter, we present an inductive database system in which the query language is traditional SQL. More specifically, we present a system in which the user can query the collection of all possible patterns as if they were stored in traditional relational tables. We show how such tables, or mining views, can be developed for three popular data mining tasks, namely itemset mining, association rule discovery and decision tree learning. To illustrate the interactive and iterative capabilities of our system, we describe a complete data mining scenario that consists in extracting knowledge from real gene expression data, after a pre-processing phase.
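
    Querying patterns with ordinary SQL, as if they were rows in a relational table, can be illustrated with a small in-memory example. Unlike the virtual mining views described above, this sketch simply materializes every itemset and its support into a table; the table name, column names and transactions are hypothetical and are not taken from the chapter.

```python
import sqlite3
from itertools import combinations

# Hypothetical toy transactions.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE itemsets (items TEXT, size INTEGER, support INTEGER)")

# Materialize every non-empty itemset with its support count.
all_items = sorted(set().union(*transactions))
for size in range(1, len(all_items) + 1):
    for combo in combinations(all_items, size):
        support = sum(1 for t in transactions if set(combo) <= t)
        conn.execute(
            "INSERT INTO itemsets VALUES (?, ?, ?)",
            (",".join(combo), size, support),
        )

# The mining task is now a plain SQL query over the pattern table.
frequent_pairs = conn.execute(
    "SELECT items, support FROM itemsets "
    "WHERE size = 2 AND support >= 3 ORDER BY items"
).fetchall()
```

    Enumerating all itemsets is only feasible for toy data; the point of a virtual view is that the system can run a mining algorithm behind the scenes and compute just the rows a query needs.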

  17. Device-independent quantum private query

    NASA Astrophysics Data System (ADS)

    Maitra, Arpita; Paul, Goutam; Roy, Sarbani

    2017-04-01

    In quantum private query (QPQ), a client obtains values corresponding to his or her query only, and nothing else from the server, and the server does not get any information about the queries. V. Giovannetti et al. [Phys. Rev. Lett. 100, 230502 (2008), doi:10.1103/PhysRevLett.100.230502] gave the first QPQ protocol, and since then quite a few variants and extensions have been proposed. However, none of the existing protocols are device independent; i.e., all of them assume implicitly that the entangled states supplied to the client and the server are of a certain form. In this work, we exploit the idea of a local CHSH game and connect it with the scheme of Y. G. Yang et al. [Quantum Info. Process. 13, 805 (2014), doi:10.1007/s11128-013-0692-8] to present the concept of a device-independent QPQ protocol.

  18. BioFed: federated query processing over life sciences linked open data.

    PubMed

    Hasnain, Ali; Mehmood, Qaiser; Sana E Zainab, Syeda; Saleem, Muhammad; Warren, Claude; Zehra, Durre; Decker, Stefan; Rebholz-Schuhmann, Dietrich

    2017-03-15

    Biomedical data, e.g. from knowledge bases and ontologies, is increasingly made available following open linked data principles, at best as RDF triple data. This is a necessary step towards unified access to biological data sets, but this still requires solutions to query multiple endpoints for their heterogeneous data to eventually retrieve all the meaningful information. Suggested solutions are based on query federation approaches, which require the submission of SPARQL queries to endpoints. Due to the size and complexity of available data, these solutions have to be optimised for efficient retrieval times and for users in life sciences research. Last but not least, over time, the reliability of data resources in terms of access and quality has to be monitored. Our solution (BioFed) federates data over 130 SPARQL endpoints in life sciences and tailors query submission according to the provenance information. BioFed has been evaluated against the state-of-the-art solution FedX and forms an important benchmark for the life science domain. The efficient cataloguing approach of the federated query processing system 'BioFed', the triple-pattern-wise source selection and the semantic source normalisation form the core of our solution. It gathers and integrates data from newly identified public endpoints for federated access. Basic provenance information is linked to the retrieved data. Last but not least, BioFed makes use of the latest SPARQL standard (i.e., 1.1) to leverage the full benefits for query federation. The evaluation is based on 10 simple and 10 complex queries, which address data in 10 major and very popular data sources (e.g., DrugBank, Sider). BioFed is a solution for a single-point-of-access for a large number of SPARQL endpoints providing life science data. It facilitates efficient query generation for data access and provides basic provenance information in combination with the retrieved data. 
BioFed fully supports SPARQL 1.1 and gives access to the

  19. Published and perished? The influence of the searched protein database on the long-term storage of proteomics data.

    PubMed

    Griss, Johannes; Côté, Richard G; Gerner, Christopher; Hermjakob, Henning; Vizcaíno, Juan Antonio

    2011-09-01

    In proteomics, protein identifications are reported and stored using an unstable reference system: protein identifiers. These proprietary identifiers are created individually by every protein database and can change or may even be deleted over time. To estimate the effect of the searched protein sequence database on the long-term storage of proteomics data we analyzed the changes of reported protein identifiers from all public experiments in the Proteomics Identifications (PRIDE) database by November 2010. To map the submitted protein identifier to a currently active entry, two distinct approaches were used. The first approach used the Protein Identifier Cross Referencing (PICR) service at the EBI, which maps protein identifiers based on 100% sequence identity. The second one (called logical mapping algorithm) accessed the source databases and retrieved the current status of the reported identifier. Our analysis showed the differences between the main protein databases (International Protein Index (IPI), UniProt Knowledgebase (UniProtKB), National Center for Biotechnological Information nr database (NCBI nr), and Ensembl) in respect to identifier stability. For example, whereas 20% of submitted IPI entries were deleted after two years, virtually all UniProtKB entries remained either active or replaced. Furthermore, the two mapping algorithms produced markedly different results. For example, the PICR service reported 10% more IPI entries deleted compared with the logical mapping algorithm. We found several cases where experiments contained more than 10% deleted identifiers already at the time of publication. We also assessed the proportion of peptide identifications in these data sets that still fitted the originally identified protein sequences. Finally, we performed the same overall analysis on all records from IPI, Ensembl, and UniProtKB: two releases per year were used, from 2005. This analysis showed for the first time the true effect of changing protein

  20. WRAPIN: new generation health search engine using UMLS knowledge sources for MeSH term extraction from health documentation.

    PubMed

    Gaudinat, Arnaud; Joubert, Michel; Aymard, Sylvain; Falco, Laurent; Boyer, Célia; Fieschi, Marius

    2004-01-01

    To realize the potential of the Internet as a source of valuable healthcare information, for the general public, patients or practitioners, it is imperative to establish a validation system based on standards of quality. The WRAPIN project (World-wide online Reliable Advice to Patients and Individuals) from the European Community has this ambitious goal. WRAPIN is a federating system for medical information with an editorial policy of intelligently sharing quality and professional information. The WRAPIN project has two main axes: the efficient and intelligent search of information and the assertion of the trustworthiness of content. This article presents the scientific challenges involved in extracting the knowledge from text-based information in order to better manage the knowledge and the rest of the retrieval process. Our innovative approach is to efficiently extract MeSH terms from the analyzed documents exploiting UMLS knowledge sources. A benefit has been measured when comparing extraction results. Even if the evaluation is made with a limited corpus, this research work proposes heuristics that can be validated to the whole biomedical domain, and possibly enhanced by the adjunction of other methods.

  1. Language model: Extension to solve inconsistency, incompleteness, and short query in cultural heritage collection

    NASA Astrophysics Data System (ADS)

    Tan, Kian Lam; Lim, Chen Kim

    2017-10-01

    With the explosive growth of online information such as email messages, news articles, and scientific literature, many institutions and museums are converting their cultural collections from physical to digital format. However, this conversion has resulted in issues of inconsistency and incompleteness. In addition, the use of inaccurate keywords has resulted in the short query problem. Most of the time, the inconsistency and incompleteness are caused by aggregation faults in annotating the documents themselves, while the short query problem is caused by naive users who have prior knowledge and experience in the cultural heritage domain. In this paper, we present an approach to solve the problems of inconsistency, incompleteness and short queries by incorporating the Term Similarity Matrix into the Language Model. Our approach is tested on the Cultural Heritage in CLEF (CHiC) collection, which consists of short queries and documents. The results show that the proposed approach is effective and has improved the accuracy at retrieval time.

  2. An advanced web query interface for biological databases.

    PubMed

    Latendresse, Mario; Karp, Peter D

    2010-07-06

    Although most web-based biological databases (DBs) offer some type of web-based form to allow users to author DB queries, these query forms are quite restricted in the complexity of DB queries that they can formulate. They can typically query only one DB, and can query only a single type of object at a time (e.g. genes) with no possible interaction between the objects--that is, in SQL parlance, no joins are allowed between DB objects. Writing precise queries against biological DBs is usually left to a programmer skillful enough in complex DB query languages like SQL. We present a web interface for building precise queries for biological DBs that can construct much more precise queries than most web-based query forms, yet that is user friendly enough to be used by biologists. It supports queries containing multiple conditions, and connecting multiple object types without using the join concept, which is unintuitive to biologists. This interactive web interface is called the Structured Advanced Query Page (SAQP). Users interactively build up a wide range of query constructs. Interactive documentation within the SAQP describes the schema of the queried DBs. The SAQP is based on BioVelo, a query language based on list comprehension. The SAQP is part of the Pathway Tools software and is available as part of several bioinformatics web sites powered by Pathway Tools, including the BioCyc.org site that contains more than 500 Pathway/Genome DBs.
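
    A query language based on list comprehension can connect several object types and apply multiple conditions without an explicit join clause. The sketch below uses an ordinary Python list comprehension to show the idea; it is not BioVelo syntax, and the gene and protein records, field names and weights are made-up placeholders rather than real database content.

```python
# Hypothetical records standing in for two object types in a database.
genes = [
    {"id": "G1", "name": "geneA", "product": "P1"},
    {"id": "G2", "name": "geneB", "product": "P2"},
    {"id": "G3", "name": "geneC", "product": "P3"},
]
proteins = [
    {"id": "P1", "name": "protA", "weight_kd": 28.7},
    {"id": "P2", "name": "protB", "weight_kd": 42.9},
    {"id": "P3", "name": "protC", "weight_kd": 116.4},
]

# "For every gene and every protein such that the protein is the gene's
# product and weighs more than 40 kD, return both names."
heavy_products = [
    (g["name"], p["name"])
    for g in genes
    for p in proteins
    if g["product"] == p["id"] and p["weight_kd"] > 40
]
```

    The condition linking a gene to its product plays the role that a join would play in SQL, but it reads as one more filter in the comprehension.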

  3. An advanced web query interface for biological databases

    PubMed Central

    Latendresse, Mario; Karp, Peter D.

    2010-01-01

    Although most web-based biological databases (DBs) offer some type of web-based form to allow users to author DB queries, these query forms are quite restricted in the complexity of DB queries that they can formulate. They can typically query only one DB, and can query only a single type of object at a time (e.g. genes) with no possible interaction between the objects—that is, in SQL parlance, no joins are allowed between DB objects. Writing precise queries against biological DBs is usually left to a programmer skillful enough in complex DB query languages like SQL. We present a web interface for building precise queries for biological DBs that can construct much more precise queries than most web-based query forms, yet that is user friendly enough to be used by biologists. It supports queries containing multiple conditions, and connecting multiple object types without using the join concept, which is unintuitive to biologists. This interactive web interface is called the Structured Advanced Query Page (SAQP). Users interactively build up a wide range of query constructs. Interactive documentation within the SAQP describes the schema of the queried DBs. The SAQP is based on BioVelo, a query language based on list comprehension. The SAQP is part of the Pathway Tools software and is available as part of several bioinformatics web sites powered by Pathway Tools, including the BioCyc.org site that contains more than 500 Pathway/Genome DBs. PMID:20624715

  4. OpenSearch technology for geospatial resources discovery

    NASA Astrophysics Data System (ADS)

    Papeschi, Fabrizio; Enrico, Boldrini; Mazzetti, Paolo

    2010-05-01

    In 2005, the term Web 2.0 was coined by Tim O'Reilly to describe a quickly growing set of Web-based applications that share a common philosophy of "mutually maximizing collective intelligence and added value for each participant by formalized and dynamic information sharing". Around this same period, OpenSearch, a new Web 2.0 technology, was developed. More properly, OpenSearch is a collection of technologies that allow publishing of search results in a format suitable for syndication and aggregation. It is a way for websites and search engines to publish search results in a standard and accessible format. Due to its strong impact on the way the Web is perceived by users and also due to its relevance for businesses, Web 2.0 has attracted the attention of both mass media and the scientific community. The explosive growth in popularity of Web 2.0 technologies like OpenSearch and the practical applications of Service Oriented Architecture (SOA) have resulted in an increased interest in similarities, convergence, and a potential synergy of these two concepts. SOA is considered as the philosophy of encapsulating application logic in services with a uniformly defined interface and making these publicly available via discovery mechanisms. Service consumers may then retrieve these services, compose and use them according to their current needs. A great degree of similarity between SOA and Web 2.0 may be leading to a convergence between the two paradigms. They also expose divergent elements, such as the Web 2.0 support for human interaction as opposed to the typical SOA machine-to-machine interaction. In light of these considerations, the Geospatial Information (GI) domain is also taking its first steps towards a new approach to data publishing and discovery, in particular by taking advantage of OpenSearch technology. 
A specific GI niche is represented by the OGC Catalog Service for Web (CSW) that is part of the OGC Web Services (OWS) specifications suite, which provides a

  5. Evaluation of a Novel Syndromic Surveillance Query for Heat-Related Illness Using Hospital Data From Maricopa County, Arizona, 2015.

    PubMed

    White, Jessica R; Berisha, Vjollca; Lane, Kathryn; Ménager, Henri; Gettel, Aaron; Braun, Carol R

    We evaluated a novel syndromic surveillance query, developed by the Council of State and Territorial Epidemiologists (CSTE) Heat Syndrome Workgroup, for identifying heat-related illness cases in near real time, using emergency department and inpatient hospital data from Maricopa County, Arizona, in 2015. The Maricopa County Department of Public Health applied 2 queries for heat-related illness to area hospital data transmitted to the National Syndromic Surveillance Program BioSense Platform: the BioSense "heat, excessive" query and the novel CSTE query. We reviewed the line lists generated by each query and used the diagnosis code and chief complaint text fields to find probable cases of heat-related illness. For each query, we calculated positive predictive values (PPVs) for heat-related illness. The CSTE query identified 674 records, of which 591 were categorized as probable heat-related illness, demonstrating a PPV of 88% for heat-related illness. The BioSense query identified 791 patient records, of which 589 were probable heat-related illness, demonstrating a PPV of 74% for heat-related illness. The PPV was substantially higher for the CSTE novel and BioSense queries during the heat season (May 1 to September 30; 92% and 85%, respectively) than during the cooler seasons (55% and 29%, respectively). A novel query for heat-related illness that combined diagnosis codes, chief complaint text terms, and exclusion criteria had a high PPV for heat-related illness, particularly during the heat season. Public health departments can use this query to meet local needs; however, use of this novel query to substantially improve public health heat-related illness prevention remains to be seen.
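
    The positive predictive values reported above follow directly from the record counts in the abstract (probable cases divided by records returned by the query). A short check, with a hypothetical helper name:

```python
def ppv(probable_cases, records_flagged):
    """Positive predictive value: share of flagged records that are probable cases."""
    return probable_cases / records_flagged

cste_ppv = ppv(591, 674)      # CSTE query counts from the abstract
biosense_ppv = ppv(589, 791)  # BioSense query counts from the abstract

print(f"CSTE: {cste_ppv:.0%}, BioSense: {biosense_ppv:.0%}")
```

    Both queries find nearly the same number of probable cases; the CSTE query's higher PPV comes from returning fewer records overall.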

  6. Secure searching of biomarkers through hybrid homomorphic encryption scheme.

    PubMed

    Kim, Miran; Song, Yongsoo; Cheon, Jung Hee

    2017-07-26

    As genome sequencing technology develops rapidly, there has lately been an increasing need to keep genomic data secure even when stored in the cloud and still used for research. We are interested in designing a protocol for the secure outsourcing matching problem on encrypted data. We propose an efficient method to securely search a matching position with the query data and extract some information at the position. After decryption, only a small number of comparisons with the query information need to be performed in the plaintext state. We apply this method to find a set of biomarkers in encrypted genomes. The important feature of our method is that it encodes a genomic database as a single element of a polynomial ring. Since our method requires a single homomorphic multiplication of the hybrid scheme for query computation, it has an advantage over previous methods in parameter size, computation complexity, and communication cost. In particular, the extraction procedure not only prevents leakage of database information that has not been queried by the user but also reduces the communication cost by half. We evaluate the performance of our method and verify that the computation on large-scale personal data can be securely and practically outsourced to a cloud environment during data analysis. It takes about 3.9 s to search-and-extract the reference and alternate sequences at the queried position in a database of size 4M. Our solution for finding a set of biomarkers in DNA sequences shows the progress of cryptographic techniques in terms of their capability to support real-world genome data analysis in a cloud environment.

  7. IView: Introgression library visualization and query tool

    USDA-ARS's Scientific Manuscript database

    Near-isogenic lines (NIL) are powerful genetic resources to analyze phenotypic variation and are important to map-base clone genes underlying mutations and traits. With many thousands of distinct genotypes, querying introgression libraries for lines of interest is an issue. We have created a tool ...

  8. DTI data modeling for unlimited query support

    NASA Astrophysics Data System (ADS)

    Siadat, Mohammad-Reza; Hammad, Rafat; Shetty, Anil; Soltanian-Zadeh, Hamid; Sethi, Ishwar K.; Eetemadi, Ameen; Elisevich, Kost V.

    2009-02-01

    This paper describes Data Modeling for unstructured data of Diffusion Tensor Imaging (DTI). Data Modeling is an essential first step for data preparation in any data management and data mining procedure. Conventional Entity-Relational (E-R) data modeling is lossy, irreproducible, and time-consuming, especially when dealing with unstructured image data associated with complex systems like the human brain. We propose a methodological framework for more objective E-R data modeling with unlimited query support by eliminating the structured content-dependent metadata associated with the unstructured data. The proposed method is applied to DTI data and a minimum system is implemented accordingly. Eventually supported with navigation, data fusion, and feature extraction modules, the proposed system provides a content-based support environment (C-BASE). Such an environment facilitates unlimited query support with a reproducible and efficient database schema. By switching between different modalities of data, while confining the feature extractors within the object(s) of interest, we supply anatomically specific query results. The price of such a scheme is relatively large storage and in some cases high computational cost. The data modeling and its mathematical framework, the behind-the-scenes query execution, and the user interface of the system are presented in this paper.

  9. Autonomic care platform for optimizing query performance

    PubMed Central

    2013-01-01

    Background: As the amount of information in electronic health care systems increases, data operations get more complicated and time-consuming. Intensive Care platforms require a timely processing of data retrievals to guarantee the continuous display of recent data of patients. Physicians and nurses rely on this data for their decision making. Manual optimization of query executions has become difficult to handle due to the increased amount of queries across multiple sources. Hence, a more automated management is necessary to increase the performance of database queries. The autonomic computing paradigm promises an approach in which the system adapts itself and acts as self-managing entity, thereby limiting human interventions and taking actions. Despite the usage of autonomic control loops in network and software systems, this approach has not been applied so far for health information systems. Methods: We extend the COSARA architecture, an infection surveillance and antibiotic management service platform for the Intensive Care Unit (ICU), with self-managed components to increase the performance of data retrievals. We used real-life ICU COSARA queries to analyse slow performance and measure the impact of optimizations. Each day more than 2 million COSARA queries are executed. Three control loops, which monitor the executions and take action, have been proposed: reactive, deliberative and reflective control loops. We focus on improvements of the execution time of microbiology queries directly related to the visual displays of patients’ data on the bedside screens. Results: The results show that autonomic control loops are beneficial for the optimizations in the data executions in the ICU. The application of reactive control loop results in a reduction of 8.61% of the average execution time of microbiology results. The combined application of the reactive and deliberative control loop results in an average query time reduction of 10.92% and the combination of

  10. An alternative database approach for management of SNOMED CT and improved patient data queries.

    PubMed

    Campbell, W Scott; Pedersen, Jay; McClay, James C; Rao, Praveen; Bastola, Dhundy; Campbell, James R

    2015-10-01

    statistics generated using the graph database were identical to those using validated methods. Patient queries produced identical patient count results to the Oracle RDBMS with comparable times. Database queries involving defining attributes of SNOMED CT concepts were possible with the graph DB. The same queries could not be directly performed with the Oracle RDBMS representation of the patient data and required the creation and use of external terminology services. Further, queries of undefined depth were successful in identifying unknown relationships between patient cohorts. The results of this study supported the hypothesis that a patient database built upon and around the semantic model of SNOMED CT was possible. The model supported queries that leveraged all aspects of the SNOMED CT logical model to produce clinically relevant query results. Logical disjunction and negation queries were possible using the data model, as well as, queries that extended beyond the structural IS_A hierarchy of SNOMED CT to include queries that employed defining attribute-values of SNOMED CT concepts as search parameters. As medical terminologies, such as SNOMED CT, continue to expand, they will become more complex and model consistency will be more difficult to assure. Simultaneously, consumers of data will increasingly demand improvements to query functionality to accommodate additional granularity of clinical concepts without sacrificing speed. This new line of research provides an alternative approach to instantiating and querying patient data represented using advanced computable clinical terminologies. Copyright © 2015 Elsevier Inc. All rights reserved.

  11. Optimizing Online Suicide Prevention: A Search Engine-Based Tailored Approach.

    PubMed

    Arendt, Florian; Scherr, Sebastian

    2017-11-01

    Search engines are increasingly used to seek suicide-related information online, which can serve both harmful and helpful purposes. Google acknowledges this fact and presents a suicide-prevention result for particular search terms. Unfortunately, the result is only presented to a limited number of visitors. Hence, Google is missing the opportunity to provide help to vulnerable people. We propose a two-step approach to a tailored optimization: First, research will identify the risk factors. Second, search engines will reweight algorithms according to the risk factors. In this study, we show that the query share of the search term "poisoning" on Google shows substantial peaks corresponding to peaks in actual suicidal behavior. Accordingly, thresholds for showing the suicide-prevention result should be set to the lowest levels during the spring, on Sundays and Mondays, on New Year's Day, and on Saturdays following Thanksgiving. Search engines can help to save lives globally by utilizing a more tailored approach to suicide prevention.

  12. Adaptive search in mobile peer-to-peer databases

    NASA Technical Reports Server (NTRS)

    Wolfson, Ouri (Inventor); Xu, Bo (Inventor)

    2010-01-01

    Information is stored in a plurality of mobile peers. The peers communicate in a peer to peer fashion, using a short-range wireless network. Occasionally, a peer initiates a search for information in the peer to peer network by issuing a query. Queries and pieces of information, called reports, are transmitted among peers that are within a transmission range. For each search additional peers are utilized, wherein these additional peers search and relay information on behalf of the originator of the search.

  13. FACTA: a text search engine for finding associated biomedical concepts.

    PubMed

    Tsuruoka, Yoshimasa; Tsujii, Jun'ichi; Ananiadou, Sophia

    2008-11-01

    FACTA is a text search engine for MEDLINE abstracts, which is designed particularly to help users browse biomedical concepts (e.g. genes/proteins, diseases, enzymes and chemical compounds) appearing in the documents retrieved by the query. The concepts are presented to the user in a tabular format and ranked based on the co-occurrence statistics. Unlike existing systems that provide similar functionality, FACTA pre-indexes not only the words but also the concepts mentioned in the documents, which enables the user to issue a flexible query (e.g. free keywords or Boolean combinations of keywords/concepts) and receive the results immediately even when the number of the documents that match the query is very large. The user can also view snippets from MEDLINE to get textual evidence of associations between the query terms and the concepts. The concept IDs and their names/synonyms for building the indexes were collected from several biomedical databases and thesauri, such as UniProt, BioThesaurus, UMLS, KEGG and DrugBank. The system is available at http://www.nactem.ac.uk/software/facta/
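
The idea described above (pre-index both words and concepts, then rank the concepts found in the documents that match a query) can be illustrated with a small Python sketch. This is not FACTA's implementation: the document structure, the function names `build_word_index` and `rank_concepts`, and the use of raw co-occurrence counts as the ranking statistic are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_word_index(docs):
    """Map each word to the set of document IDs that contain it."""
    word_index = defaultdict(set)
    for doc_id, doc in docs.items():
        for word in doc["words"]:
            word_index[word].add(doc_id)
    return word_index


def rank_concepts(query_words, docs, word_index):
    """Rank concepts by how many matching documents mention them.

    The query is treated as a Boolean AND of its words (an assumption).
    """
    matching = None
    for word in query_words:
        ids = word_index.get(word, set())
        matching = ids if matching is None else matching & ids
    counts = Counter()
    for doc_id in matching or ():
        # Count each concept once per document.
        counts.update(set(docs[doc_id]["concepts"]))
    # Highest count first; ties broken alphabetically for stable output.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy corpus: each document carries pre-extracted concepts.
DOCS = {
    1: {"words": {"p53", "apoptosis"}, "concepts": ["TP53", "cancer"]},
    2: {"words": {"p53"}, "concepts": ["TP53"]},
    3: {"words": {"insulin"}, "concepts": ["diabetes"]},
}
WORD_INDEX = build_word_index(DOCS)
RANKED = rank_concepts(["p53"], DOCS, WORD_INDEX)
```

Because the concepts are attached to documents at indexing time, answering a query only requires set intersections and counting, which is one plausible reason a pre-indexed design can respond quickly on large result sets.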

  14. Graphical modeling and query language for hospitals.

    PubMed

    Barzdins, Janis; Barzdins, Juris; Rencis, Edgars; Sostaks, Agris

    2013-01-01

    So far there has been little evidence that implementation of the health information technologies (HIT) is leading to health care cost savings. One of the reasons for this lack of impact by the HIT likely lies in the complexity of the business process ownership in the hospitals. The goal of our research is to develop a business model-based method for hospital use which would allow doctors to retrieve directly the ad-hoc information from various hospital databases. We have developed a special domain-specific process modelling language called the MedMod. Formally, we define the MedMod language as a profile on UML Class diagrams, but we also demonstrate it on examples, where we explain the semantics of all its elements informally. Moreover, we have developed the Process Query Language (PQL) that is based on MedMod process definition language. The purpose of PQL is to allow a doctor querying (filtering) runtime data of hospital's processes described using MedMod. The MedMod language tries to overcome deficiencies in existing process modeling languages, allowing to specify the loosely-defined sequence of the steps to be performed in the clinical process. The main advantages of PQL are in two main areas - usability and efficiency. They are: 1) the view on data through "glasses" of familiar process, 2) the simple and easy-to-perceive means of setting filtering conditions require no more expertise than using spreadsheet applications, 3) the dynamic response to each step in construction of the complete query that shortens the learning curve greatly and reduces the error rate, and 4) the selected means of filtering and data retrieving allows to execute queries in O(n) time regarding the size of the dataset. We are about to continue developing this project with three further steps. First, we are planning to develop user-friendly graphical editors for the MedMod process modeling and query languages. The second step is to do evaluation of usability of the proposed language and tool

  15. The Weaknesses of Full-Text Searching

    ERIC Educational Resources Information Center

    Beall, Jeffrey

    2008-01-01

    This paper provides a theoretical critique of the deficiencies of full-text searching in academic library databases. Because full-text searching relies on matching words in a search query with words in online resources, it is an inefficient method of finding information in a database. This matching fails to retrieve synonyms, and it also retrieves…

  16. Patterns of use and impact of standardised MedDRA query analyses on the safety evaluation and review of new drug and biologics license applications

    PubMed Central

    Chang, Lin-Chau; Mahmood, Riaz; Qureshi, Samina

    2017-01-01

    Purpose: Standardised MedDRA Queries (SMQs) have been developed since the early 2000’s and used by academia, industry, public health, and government sectors for detecting safety signals in adverse event safety databases. The purpose of the present study is to characterize how SMQs are used and the impact in safety analyses for New Drug Application (NDA) and Biologics License Application (BLA) submissions to the United States Food and Drug Administration (USFDA). Methods: We used the PharmaPendium database to capture SMQ use in Summary Basis of Approvals (SBoAs) of drugs and biologics approved by the USFDA. Characteristics of the drugs and the SMQ use were employed to evaluate the role of SMQ safety analyses in regulatory decisions and the veracity of signals they revealed. Results: A comprehensive search of the SBoAs yielded 184 regulatory submissions approved from 2006 to 2015. Search strategies more frequently utilized restrictive searches with “narrow terms” to enhance specificity over strategies using “broad terms” to increase sensitivity, while some involved modification of search terms. A majority (59%) of 1290 searches used descriptive statistics, however inferential statistics were utilized in 35% of them. Commentary from reviewers and supervisory staff suggested that a small, yet notable percentage (18%) of 1290 searches supported regulatory decisions. The searches with regulatory impact were found in 73 submissions (40% of the submissions investigated). Most searches (75% of 227 searches) with regulatory implications described how the searches were confirmed, indicating prudence in the decision-making process. Conclusions: SMQs have an increasing role in the presentation and review of safety analysis for NDAs/BLAs and their regulatory reviews. This study suggests that SMQs are best used for screening process, with descriptive statistics, description of SMQ modifications, and systematic verification of cases which is crucial for drawing regulatory

  17. Development of a Search Strategy for an Evidence Based Retrieval Service

    PubMed Central

    Ho, Gah Juan; Liew, Su May; Ng, Chirk Jenn; Hisham Shunmugam, Ranita; Glasziou, Paul

    2016-01-01

    Background: Physicians are often encouraged to locate answers for their clinical queries via an evidence-based literature search approach. The methods used are often not clearly specified. Inappropriate search strategies, time constraint and contradictory information complicate evidence retrieval. Aims: Our study aimed to develop a search strategy to answer clinical queries among physicians in a primary care setting. Methods: Six clinical questions of different medical conditions seen in primary care were formulated. A series of experimental searches to answer each question was conducted on 3 commonly advocated medical databases. We compared search results from a PICO (patients, intervention, comparison, outcome) framework for questions using different combinations of PICO elements. We also compared outcomes from doing searches using text words, Medical Subject Headings (MeSH), or a combination of both. All searches were documented using screenshots and saved search strategies. Results: Answers to all 6 questions using the PICO framework were found. A higher number of systematic reviews were obtained using a 2 PICO element search compared to a 4 element search. A more optimal choice of search is a combination of both text words and MeSH terms. Despite searching using the Systematic Review filter, many non-systematic reviews or narrative reviews were found in PubMed. There was poor overlap between outcomes of searches using different databases. The duration of search and screening for the 6 questions ranged from 1 to 4 hours. Conclusion: This strategy has been shown to be feasible and can provide evidence to doctors’ clinical questions. It has the potential to be incorporated into an interventional study to determine the impact of an online evidence retrieval system. PMID:27935993

  18. Development of a Search Strategy for an Evidence Based Retrieval Service.

    PubMed

    Ho, Gah Juan; Liew, Su May; Ng, Chirk Jenn; Hisham Shunmugam, Ranita; Glasziou, Paul

    2016-01-01

    Physicians are often encouraged to locate answers for their clinical queries via an evidence-based literature search approach. The methods used are often not clearly specified. Inappropriate search strategies, time constraint and contradictory information complicate evidence retrieval. Our study aimed to develop a search strategy to answer clinical queries among physicians in a primary care setting. Six clinical questions of different medical conditions seen in primary care were formulated. A series of experimental searches to answer each question was conducted on 3 commonly advocated medical databases. We compared search results from a PICO (patients, intervention, comparison, outcome) framework for questions using different combinations of PICO elements. We also compared outcomes from doing searches using text words, Medical Subject Headings (MeSH), or a combination of both. All searches were documented using screenshots and saved search strategies. Answers to all 6 questions using the PICO framework were found. A higher number of systematic reviews were obtained using a 2 PICO element search compared to a 4 element search. A more optimal choice of search is a combination of both text words and MeSH terms. Despite searching using the Systematic Review filter, many non-systematic reviews or narrative reviews were found in PubMed. There was poor overlap between outcomes of searches using different databases. The duration of search and screening for the 6 questions ranged from 1 to 4 hours. This strategy has been shown to be feasible and can provide evidence to doctors' clinical questions. It has the potential to be incorporated into an interventional study to determine the impact of an online evidence retrieval system.

  19. QueryOR: a comprehensive web platform for genetic variant analysis and prioritization.

    PubMed

    Bertoldi, Loris; Forcato, Claudio; Vitulo, Nicola; Birolo, Giovanni; De Pascale, Fabio; Feltrin, Erika; Schiavon, Riccardo; Anglani, Franca; Negrisolo, Susanna; Zanetti, Alessandra; D'Avanzo, Francesca; Tomanin, Rosella; Faulkner, Georgine; Vezzi, Alessandro; Valle, Giorgio

    2017-04-28

    Whole genome and exome sequencing are contributing to the extraordinary progress in the study of human genetic variants. In this fast developing field, appropriate and easily accessible tools are required to facilitate data analysis. Here we describe QueryOR, a web platform suitable for searching among known candidate genes as well as for finding novel gene-disease associations. QueryOR combines several innovative features that make it comprehensive, flexible and easy to use. Instead of being designed on specific datasets, it works on a general XML schema specifying formats and criteria of each data source. Thanks to this flexibility, new criteria can be easily added for future expansion. Currently, up to 70 user-selectable criteria are available, including a wide range of gene and variant features. Moreover, rather than progressively discarding variants taking one criterion at a time, the prioritization is achieved by a global positive selection process that considers all transcript isoforms, thus producing reliable results. QueryOR is easy to use and its intuitive interface allows to handle different kinds of inheritance as well as features related to sharing variants in different patients. QueryOR is suitable for investigating single patients, families or cohorts. QueryOR is a comprehensive and flexible web platform eligible for an easy user-driven variant prioritization. It is freely available for academic institutions at http://queryor.cribi.unipd.it/ .

  20. PolySearch2: a significantly improved text-mining system for discovering associations between human diseases, genes, drugs, metabolites, toxins and more.

    PubMed

    Liu, Yifeng; Liang, Yongjie; Wishart, David

    2015-07-01

    PolySearch2 (http://polysearch.ca) is an online text-mining system for identifying relationships between biomedical entities such as human diseases, genes, SNPs, proteins, drugs, metabolites, toxins, metabolic pathways, organs, tissues, subcellular organelles, positive health effects, negative health effects, drug actions, Gene Ontology terms, MeSH terms, ICD-10 medical codes, biological taxonomies and chemical taxonomies. PolySearch2 supports a generalized 'Given X, find all associated Ys' query, where X and Y can be selected from the aforementioned biomedical entities. An example query might be: 'Find all diseases associated with Bisphenol A'. To find its answers, PolySearch2 searches for associations against comprehensive collections of free-text collections, including local versions of MEDLINE abstracts, PubMed Central full-text articles, Wikipedia full-text articles and US Patent application abstracts. PolySearch2 also searches 14 widely used, text-rich biological databases such as UniProt, DrugBank and Human Metabolome Database to improve its accuracy and coverage. PolySearch2 maintains an extensive thesaurus of biological terms and exploits the latest search engine technology to rapidly retrieve relevant articles and databases records. PolySearch2 also generates, ranks and annotates associative candidates and present results with relevancy statistics and highlighted key sentences to facilitate user interpretation. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  1. PolySearch2: a significantly improved text-mining system for discovering associations between human diseases, genes, drugs, metabolites, toxins and more

    PubMed Central

    Liu, Yifeng; Liang, Yongjie; Wishart, David

    2015-01-01

    PolySearch2 (http://polysearch.ca) is an online text-mining system for identifying relationships between biomedical entities such as human diseases, genes, SNPs, proteins, drugs, metabolites, toxins, metabolic pathways, organs, tissues, subcellular organelles, positive health effects, negative health effects, drug actions, Gene Ontology terms, MeSH terms, ICD-10 medical codes, biological taxonomies and chemical taxonomies. PolySearch2 supports a generalized ‘Given X, find all associated Ys’ query, where X and Y can be selected from the aforementioned biomedical entities. An example query might be: ‘Find all diseases associated with Bisphenol A’. To find its answers, PolySearch2 searches for associations against comprehensive collections of free-text collections, including local versions of MEDLINE abstracts, PubMed Central full-text articles, Wikipedia full-text articles and US Patent application abstracts. PolySearch2 also searches 14 widely used, text-rich biological databases such as UniProt, DrugBank and Human Metabolome Database to improve its accuracy and coverage. PolySearch2 maintains an extensive thesaurus of biological terms and exploits the latest search engine technology to rapidly retrieve relevant articles and databases records. PolySearch2 also generates, ranks and annotates associative candidates and present results with relevancy statistics and highlighted key sentences to facilitate user interpretation. PMID:25925572

  2. Is ‘Self-Medication’ a Useful Term to Retrieve Related Publications in the Literature? A Systematic Exploration of Related Terms

    PubMed Central

    Mansouri, Ava; Sarayani, Amir; Ashouri, Asieh; Sherafatmand, Mona; Hadjibabaie, Molouk; Gholami, Kheirollah

    2015-01-01

    Background: Self-Medication (SM), i.e. using medications to treat oneself, is a major concern for health researchers and policy makers. The terms “self medication” or “self-medication” (SM terms) have been used to explain various concepts while several terms have also been employed to define this practice. Hence, retrieving relevant publications would require exhaustive literature screening. So, we assessed the current situation of SM terms in the literature to improve the relevancy of search outcomes. Methods: In this Systematic exploration, SM terms were searched in the 6 following databases and publisher’s portals till April 2012: Web of Science, Scopus, PubMed, Google scholar, ScienceDirect, and Wiley. A simple search query was used to include only publications with SM terms. We used Relative-Risk (RR) to estimate the probability of SM terms use in related compared to unrelated publications. Sensitivity and specificity of SM terms as keywords in search query were also calculated. Relevant terms to SM practice were extracted and their Likelihood Ratio positive and negative (LR+/-) were calculated to assess their effect on the probability of search outcomes relevancy in addition to previous search queries. We also evaluated the content of unrelated publications. All mentioned steps were performed in title (TI) and title or abstract (TIAB) of publications. Results: 1999 related and 1917 unrelated publications were found. SM terms RR was 4.5 in TI and 2.1 in TIAB. SM terms sensitivity and specificity respectively were 55.4% and 87.7% in TI and 84.0% and 59.5% in TIAB. “OTC” and “Over-The-Counter Medication”, with LR+ 16.78 and 16.30 respectively, provided the most conclusive increase in the probability of the relevancy of publications. The most common unrelated SM themes were self-medication hypothesis, drug abuse and Zoopharmacognosy. Conclusions: Due to relatively low specificity or sensitivity of SM terms, relevant terms should be employed in
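
The record above reports sensitivity, specificity and likelihood ratios for search terms. The Python sketch below uses the standard textbook definitions of these measures; the record does not show the authors' own computation, so the function names and the choice of formulas are assumptions made for illustration.

```python
def sensitivity(true_pos, false_neg):
    """Share of related publications that the search term retrieves."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of unrelated publications that the search term excludes."""
    return true_neg / (true_neg + false_pos)


def likelihood_ratio_positive(sens, spec):
    """LR+ = sensitivity / (1 - specificity)."""
    return sens / (1.0 - spec)


def likelihood_ratio_negative(sens, spec):
    """LR- = (1 - sensitivity) / specificity."""
    return (1.0 - sens) / spec


# Title-only figures quoted in the record: 55.4% sensitivity, 87.7% specificity.
LR_POS_TI = likelihood_ratio_positive(0.554, 0.877)
LR_NEG_TI = likelihood_ratio_negative(0.554, 0.877)
```

Under these definitions, the title-only figures give an LR+ of roughly 4.5: a title that contains the search term is about four and a half times more likely to belong to a related publication than to an unrelated one.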

  3. Learning Database Abstractions for Query Reformation

    DTIC Science & Technology

    1993-04-30

    We have developed an efficient reformulation algorithm [Arens et al. 93, Hsu and Knoblock 93], which fires all applicable database abstractions...algorithm to reformulate queries to distributed databases [Hsu and Knoblock 93, Arens et al. 93]. We found that the reformulation approach can reduce the...improved by pruning irrelevant attributes. We can use the semantic knowledge of the databases provided by the SIMS system [Arens and Knoblock 92, Arens

  4. Automatic building information model query generation

    DOE PAGES

    Jiang, Yufei; Yu, Nan; Ming, Jiang; ...

    2015-12-01

    Energy efficient building design and construction calls for extensive collaboration between different subfields of the Architecture, Engineering and Construction (AEC) community. Performing building design and construction engineering raises challenges on data integration and software interoperability. Using Building Information Modeling (BIM) data hub to host and integrate building models is a promising solution to address those challenges, which can ease building design information management. However, the partial model query mechanism of current BIM data hub collaboration model has several limitations, which prevents designers and engineers to take advantage of BIM. To address this problem, we propose a general and effective approach to generate query code based on a Model View Definition (MVD). This approach is demonstrated through a software prototype called QueryGenerator. In conclusion, by demonstrating a case study using multi-zone air flow analysis, we show how our approach and tool can help domain experts to use BIM to drive building design with less labour and lower overhead cost.

  5. Performance of Point and Range Queries for In-memory Databases using Radix Trees on GPUs

    SciTech Connect

    Alam, Maksudul; Yoginath, Srikanth B; Perumalla, Kalyan S

    2016-01-01

    In in-memory database systems augmented by hardware accelerators, accelerating the index searching operations can greatly increase the runtime performance of database queries. Recently, adaptive radix trees (ART) have been shown to provide very fast index search implementation on the CPU. Here, we focus on an accelerator-based implementation of ART. We present a detailed performance study of our GPU-based adaptive radix tree (GRT) implementation over a variety of key distributions, synthetic benchmarks, and actual keys from music and book data sets. The performance is also compared with other index-searching schemes on the GPU. GRT on modern GPUs achieves some of the highest rates of index searches reported in the literature. For point queries, a throughput of up to 106 million and 130 million lookups per second is achieved for sparse and dense keys, respectively. For range queries, GRT yields 600 million and 1000 million lookups per second for sparse and dense keys, respectively, on a large dataset of 64 million 32-bit keys.
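
To make the point-query and range-query terminology concrete, here is a minimal CPU-side Python sketch of a byte-wise radix trie over 32-bit keys. It is neither the adaptive radix tree (ART) nor the GPU implementation (GRT) described above: the class name `RadixTrie`, the fixed radix of 256 and the dictionary-based nodes are assumptions chosen for brevity.

```python
class RadixTrie:
    """Byte-wise (radix-256) trie over fixed-width 32-bit unsigned keys."""

    def __init__(self):
        self.root = {}

    @staticmethod
    def _key_bytes(key):
        # Big-endian order keeps numeric order equal to byte-wise order.
        return key.to_bytes(4, "big")

    def insert(self, key, value):
        node = self.root
        parts = self._key_bytes(key)
        for byte in parts[:-1]:
            node = node.setdefault(byte, {})
        node[parts[-1]] = value

    def lookup(self, key):
        """Point query: return the stored value, or None if absent."""
        node = self.root
        parts = self._key_bytes(key)
        for byte in parts[:-1]:
            node = node.get(byte)
            if node is None:
                return None
        return node.get(parts[-1])

    def range(self, lo, hi):
        """Range query: (key, value) pairs with lo <= key <= hi, in key order."""
        out = []
        self._walk(self.root, 0, 0, lo, hi, out)
        return out

    def _walk(self, node, depth, prefix, lo, hi, out):
        shift = 8 * (3 - depth)
        for byte in sorted(node):
            sub_lo = prefix | (byte << shift)      # smallest key in this subtree
            sub_hi = sub_lo | ((1 << shift) - 1)   # largest key in this subtree
            if sub_hi < lo or sub_lo > hi:
                continue                           # subtree lies outside the range
            if depth == 3:
                out.append((sub_lo, node[byte]))
            else:
                self._walk(node[byte], depth + 1, sub_lo, lo, hi, out)


TRIE = RadixTrie()
for k, v in [(5, "a"), (300, "b"), (70000, "c"), (2**31, "d")]:
    TRIE.insert(k, v)
```

A point query follows exactly four child links, one per key byte, while a range query prunes every subtree whose key interval falls outside `[lo, hi]`. An adaptive radix tree additionally varies the node layout with the number of children, which this sketch does not attempt.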

  6. Performance of Point and Range Queries for In-memory Databases using Radix Trees on GPUs

    SciTech Connect

    Alam, Maksudul; Yoginath, Srikanth B; Perumalla, Kalyan S

    2016-01-01

    In in-memory database systems augmented by hardware accelerators, accelerating the index searching operations can greatly increase the runtime performance of database queries. Recently, adaptive radix trees (ART) have been shown to provide very fast index search implementation on the CPU. Here, we focus on an accelerator-based implementation of ART. We present a detailed performance study of our GPU-based adaptive radix tree (GRT) implementation over a variety of key distributions, synthetic benchmarks, and actual keys from music and book data sets. The performance is also compared with other index-searching schemes on the GPU. GRT on modern GPUs achieves some of the highest rates of index searches reported in the literature. For point queries, a throughput of up to 106 million and 130 million lookups per second is achieved for sparse and dense keys, respectively. For range queries, GRT yields 600 million and 1000 million lookups per second for sparse and dense keys, respectively, on a large dataset of 64 million 32-bit keys.

  7. Robust hashing with local models for approximate similarity search.

    PubMed

    Song, Jingkuan; Yang, Yi; Li, Xuelong; Huang, Zi; Yang, Yang

    2014-07-01

    Similarity search plays an important role in many applications involving high-dimensional data. Due to the known dimensionality curse, the performance of most existing indexing structures degrades quickly as the feature dimensionality increases. Hashing methods, such as locality sensitive hashing (LSH) and its variants, have been widely used to achieve fast approximate similarity search by trading search quality for efficiency. However, most existing hashing methods make use of randomized algorithms to generate hash codes without considering the specific structural information in the data. In this paper, we propose a novel hashing method, namely, robust hashing with local models (RHLM), which learns a set of robust hash functions to map the high-dimensional data points into binary hash codes by effectively utilizing local structural information. In RHLM, for each individual data point in the training dataset, a local hashing model is learned and used to predict the hash codes of its neighboring data points. The local models from all the data points are globally aligned so that an optimal hash code can be assigned to each data point. After obtaining the hash codes of all the training data points, we design a robust method by employing l2,1-norm minimization on the loss function to learn effective hash functions, which are then used to map each database point into its hash code. Given a query data point, the search process first maps it into the query hash code by the hash functions and then explores the buckets, which have similar hash codes to the query hash code. Extensive experimental results conducted on real-life datasets show that the proposed RHLM outperforms the state-of-the-art methods in terms of search quality and efficiency.
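
The search procedure described at the end of the record (hash the query, then explore buckets whose codes are close to the query code) can be sketched in Python. The hash functions here are plain random hyperplanes, as in basic locality sensitive hashing, not the learned functions of RHLM. All names and parameters (`make_hyperplanes`, `max_hamming`, the seed) are assumptions made for illustration.

```python
import random
from collections import defaultdict


def make_hyperplanes(n_bits, dim, seed=0):
    """Draw n_bits random hyperplanes; each one contributes one bit of the code."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_code(point, hyperplanes):
    """Binary code: one bit per hyperplane, set when the dot product is >= 0."""
    bits = 0
    for plane in hyperplanes:
        dot = sum(p * w for p, w in zip(point, plane))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    return bits


def build_index(points, hyperplanes):
    """Group point indices into buckets keyed by hash code."""
    buckets = defaultdict(list)
    for idx, point in enumerate(points):
        buckets[hash_code(point, hyperplanes)].append(idx)
    return buckets


def search(query, buckets, hyperplanes, max_hamming=1):
    """Return indices from buckets within max_hamming bits of the query code."""
    q_code = hash_code(query, hyperplanes)
    hits = []
    for code, members in buckets.items():
        if bin(code ^ q_code).count("1") <= max_hamming:
            hits.extend(members)
    return sorted(hits)


POINTS = [[1.0, 0.5, -0.2], [-1.0, -0.5, 0.2], [0.9, 0.6, -0.1]]
PLANES = make_hyperplanes(8, 3)
INDEX = build_index(POINTS, PLANES)
```

Raising `max_hamming` widens the set of buckets explored, which trades speed for recall. The record's contribution is to learn the hash functions from local data structure instead of drawing them at random, which this sketch does not attempt.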

  8. Improving average ranking precision in user searches for biomedical research datasets

    PubMed Central

    Gobeill, Julien; Gaudinat, Arnaud; Vachon, Thérèse; Ruch, Patrick

    2017-01-01

    Availability of research datasets is keystone for health and life science study reproducibility and scientific progress. Due to the heterogeneity and complexity of these data, a main challenge to be overcome by research data management systems is to provide users with the best answers for their search queries. In the context of the 2016 bioCADDIE Dataset Retrieval Challenge, we investigate a novel ranking pipeline to improve the search of datasets used in biomedical experiments. Our system comprises a query expansion model based on word embeddings, a similarity measure algorithm that takes into consideration the relevance of the query terms, and a dataset categorization method that boosts the rank of datasets matching query constraints. The system was evaluated using a corpus with 800k datasets and 21 annotated user queries, and provided competitive results when compared to the other challenge participants. In the official run, it achieved the highest infAP, being +22.3% higher than the median infAP of the participant’s best submissions. Overall, it is ranked at top 2 if an aggregated metric using the best official measures per participant is considered. The query expansion method showed positive impact on the system’s performance increasing our baseline up to +5.0% and +3.4% for the infAP and infNDCG metrics, respectively. The similarity measure algorithm showed robust performance in different training conditions, with small performance variations compared to the Divergence from Randomness framework. Finally, the result categorization did not have significant impact on the system’s performance. We believe that our solution could be used to enhance biomedical dataset management systems. The use of data driven expansion methods, such as those based on word embeddings, could be an alternative to the complexity of biomedical terminologies. Nevertheless, due to the limited size of the assessment set, further experiments need to be performed to draw
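
Query expansion based on word embeddings, as mentioned in the record above, can be sketched in Python as adding each query term's nearest neighbours in the embedding space. The record does not describe the authors' actual model, so the tiny hand-written vectors, the function `expand_query` and the `k` and `min_sim` parameters are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def expand_query(terms, embeddings, k=2, min_sim=0.7):
    """Append up to k nearest neighbours of each query term to the query."""
    expanded = list(terms)
    for term in terms:
        if term not in embeddings:
            continue  # out-of-vocabulary terms are left unexpanded
        scored = sorted(
            (
                (cosine(embeddings[term], vec), word)
                for word, vec in embeddings.items()
                if word not in expanded
            ),
            reverse=True,
        )
        for sim, word in scored[:k]:
            if sim >= min_sim and word not in expanded:
                expanded.append(word)
    return expanded


# Hypothetical 3-dimensional toy embeddings; real models use hundreds of dimensions.
TOY_EMBEDDINGS = {
    "tumor": [0.9, 0.1, 0.0],
    "cancer": [0.85, 0.2, 0.05],
    "neoplasm": [0.8, 0.15, 0.1],
    "mouse": [0.0, 0.9, 0.3],
}
EXPANDED = expand_query(["tumor"], TOY_EMBEDDINGS)
```

The similarity threshold keeps unrelated terms such as `mouse` out of the expanded query. In practice both `k` and `min_sim` would need tuning against relevance judgments.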

  9. Applying Query Structuring in Cross-language Retrieval.

    ERIC Educational Resources Information Center

    Pirkola, Ari; Puolamaki, Deniz; Jarvelin, Kalervo

    2003-01-01

    Explores ways to apply query structuring in cross-language information retrieval. Tested were: English queries translated into Finnish using an electronic dictionary and run against a Finnish newspaper database; effects of compound-based structuring using a proximity operator for translation equivalents of query language compound components; and a…

  10. A Relational Algebra Query Language for Programming Relational Databases

    ERIC Educational Resources Information Center

    McMaster, Kirby; Sambasivam, Samuel; Anderson, Nicole

    2011-01-01

    In this paper, we describe a Relational Algebra Query Language (RAQL) and Relational Algebra Query (RAQ) software product we have developed that allows database instructors to teach relational algebra through programming. Instead of defining query operations using mathematical notation (the approach commonly taken in database textbooks), students…
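
    To make the idea of programming with relational algebra concrete, here is a small sketch of three operators written as ordinary functions. This is not RAQL syntax, which the abstract does not show; the function names, the list-of-dicts representation, and the sample relations are assumptions made for illustration.

```python
def select(relation, predicate):
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

def project(relation, attributes):
    """Projection: keep the named attributes and drop duplicate rows."""
    seen = set()
    result = []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result

def natural_join(left, right):
    """Natural join: combine rows that agree on all shared attributes."""
    result = []
    for l_row in left:
        for r_row in right:
            shared = set(l_row) & set(r_row)
            if all(l_row[a] == r_row[a] for a in shared):
                result.append({**l_row, **r_row})
    return result

# Hypothetical sample relations.
students = [{"sid": 1, "name": "Ada"}, {"sid": 2, "name": "Bob"}]
enrolled = [
    {"sid": 1, "course": "DB"},
    {"sid": 2, "course": "OS"},
    {"sid": 1, "course": "OS"},
]

# Names of students enrolled in "OS": project(select(join(...))).
os_students = project(
    select(natural_join(students, enrolled), lambda r: r["course"] == "OS"),
    ["name"],
)
print(os_students)
```

    Composing the functions mirrors the nesting of operators in a relational algebra expression, which is the kind of correspondence a programming-based approach can make visible to students.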

  11. An Approach to Query Cost Modelling in Numeric Databases.

    ERIC Educational Resources Information Center

    Jarvelin, Kalervo

    1989-01-01

    Examines factors that determine user charges based on query processing costs in numeric databases, and analyzes the problem of estimating such charges in advance. An approach to query cost estimation is presented which is based on the relational data model and the query optimization, cardinality estimation, and file design techniques developed in…

  12. Unhappy with internal corporate search? : learn tips and tricks for building a controlled vocabulary ontology.

    SciTech Connect

    Arpin, Bettina Karin Schimanski; Jones, Brian S.; Bemesderfer, Joy; Ralph, Mark E.; Miller, Jennifer L

    2010-06-01

    Are your employees unhappy with internal corporate search? Frequent complaints include: too many results to sift through; results are unrelated/outdated; employees aren't sure which terms to search for. One way to improve intranet search is to implement a controlled vocabulary ontology. Employing this takes the guesswork out of searching, makes search efficient and precise, educates employees about the lingo used within the corporation, and allows employees to contribute to the corpus of terms. It promotes internal corporate search to rival its superior sibling, internet search. We will cover our experiences, lessons learned, and conclusions from implementing a controlled vocabulary ontology at Sandia National Laboratories. The work focuses on construction of this ontology from the content perspective and the technical perspective. We'll discuss the following: (1) The tool we used to build a polyhierarchical taxonomy; (2) Examples of two methods of indexing the content: traditional 'back of the book' and folksonomy word-mapping; (3) Tips on how to build future search capabilities while building the basic controlled vocabulary; (4) How to implement the controlled vocabulary as an ontology that mimics Google's search suggestions; (5) Making the user experience more interactive and intuitive; and (6) Sorting suggestions based on preferred, alternate and related terms using SPARQL queries. In summary, future improvements will be presented, including permitting end-users to add, edit and remove terms, and filtering on different subject domains.

  13. Enhanced algorithms for enterprise expert search system

    NASA Astrophysics Data System (ADS)

    Molokanov, Valentin; Romanov, Dmitry; Tsibulsky, Valentin

    2013-03-01

    We present the results of applying our enterprise expert search system to the task introduced at the Text Retrieval Conference (TREC) in 2007. The expert search system is based on analysis of content and communications topology in an enterprise information space. An optimal set of weighting coefficients for three query-candidate associating algorithms is selected to achieve the best search efficiency on the search collection. The obtained performance proved to be better than that of most TREC participants. A hypothesis that query classification could further improve efficiency is proposed.

  14. Guiding Students to Answers: Query Recommendation

    ERIC Educational Resources Information Center

    Yilmazel, Ozgur

    2011-01-01

    This paper reports on a guided navigation system built on the textbook search engine developed at Anadolu University to support distance education students. The search engine uses Turkish Language specific language processing modules to enable searches over course material presented in Open Education Faculty textbooks. We implemented a guided…

  15. Short-Term Internet-Search Training Is Associated with Increased Fractional Anisotropy in the Superior Longitudinal Fasciculus in the Parietal Lobe

    PubMed Central

    Dong, Guangheng; Li, Hui; Potenza, Marc N.

    2017-01-01

    The Internet search engine has become an indispensable tool for many people, yet the ways in which Internet searching may alter brain structure and function are poorly understood. In this study, we investigated the influence of short-term Internet-search “training” on white matter microstructure using diffusion tensor imaging (DTI). Fifty-nine valid subjects (Experimental group, 43; Control group, 16) completed the whole procedure: pre-DTI scan, 6 days of training and post-DTI scan. Using tract-based spatial statistics, we found increased fractional anisotropy in the right superior longitudinal fasciculus at post-test as compared to pre-test in the experimental group. Within the identified region of the right superior longitudinal fasciculus, decreased radial diffusivity (RD) and unchanged axial diffusivity (AD) were observed. These results suggest that short-term Internet-search training may increase white-matter integrity in the right superior longitudinal fasciculus. A possible mechanism for the observed FA change may involve increased myelination after training, although this possibility warrants additional investigation. PMID:28706473

  16. Tracking and predicting hand, foot, and mouth disease (HFMD) epidemics in China by Baidu queries.

    PubMed

    Xiao, Q Y; Liu, H J; Feldman, M W

    2017-06-01

    Hand, foot, and mouth disease (HFMD) is highly prevalent in China, and more efficient methods of epidemic detection and early warning need to be developed to augment traditional surveillance systems. In this paper, a method that uses Baidu search queries to track and predict HFMD epidemics is presented, and the outbreaks of HFMD in China during the 60-month period from January 2011 to December 2015 are predicted. The Pearson correlation coefficient (R) of the predictive model and the mean absolute percentage errors between observed HFMD case counts and the predicted number show that our predictive model gives excellent fit to the data. This implies that Baidu search queries can be used in China to track and reliably predict HFMD epidemics, and can serve as a supplement to official systems for HFMD epidemic surveillance.
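
    The two fit measures named in this abstract, the Pearson correlation coefficient and the mean absolute percentage error (MAPE), can be computed as in the sketch below. The case counts are invented for illustration and are not data from the study; the function names are likewise assumptions.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def mape(observed, predicted):
    """Mean absolute percentage error, in percent (observed must be nonzero)."""
    errors = [abs(o - p) / abs(o) for o, p in zip(observed, predicted)]
    return 100.0 * sum(errors) / len(errors)

# Made-up monthly case counts and model predictions.
observed_cases = [100, 200, 400]
predicted_cases = [110, 180, 400]

fit_r = pearson_r(observed_cases, predicted_cases)
fit_mape = mape(observed_cases, predicted_cases)
print(f"R = {fit_r:.3f}, MAPE = {fit_mape:.2f}%")
```

    A high R with a low MAPE indicates that predictions track both the shape and the magnitude of the observed series; R alone would not reveal a consistent over- or under-estimate.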

  17. The CircleSegmentView: a visualization for query preview and visual filtering

    NASA Astrophysics Data System (ADS)

    Klein, Peter; Reiterer, Harald

    2005-03-01

    Users of Information Retrieval systems have often been the target group of Human-Computer Interaction researchers. A lot of effort has been spent inventing new forms of visualizations to support the information seeking process. Information Retrieval and Information Visualization are tightly coupled fields of research. Together with psychology (which answers questions like 'how' do users search) and usability engineering (answering questions like 'what' do users expect from user interfaces and their behavior) the research on improving information seeking systems goes on. This paper will concentrate on a meta-data driven, user-centered approach for the query formulation stage. In contrast to the intense research on result-set visualizations we will focus on the development of a visualization which supports human search behavior at the query stage. Additionally this visualization proved that it can compete with other visualizations like the scatter-plot as a visual filter in the result-set presentation.

  18. Large-Scale Continuous Subgraph Queries on Streams

    SciTech Connect

    Choudhury, Sutanay; Holder, Larry; Chin, George; Feo, John T.

    2011-11-30

    Graph pattern matching involves finding exact or approximate matches for a query subgraph in a larger graph. It has been studied extensively and has strong applications in domains such as computer vision, computational biology, social networks, security and finance. The problem of exact graph pattern matching is often described in terms of subgraph isomorphism, which is NP-complete. The exponential growth in streaming data from online social networks, news and video streams and the continual need for situational awareness motivate a solution for finding patterns in streaming updates. This is also the prime driver for the real-time analytics market. Development of incremental algorithms for graph pattern matching on streaming inputs to a continually evolving graph is a nascent area of research. Some of the challenges associated with this problem are the same as those found in continuous query (CQ) evaluation on streaming databases. This paper reviews some of the representative work from the exhaustively researched field of CQ systems and identifies important semantics, constraints and architectural features that are also appropriate for HPC systems performing real-time graph analytics. For each of these features we present a brief discussion of the challenge encountered in the database realm and the approach to the solution, and state their relevance in a high-performance, streaming graph processing framework.

  19. The Complex Dynamics of Sponsored Search Markets

    NASA Astrophysics Data System (ADS)

    Robu, Valentin; La Poutré, Han; Bohte, Sander

    This paper provides a comprehensive study of the structure and dynamics of online advertising markets, mostly based on techniques from the emergent discipline of complex systems analysis. First, we look at how the display rank of a URL link influences its click frequency, for both sponsored search and organic search. Second, we study the market structure that emerges from these queries, especially the market share distribution of different advertisers. We show that the sponsored search market is highly concentrated, with fewer than 5% of all advertisers receiving over 2/3 of the clicks in the market. Furthermore, we show that both the number of ad impressions and the number of clicks follow power law distributions of approximately the same coefficient. However, we find this result does not hold when studying the same distribution of clicks per rank position, which shows considerable variance, most likely due to the way advertisers divide their budget on different keywords. Finally, we turn our attention to how such sponsored search data could be used to provide decision support tools for bidding for combinations of keywords. We provide a method to visualize keywords of interest in graphical form, as well as a method to partition these graphs to obtain desirable subsets of search terms.
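
    The concentration figure quoted in this abstract (a small fraction of advertisers receiving most of the clicks) is a top-share statistic. The sketch below shows one way such a share could be computed; the click counts are invented and the `top_share` function is a hypothetical helper, not the paper's method.

```python
import math

def top_share(clicks, fraction):
    """Share of all clicks received by the top `fraction` of advertisers."""
    ranked = sorted(clicks, reverse=True)
    top_n = max(1, math.ceil(fraction * len(ranked)))
    return sum(ranked[:top_n]) / sum(ranked)

# Made-up click counts for ten advertisers.
clicks_per_advertiser = [70, 10, 5, 5, 4, 3, 1, 1, 1, 0]

share_top_10_percent = top_share(clicks_per_advertiser, 0.10)
print(share_top_10_percent)
```

    On heavy-tailed data of the kind the abstract describes, this share stays large even as the fraction shrinks, which is what "highly concentrated" means in practice.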

  20. Visualizing a High Recall Search Strategy Output for Undergraduates in an Exploration Stage of Researching a Term Paper.

    ERIC Educational Resources Information Center

    Cole, Charles; Mandelblatt, Bertie; Stevenson, John

    2002-01-01

    Discusses high recall search strategies for undergraduates and how to overcome information overload that results. Highlights include word-based versus visual-based schemes; five summarization and visualization schemes for presenting information retrieval citation output; and results of a study that recommend visualization schemes geared toward…

  1. Needle Federated Search Engine

    SciTech Connect

    2009-12-01

    The Idaho National Laboratory (INL) has combined a number of technologies, tools, and resources to accomplish a new means of federating search results. The resulting product is a search engine called Needle, an open-source-based tool that the INL uses internally for researching across a wide variety of information repositories. Needle has a flexible search interface that allows end users to point at any available data source. A user can select multiple sources such as commercial databases (Web of Science, Engineering Index), external resources (WorldCat, Google Scholar), and internal corporate resources (email, document management system, library collections) in a single interface with one search query. In the future, INL hopes to offer this open-source engine to the public. This session will outline the development processes for making Needle's search interface and simplifying the federation of internal and external data sources.
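
    Federating one query across several sources can be sketched as a fan-out followed by a merge. The abstract does not describe how Needle merges results, so the round-robin interleaving below is an assumption, and the source names and search callables are hypothetical stand-ins for real connectors.

```python
from itertools import zip_longest

def federated_search(query, sources):
    """Send one query to every source and interleave the hits round-robin."""
    per_source = [
        [(name, hit) for hit in search(query)]
        for name, search in sources.items()
    ]
    merged = []
    for tier in zip_longest(*per_source):
        # zip_longest pads shorter result lists with None; skip the padding.
        merged.extend(item for item in tier if item is not None)
    return merged

# Hypothetical sources: each maps a query string to a list of hit titles.
demo_sources = {
    "library": lambda q: [q + " book A", q + " book B"],
    "email": lambda q: [q + " msg 1"],
}

hits = federated_search("reactor", demo_sources)
print(hits)
```

    Round-robin merging avoids comparing relevance scores that different sources compute on incompatible scales, at the cost of ignoring how confident each source is in its own hits.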

  2. A Geospatial Semantic Enrichment and Query Service for Geotagged Photographs

    PubMed Central