internet search queries: Topics by Science.gov

Sample records for internet search queries

Predicting Drug Recalls From Internet Search Engine Queries.

PubMed

Yom-Tov, Elad

2017-01-01

Batches of pharmaceuticals are sometimes recalled from the market when a safety issue or a defect is detected in specific production runs of a drug. Such problems are usually detected when patients or healthcare providers report abnormalities to medical authorities. Here, we test the hypothesis that defective production lots can be detected earlier by monitoring queries to Internet search engines. We extracted queries from the USA to the Bing search engine that mentioned one of 5195 pharmaceutical drugs during 2015, together with all recall notifications issued by the Food and Drug Administration (FDA) during that year. By using attributes that quantify the change in query volume at the state level, we attempted to predict whether a recall of a specific drug would be ordered by the FDA within a time horizon ranging from 1 to 40 days in the future. Our results show that future drug recalls can indeed be identified with an AUC of 0.791 and a lift at 5% of approximately 6 when predicting a recall occurring one day ahead. This performance degrades as prediction is made for longer periods ahead. The most indicative attributes for prediction are sudden spikes in query volume about a specific medicine in each state. Recalls of prescription drugs and those estimated to be of medium-risk are more likely to be identified using search query data. These findings suggest that aggregated Internet search engine data can be used to facilitate early warning of faulty batches of medicines.
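The two figures of merit quoted in this abstract, AUC and lift at 5%, can be illustrated with a minimal sketch. The function names and the toy data are assumptions made for illustration; the abstract does not describe how the study computed these values.

```python
def auc(scores, labels):
    """Area under the ROC curve as the probability that a randomly chosen
    positive (recalled) example is scored above a randomly chosen negative one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def lift_at(scores, labels, fraction=0.05):
    """Recall rate among the top `fraction` of scored examples,
    divided by the recall rate over all examples."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    k = max(1, int(len(ranked) * fraction))  # size of the top slice
    top_rate = sum(y for _, y in ranked[:k]) / k
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate


# Hypothetical toy data: 20 drug-days, 2 of which precede a recall.
toy_scores = [i / 20 for i in range(20)]
toy_labels = [0] * 18 + [1, 1]
```

A lift of about 6 at 5% would mean that the top-scored 5% of cases contain recalls at roughly six times the overall rate.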
Searching for cancer information on the internet: analyzing natural language search queries.

PubMed

Bader, Judith L; Theofanos, Mary Frances

2003-12-11

Searching for health information is one of the most-common tasks performed by Internet users. Many users begin searching on popular search engines rather than on prominent health information sites. We know that many visitors to our (National Cancer Institute) Web site, cancer.gov, arrive via links in search engine results. To learn more about the specific needs of our general-public users, we wanted to understand what lay users really wanted to know about cancer, how they phrased their questions, and how much detail they used. The National Cancer Institute partnered with AskJeeves, Inc to develop a methodology to capture, sample, and analyze 3 months of cancer-related queries on the Ask.com Web site, a prominent United States consumer search engine, which receives over 35 million queries per week. Using a benchmark set of 500 terms and word roots supplied by the National Cancer Institute, AskJeeves identified a test sample of cancer queries for 1 week in August 2001. From these 500 terms only 37 appeared ≥ 5 times/day over the trial test week in 17208 queries. Using these 37 terms, 204165 instances of cancer queries were found in the Ask.com query logs for the actual test period of June-August 2001. Of these, 7500 individual user questions were randomly selected for detailed analysis and assigned to appropriate categories. The exact language of sample queries is presented. Considering multiples of the same questions, the sample of 7500 individual user queries represented 76077 queries (37% of the total 3-month pool). Overall 78.37% of sampled cancer queries asked about 14 specific cancer types. Within each cancer type, queries were sorted into appropriate subcategories including at least the following: General Information, Symptoms, Diagnosis and Testing, Treatment, Statistics, Definition, and Cause/Risk/Link. The most-common specific cancer types mentioned in queries were Digestive/Gastrointestinal/Bowel (15.0%), Breast (11.7%), Skin (11.3%), and Genitourinary
Searching for Cancer Information on the Internet: Analyzing Natural Language Search Queries

PubMed Central

Theofanos, Mary Frances

2003-01-01

Background Searching for health information is one of the most-common tasks performed by Internet users. Many users begin searching on popular search engines rather than on prominent health information sites. We know that many visitors to our (National Cancer Institute) Web site, cancer.gov, arrive via links in search engine results. Objective To learn more about the specific needs of our general-public users, we wanted to understand what lay users really wanted to know about cancer, how they phrased their questions, and how much detail they used. Methods The National Cancer Institute partnered with AskJeeves, Inc to develop a methodology to capture, sample, and analyze 3 months of cancer-related queries on the Ask.com Web site, a prominent United States consumer search engine, which receives over 35 million queries per week. Using a benchmark set of 500 terms and word roots supplied by the National Cancer Institute, AskJeeves identified a test sample of cancer queries for 1 week in August 2001. From these 500 terms only 37 appeared ≥ 5 times/day over the trial test week in 17208 queries. Using these 37 terms, 204165 instances of cancer queries were found in the Ask.com query logs for the actual test period of June-August 2001. Of these, 7500 individual user questions were randomly selected for detailed analysis and assigned to appropriate categories. The exact language of sample queries is presented. Results Considering multiples of the same questions, the sample of 7500 individual user queries represented 76077 queries (37% of the total 3-month pool). Overall 78.37% of sampled cancer queries asked about 14 specific cancer types. Within each cancer type, queries were sorted into appropriate subcategories including at least the following: General Information, Symptoms, Diagnosis and Testing, Treatment, Statistics, Definition, and Cause/Risk/Link. The most-common specific cancer types mentioned in queries were Digestive/Gastrointestinal/Bowel (15.0%), Breast (11
Seasonal trends in tinnitus symptomatology: evidence from Internet search engine query data.

PubMed

Plante, David T; Ingram, David G

2015-10-01

The primary aim of this study was to test the hypothesis that the symptom of tinnitus demonstrates a seasonal pattern with worsening in the winter relative to the summer using Internet search engine query data. Normalized search volume for the term 'tinnitus' from January 2004 through December 2013 was retrieved from Google Trends. Seasonal effects were evaluated using cosinor regression models. Primary countries of interest were the United States and Australia. Secondary exploratory analyses were also performed using data from Germany, the United Kingdom, Canada, Sweden, and Switzerland. Significant seasonal effects for 'tinnitus' search queries were found in the United States and Australia (p < 0.00001 for both countries), with peaks in the winter and troughs in the summer. Secondary analyses demonstrated similarly significant seasonal effects for Germany (p < 0.00001), Canada (p < 0.00001), and Sweden (p = 0.0008), again with increased search volume in the winter relative to the summer. Our findings indicate that there are significant seasonal trends for Internet search queries for tinnitus, with a zenith in winter months. Further research is indicated to determine the biological mechanisms underlying these findings, as they may provide insights into the pathophysiology of this common and debilitating medical symptom.
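Cosinor regression, the seasonality test named in this abstract, fits a cosine of known period to a time series. The sketch below is a simplified illustration and not the authors' analysis: it assumes equally spaced monthly values covering a whole number of years, in which case the least-squares estimates reduce to the closed-form sums used here. The function name and the synthetic series are hypothetical.

```python
import math


def cosinor_fit(values, period=12):
    """Fit y(t) = mesor + amplitude * cos(2*pi*(t - peak) / period).

    Assumes equally spaced samples spanning whole periods, so the cosine
    and sine regressors are orthogonal and the least-squares solution
    reduces to the simple sums below.
    """
    n = len(values)
    if n == 0 or n % period != 0:
        raise ValueError("series must cover a whole number of periods")
    omega = 2 * math.pi / period
    mesor = sum(values) / n  # rhythm-adjusted mean
    beta = 2 / n * sum(v * math.cos(omega * t) for t, v in enumerate(values))
    gamma = 2 / n * sum(v * math.sin(omega * t) for t, v in enumerate(values))
    amplitude = math.hypot(beta, gamma)
    peak = (math.atan2(gamma, beta) / omega) % period  # time index of the maximum
    return mesor, amplitude, peak


# Synthetic 10-year monthly series peaking at index 1 (February if index 0 is January).
synthetic = [50 + 10 * math.cos(2 * math.pi * (t - 1) / 12) for t in range(120)]
mesor, amplitude, peak = cosinor_fit(synthetic)
```

Significance testing of the fitted amplitude (the p-values reported in the abstract) is outside the scope of this sketch.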
Can internet search queries be used for dengue fever surveillance in China?

PubMed

Guo, Pi; Wang, Li; Zhang, Yanhong; Luo, Ganfeng; Zhang, Yanting; Deng, Changyu; Zhang, Qin; Zhang, Qingying

2017-10-01

China experienced an unprecedented outbreak of dengue fever in 2014, and the number of cases reached the highest level over the past 25 years. Traditional sentinel surveillance systems of dengue fever in China have an obvious drawback: the average delay from receipt to dissemination of dengue case data is roughly 1-2 weeks. To exploit internet search queries for timely monitoring of dengue fever, we analyzed data on dengue incidence and Baidu search queries from 31 provinces in mainland China during the period of January 2011 to December 2014. We found that there was a strong correlation between changes in people's online health-seeking behavior and dengue fever incidence. Our study represents the first attempt to demonstrate a strong temporal and spatial correlation between internet search trends and dengue epidemics nationwide in China. The findings will help the government to strengthen the capacity of traditional surveillance systems for dengue fever. Copyright © 2017 The Author(s). Published by Elsevier Ltd. All rights reserved.
Seasonal trends in sleep-disordered breathing: evidence from Internet search engine query data.

PubMed

Ingram, David G; Matthews, Camilla K; Plante, David T

2015-03-01

The primary aim of the current study was to test the hypothesis that there is a seasonal component to snoring and obstructive sleep apnea (OSA) through the use of Google search engine query data. Internet search engine query data were retrieved from Google Trends from January 2006 to December 2012. Monthly normalized search volume was obtained over that 7-year period in the USA and Australia for the following search terms: "snoring" and "sleep apnea". Seasonal effects were investigated by fitting cosinor regression models. In addition, the search terms "snoring children" and "sleep apnea children" were evaluated to examine seasonal effects in pediatric populations. Statistically significant seasonal effects were found using cosinor analysis in both USA and Australia for "snoring" (p < 0.00001 for both countries). Similarly, seasonal patterns were observed for "sleep apnea" in the USA (p = 0.001); however, cosinor analysis was not significant for this search term in Australia (p = 0.13). Seasonal patterns for "snoring children" and "sleep apnea children" were observed in the USA (p = 0.002 and p < 0.00001, respectively), with insufficient search volume to examine these search terms in Australia. All searches peaked in the winter or early spring in both countries, with the magnitude of seasonal effect ranging from 5 to 50 %. Our findings indicate that there are significant seasonal trends for both snoring and sleep apnea internet search engine queries, with a peak in the winter and early spring. Further research is indicated to determine the mechanisms underlying these findings, whether they have clinical impact, and if they are associated with other comorbid medical conditions that have similar patterns of seasonal exacerbation.
Internet search query analysis can be used to demonstrate the rapidly increasing public awareness of palliative care in the USA.

PubMed

McLean, Sarah; Lennon, Paul; Glare, Paul

2017-01-27

A lack of public awareness of palliative care (PC) has been identified as one of the main barriers to appropriate PC access. Internet search query analysis is a novel methodology, which has been effectively used in surveillance of infectious diseases, and can be used to monitor public awareness of health-related topics. We aimed to demonstrate the utility of internet search query analysis to evaluate changes in public awareness of PC in the USA between 2005 and 2015. Google Trends provides a referenced score for the popularity of a search term, for defined regions over defined time periods. The popularity of the search term 'palliative care' was measured monthly between 1/1/2005 and 31/12/2015 in the USA and in the UK. Results were analysed using independent t-tests and joinpoint analysis. The mean monthly popularity of the search term increased between 2008-2009 (p<0.001), 2011-2012 (p<0.001), 2013-2014 (p=0.004) and 2014-2015 (p=0.002) in the USA. Joinpoint analysis was used to evaluate the monthly percentage change (MPC) in the popularity of the search term. In the USA, the MPC increase was 0.6%/month (p<0.05); in the UK the MPC of 0.05% was non-significant. Although internet search query surveillance is a novel methodology, it is freely accessible and has significant potential to monitor health-seeking behaviour among the public. PC is rapidly growing in the USA, and the rapidly increasing public awareness of PC demonstrated in this study, in comparison with the UK, where PC is relatively well established, is encouraging for ensuring appropriate PC access for all. Published by the BMJ Publishing Group Limited.
Cumulative query method for influenza surveillance using search engine data.

PubMed

Seo, Dong-Woo; Jo, Min-Woo; Sohn, Chang Hwan; Shin, Soo-Yong; Lee, JaeHo; Yu, Maengsoo; Kim, Won Young; Lim, Kyoung Soo; Lee, Sang-Il

2014-12-16

Internet search queries have become an important data source in syndromic surveillance systems. However, there is currently no syndromic surveillance system using Internet search query data in South Korea. The objective of this study was to examine correlations between our cumulative query method and national influenza surveillance data. Our study was based on the local search engine, Daum (approximately 25% market share), and influenza-like illness (ILI) data from the Korea Centers for Disease Control and Prevention. A quota sampling survey was conducted with 200 participants to obtain popular queries. We divided the study period into two sets: Set 1 (the 2009/10 epidemiological year for development set 1 and 2010/11 for validation set 1) and Set 2 (2010/11 for development set 2 and 2011/12 for validation set 2). Pearson's correlation coefficients were calculated between the Daum data and the ILI data for the development set. We selected the combined queries for which the correlation coefficients were .7 or higher and listed them in descending order. Then, we created a cumulative query method n representing the number of cumulative combined queries in descending order of the correlation coefficient. In validation set 1, 13 cumulative query methods were applied, and 8 had higher correlation coefficients (min=.916, max=.943) than that of the highest single combined query. Further, 11 of 13 cumulative query methods had an r value of ≥.7, but only 4 of 13 combined queries had an r value of ≥.7. In validation set 2, 8 of 15 cumulative query methods showed higher correlation coefficients (min=.975, max=.987) than that of the highest single combined query. All 15 cumulative query methods had an r value of ≥.7, but only 6 of 15 combined queries had an r value of ≥.7. The cumulative query method showed relatively higher correlation with national influenza surveillance data than combined queries in the development and validation sets.
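The selection-and-accumulation procedure described in this abstract can be sketched as follows. This is one possible reading, not the authors' code: it assumes that "cumulative query method n" means summing the weekly volumes of the n best-correlated queries, and the query names and numbers are invented for illustration.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cumulative_query_series(query_counts, ili, threshold=0.7):
    """Rank queries by correlation with ILI on the development set, keep those
    with r >= threshold, and build one summed series per prefix of the ranking
    (ASSUMPTION: accumulation is done by adding query volumes)."""
    ranked = sorted(
        ((pearson(series, ili), name) for name, series in query_counts.items()),
        reverse=True,
    )
    selected = [name for r, name in ranked if r >= threshold]
    cumulative, running = [], [0.0] * len(ili)
    for name in selected:
        running = [a + b for a, b in zip(running, query_counts[name])]
        cumulative.append(running)  # cumulative[n - 1] is "method n"
    return selected, cumulative


# Hypothetical development-set data (weekly values).
ili = [1, 2, 3, 4, 5, 4, 3, 2]
queries = {
    "flu": [2, 4, 6, 8, 10, 8, 6, 4],
    "cold": [1, 2, 3, 5, 5, 4, 2, 2],
    "noise": [5, 1, 4, 1, 5, 1, 4, 1],
}
selected, cumulative = cumulative_query_series(queries, ili)
```

Each series in `cumulative` would then be correlated with ILI data from the validation year, which is the comparison the abstract reports.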
From headache to tumour: An examination of health anxiety, health-related Internet use and 'query escalation'.

PubMed

Singh, Karmpaul; Brown, Richard J

2016-09-01

The current study aimed to explore the phenomenon of disease-related 'query escalation' in high/low health anxious Internet users (N = 40). During a 15-minute health-related Internet search, participants rated their anxiety and the perceived seriousness of information on each page. Post-search interviews determined the reasons for, and effects of, escalating queries to consider serious diseases. Both groups were found to be significantly more anxious after escalating queries. The high group was significantly more likely to escalate queries. Evaluating personal relevance of material was the main reason for escalations and moderated anxiety post-escalation. We conclude that searching for online disease information can increase anxiety, particularly for people worried about their health. © The Author(s) 2015.
Using internet searches for influenza surveillance.

PubMed

Polgreen, Philip M; Chen, Yiling; Pennock, David M; Nelson, Forrest D

2008-12-01

The Internet is an important source of health information. Thus, the frequency of Internet searches may provide information regarding infectious disease activity. As an example, we examined the relationship between searches for influenza and actual influenza occurrence. Using search queries from the Yahoo! search engine ( http://search.yahoo.com ) from March 2004 through May 2008, we counted daily unique queries originating in the United States that contained influenza-related search terms. Counts were divided by the total number of searches, and the resulting daily fraction of searches was averaged over the week. We estimated linear models, using searches with 1-10-week lead times as explanatory variables to predict the percentage of cultures positive for influenza and deaths attributable to pneumonia and influenza in the United States. With use of the frequency of searches, our models predicted an increase in cultures positive for influenza 1-3 weeks in advance of when they occurred (P < .001), and similar models predicted an increase in mortality attributable to pneumonia and influenza up to 5 weeks in advance (P < .001). Search-term surveillance may provide an additional tool for disease surveillance.
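The data preparation and lead-time modelling outlined in this abstract can be sketched in a few lines. This is a minimal single-predictor illustration under assumed names and toy numbers; the study's actual models and covariates are not specified beyond what the abstract states.

```python
def weekly_search_fraction(daily_hits, daily_totals):
    """Divide daily influenza-related query counts by total daily searches,
    then average the daily fractions over each complete 7-day week."""
    fractions = [h / t for h, t in zip(daily_hits, daily_totals)]
    return [sum(fractions[i:i + 7]) / 7 for i in range(0, len(fractions) - 6, 7)]


def fit_lead_model(search, outcome, lead_weeks):
    """Ordinary least squares of outcome[t + lead_weeks] on search[t].
    Returns (slope, intercept)."""
    if lead_weeks < 0 or lead_weeks >= len(search):
        raise ValueError("lead_weeks must be between 0 and len(search) - 1")
    x = search[:len(search) - lead_weeks]
    y = outcome[lead_weeks:]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx


# Hypothetical weekly series in which the outcome follows searches by two weeks.
search = [1, 2, 3, 4, 5, 6]
positive_cultures = [0, 0, 2, 4, 6, 8]
slope, intercept = fit_lead_model(search, positive_cultures, lead_weeks=2)
```

Repeating the fit for lead times of 1 to 10 weeks and comparing the fits is the kind of scan the abstract describes.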
Analyzing Medical Image Search Behavior: Semantics and Prediction of Query Results.

PubMed

De-Arteaga, Maria; Eggel, Ivan; Kahn, Charles E; Müller, Henning

2015-10-01

Log files of information retrieval systems that record user behavior have been used to improve the outcomes of retrieval systems, understand user behavior, and predict events. In this article, a log file of the ARRS GoldMiner search engine containing 222,005 consecutive queries is analyzed. Time stamps are available for each query, as well as masked IP addresses, which make it possible to identify queries from the same person. This article describes the ways in which physicians (or Internet searchers interested in medical images) search and proposes potential improvements by suggesting query modifications. For example, many queries contain only a few terms and therefore are not specific; others contain spelling mistakes or non-medical terms that likely lead to poor or empty results. One of the goals of this report is to predict the number of results a query will have, since such a model allows search engines to automatically propose query modifications in order to avoid result lists that are empty or too large. This prediction is made based on characteristics of the query terms themselves. Prediction of empty results has an accuracy above 88%, and thus can be used to automatically modify the query to avoid empty result sets for a user. The semantic analysis and data of reformulations done by users in the past can aid the development of better search systems, particularly to improve results for novice users. Therefore, this paper gives important ideas to better understand how people search and how to use this knowledge to improve the performance of specialized medical search engines.
How popular is waterpipe tobacco smoking? Findings from internet search queries

PubMed Central

Salloum, Ramzi G; Osman, Amira; Maziak, Wasim; Thrasher, James F

2015-01-01

Objectives Waterpipe tobacco smoking (WTS), a traditional tobacco consumption practice in the Middle East, is gaining popularity worldwide. Estimates of population-level interest in WTS over time are not documented. We assessed the popularity of WTS using World Wide Web search query results across four English-speaking countries. Methods We analysed trends in Google search queries related to WTS, comparing these trends with those for electronic cigarettes between 2004 and 2013 in Australia, Canada, the UK and the USA. Weekly search volumes were reported as percentages relative to the week with the highest volume of searches. Results Web-based searches for WTS have increased steadily since 2004 in all four countries. Search volume for WTS was higher than for e-cigarettes in three of the four nations, with the highest volume in the USA. Online searches were primarily targeted at WTS products for home use, followed by searches for WTS cafés/lounges. Conclusions Online demand for information on WTS-related products and venues is large and increasing. Given the rise in WTS popularity, increasing evidence of exposure-related harms, and relatively lax government regulation, WTS is a serious public health concern and could reach epidemic levels in Western societies. PMID:25052859
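The relative scale mentioned in the Methods, where each week is expressed as a percentage of the busiest week, can be written in a couple of lines. The function name and sample counts are hypothetical; search providers may apply further sampling or rounding that is not modelled here.

```python
def relative_search_volume(weekly_counts):
    """Express each weekly count as a percentage of the peak week."""
    peak = max(weekly_counts)
    if peak == 0:
        raise ValueError("series has no searches to scale against")
    return [100 * count / peak for count in weekly_counts]


# Hypothetical raw weekly counts for one search term.
scaled = relative_search_volume([5, 10, 20])
```

Because each series is scaled to its own peak, values from separately scaled series are not directly comparable unless they share a common reference.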
End User Information Searching on the Internet: How Do Users Search and What Do They Search For? (SIG USE)

ERIC Educational Resources Information Center

Saracevic, Tefko

2000-01-01

Summarizes a presentation that discussed findings and implications of research projects using an Internet search service and Internet-accessible vendor databases, representing the two sides of public database searching: query formulation and resource utilization. Presenters included: Tefko Saracevic, Amanda Spink, Dietmar Wolfram and Hong Xie.…
How popular is waterpipe tobacco smoking? Findings from internet search queries.

PubMed

Salloum, Ramzi G; Osman, Amira; Maziak, Wasim; Thrasher, James F

2015-09-01

Waterpipe tobacco smoking (WTS), a traditional tobacco consumption practice in the Middle East, is gaining popularity worldwide. Estimates of population-level interest in WTS over time are not documented. We assessed the popularity of WTS using World Wide Web search query results across four English-speaking countries. We analysed trends in Google search queries related to WTS, comparing these trends with those for electronic cigarettes between 2004 and 2013 in Australia, Canada, the UK and the USA. Weekly search volumes were reported as percentages relative to the week with the highest volume of searches. Web-based searches for WTS have increased steadily since 2004 in all four countries. Search volume for WTS was higher than for e-cigarettes in three of the four nations, with the highest volume in the USA. Online searches were primarily targeted at WTS products for home use, followed by searches for WTS cafés/lounges. Online demand for information on WTS-related products and venues is large and increasing. Given the rise in WTS popularity, increasing evidence of exposure-related harms, and relatively lax government regulation, WTS is a serious public health concern and could reach epidemic levels in Western societies. Published by the BMJ Publishing Group Limited.
Monitoring Influenza Epidemics in China with Search Query from Baidu

PubMed Central

Lv, Benfu; Peng, Geng; Chunara, Rumi; Brownstein, John S.

2013-01-01

Several approaches have been proposed for near real-time detection and prediction of the spread of influenza. These include search query data for influenza-related terms, which has been explored as a tool for augmenting traditional surveillance methods. In this paper, we present a method that uses Internet search query data from Baidu to model and monitor influenza activity in China. The objectives of the study are to present a comprehensive technique for: (i) keyword selection, (ii) keyword filtering, (iii) index composition and (iv) modeling and detection of influenza activity in China. Sequential time-series for the selected composite keyword index is significantly correlated with Chinese influenza case data. In addition, one-month ahead prediction of influenza cases for the first eight months of 2012 has a mean absolute percent error less than 11%. To our knowledge, this is the first study on the use of search query data from Baidu in conjunction with this approach for estimation of influenza activity in China. PMID:23750192
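The forecast accuracy figure quoted here is a mean absolute percent error (MAPE). A minimal sketch of that metric follows; the function name and the sample numbers are illustrative and are not taken from the study.

```python
def mean_absolute_percent_error(actual, predicted):
    """Average of |actual - predicted| / |actual|, expressed in percent.
    Every actual value must be non-zero."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100 * sum(errors) / len(errors)


# Hypothetical monthly case counts and one-month-ahead predictions.
mape = mean_absolute_percent_error([100, 200], [110, 180])
```

In this toy case both months are off by 10% of the observed count, so the metric comes out at about 10%.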
Using internet search queries for infectious disease surveillance: screening diseases for suitability.

PubMed

Milinovich, Gabriel J; Avril, Simon M R; Clements, Archie C A; Brownstein, John S; Tong, Shilu; Hu, Wenbiao

2014-12-31

Internet-based surveillance systems provide a novel approach to monitoring infectious diseases. Surveillance systems built on internet data are economically, logistically and epidemiologically appealing and have shown significant promise. The potential for these systems has increased with increased internet availability and shifts in health-related information seeking behaviour. This approach to monitoring infectious diseases has, however, only been applied to single or small groups of select diseases. This study aims to systematically investigate the potential for developing surveillance and early warning systems using internet search data, for a wide range of infectious diseases. Official notifications for 64 infectious diseases in Australia were downloaded and correlated with frequencies for 164 internet search terms for the period 2009-13 using Spearman's rank correlations. Time series cross correlations were performed to assess the potential for search terms to be used in construction of early warning systems. Notifications for 17 infectious diseases (26.6%) were found to be significantly correlated with a selected search term. The use of internet metrics as a means of surveillance has not previously been described for 12 (70.6%) of these diseases. The majority of diseases identified were vaccine-preventable, vector-borne or sexually transmissible; cross correlations, however, indicated that vector-borne and vaccine preventable diseases are best suited for development of early warning systems. The findings of this study suggest that internet-based surveillance systems have broader applicability to monitoring infectious diseases than has previously been recognised. Furthermore, internet-based surveillance systems have a potential role in forecasting emerging infectious disease events, especially for vaccine-preventable and vector-borne diseases.
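The two analyses named in this abstract, Spearman's rank correlation and time series cross correlation, can be combined in a small sketch that scans lags for the strongest rank correlation. The function names, the lag convention (searches leading notifications) and the toy series are assumptions for illustration only.

```python
import math


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranked series."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def best_lead(search, notifications, max_lag):
    """Correlate search[t] with notifications[t + lag] for lag = 0..max_lag
    and return (lag, rho) for the strongest correlation."""
    results = [
        (spearman(search[:len(search) - lag], notifications[lag:]), lag)
        for lag in range(max_lag + 1)
    ]
    rho, lag = max(results)
    return lag, rho


# Hypothetical series in which notifications trail searches by two periods.
search = [1, 5, 2, 8, 3, 9, 4, 7, 6, 10]
notifications = [0, 0] + search[:-2]
lag, rho = best_lead(search, notifications, max_lag=3)
```

A search term whose best correlation occurs at a positive lag is a candidate for early warning, which is the property the abstract's cross correlations were used to assess.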
SPARK: Adapting Keyword Query to Semantic Search

NASA Astrophysics Data System (ADS)

Zhou, Qi; Wang, Chong; Xiong, Miao; Wang, Haofen; Yu, Yong

Semantic search promises to provide more accurate results than present-day keyword search. However, progress with semantic search has been delayed due to the complexity of its query languages. In this paper, we explore a novel approach of adapting keywords to querying the semantic web: the approach automatically translates keyword queries into formal logic queries so that end users can use familiar keywords to perform semantic search. A prototype system named 'SPARK' has been implemented in light of this approach. Given a keyword query, SPARK outputs a ranked list of SPARQL queries as the translation result. The translation in SPARK consists of three major steps: term mapping, query graph construction and query ranking. Specifically, a probabilistic query ranking model is proposed to select the most likely SPARQL query. In the experiment, SPARK achieved an encouraging translation result.
EquiX-A Search and Query Language for XML.

ERIC Educational Resources Information Center

Cohen, Sara; Kanza, Yaron; Kogan, Yakov; Sagiv, Yehoshua; Nutt, Werner; Serebrenik, Alexander

2002-01-01

Describes EquiX, a search language for XML that combines querying with searching to query the data and the meta-data content of Web pages. Topics include search engines; a data model for XML documents; search query syntax; search query semantics; an algorithm for evaluating a query on a document; and indexing EquiX queries. (LRW)
Adverse Reactions Associated With Cannabis Consumption as Evident From Search Engine Queries

PubMed Central

Lev-Ran, Shaul

2017-01-01

Background Cannabis is one of the most widely used psychoactive substances worldwide, but adverse drug reactions (ADRs) associated with its use are difficult to study because of its prohibited status in many countries. Objective Internet search engine queries have been used to investigate ADRs in pharmaceutical drugs. In this proof-of-concept study, we tested whether these queries can be used to detect the adverse reactions of cannabis use. Methods We analyzed anonymized queries from US-based users of Bing, a widely used search engine, made over a period of 6 months and compared the results with the prevalence of cannabis use as reported in the US National Survey on Drug Use in the Household (NSDUH) and with ADRs reported in the Food and Drug Administration’s Adverse Drug Reporting System. Predicted prevalence of cannabis use was estimated from the fraction of people making queries about cannabis, marijuana, and 121 additional synonyms. Predicted ADRs were estimated from queries containing layperson descriptions of 195 symptoms from the ICD-10 symptom list. Results Our results indicated that the predicted prevalence of cannabis use at the US census regional level reaches an R2 of .71 with NSDUH data. Queries for ADRs made by people who also searched for cannabis reveal many of the known adverse effects of cannabis (eg, cough and psychotic symptoms), as well as plausible unknown reactions (eg, pyrexia). Conclusions These results indicate that search engine queries can serve as an important tool for the study of adverse reactions of illicit drugs, which are difficult to study in other settings. PMID:29074469
Correlation between National Influenza Surveillance Data and Search Queries from Mobile Devices and Desktops in South Korea

PubMed Central

Seo, Dong-Woo; Sohn, Chang Hwan; Kim, Sung-Hoon; Ryoo, Seung Mok; Lee, Yoon-Seon; Lee, Jae Ho; Kim, Won Young; Lim, Kyoung Soo

2016-01-01

Background Digital surveillance using internet search queries can improve both the sensitivity and timeliness of the detection of a health event, such as an influenza outbreak. While it has recently been estimated that the mobile search volume surpasses the desktop search volume and mobile search patterns differ from desktop search patterns, the previous digital surveillance systems did not distinguish mobile and desktop search queries. The purpose of this study was to compare the performance of mobile and desktop search queries in terms of digital influenza surveillance. Methods and Results The study period was from September 6, 2010 through August 30, 2014, which consisted of four epidemiological years. Influenza-like illness (ILI) and virologic surveillance data from the Korea Centers for Disease Control and Prevention were used. A total of 210 combined queries from our previous survey work were used for this study. Mobile and desktop weekly search data were extracted from Naver, which is the largest search engine in Korea. Spearman’s correlation analysis was used to examine the correlation of the mobile and desktop data with ILI and virologic data in Korea. We also performed lag correlation analysis. We observed that the influenza surveillance performance of mobile search queries matched or exceeded that of desktop search queries over time. The mean correlation coefficients of mobile search queries and the number of queries with an r-value of ≥ 0.7 equaled or became greater than those of desktop searches over the four epidemiological years. A lag correlation analysis of up to two weeks showed similar trends. Conclusion Our study shows that the performance of mobile search queries for influenza surveillance has equaled or even exceeded that of desktop search queries over time. In the future development of influenza surveillance using search queries, recognition of the changing trend in mobile search data may be necessary. PMID:27391028

Correlation between National Influenza Surveillance Data and Search Queries from Mobile Devices and Desktops in South Korea.

PubMed

Shin, Soo-Yong; Kim, Taerim; Seo, Dong-Woo; Sohn, Chang Hwan; Kim, Sung-Hoon; Ryoo, Seung Mok; Lee, Yoon-Seon; Lee, Jae Ho; Kim, Won Young; Lim, Kyoung Soo

2016-01-01

Digital surveillance using internet search queries can improve both the sensitivity and timeliness of the detection of a health event, such as an influenza outbreak. While it has recently been estimated that the mobile search volume surpasses the desktop search volume and mobile search patterns differ from desktop search patterns, the previous digital surveillance systems did not distinguish mobile and desktop search queries. The purpose of this study was to compare the performance of mobile and desktop search queries in terms of digital influenza surveillance. The study period was from September 6, 2010 through August 30, 2014, which consisted of four epidemiological years. Influenza-like illness (ILI) and virologic surveillance data from the Korea Centers for Disease Control and Prevention were used. A total of 210 combined queries from our previous survey work were used for this study. Mobile and desktop weekly search data were extracted from Naver, which is the largest search engine in Korea. Spearman's correlation analysis was used to examine the correlation of the mobile and desktop data with ILI and virologic data in Korea. We also performed lag correlation analysis. We observed that the influenza surveillance performance of mobile search queries matched or exceeded that of desktop search queries over time. The mean correlation coefficients of mobile search queries and the number of queries with an r-value of ≥ 0.7 equaled or became greater than those of desktop searches over the four epidemiological years. A lag correlation analysis of up to two weeks showed similar trends. Our study shows that the performance of mobile search queries for influenza surveillance has equaled or even exceeded that of desktop search queries over time. In the future development of influenza surveillance using search queries, recognition of the changing trend in mobile search data may be necessary.
Federated Space-Time Query for Earth Science Data Using OpenSearch Conventions

NASA Astrophysics Data System (ADS)

Lynnes, C.; Beaumont, B.; Duerr, R. E.; Hua, H.

2009-12-01

The past decade has seen a burgeoning of remote sensing and Earth science data providers, as evidenced in the growth of the Earth Science Information Partner (ESIP) federation. At the same time, the need to combine diverse data sets to enable understanding of the Earth as a system has also grown. While the expansion of data providers is in general a boon to such studies, the diversity presents a challenge to finding useful data for a given study. Locating all the data files with aerosol information for a particular volcanic eruption, for example, may involve learning and using several different search tools to execute the requisite space-time queries. To address this issue, the ESIP federation is developing a federated space-time query framework, based on the OpenSearch convention (www.opensearch.org), with Geo and Time extensions. In this framework, data providers publish OpenSearch Description Documents that describe in a machine-readable form how to execute queries against the provider. The novelty of OpenSearch is that the space-time query interface becomes both machine callable and easy enough to integrate into the web browser's search box. This flexibility, together with a simple REST (HTTP GET) interface, should allow a variety of data providers to participate in the federated search framework, from large institutional data centers to individual scientists. The simple interface enables trivial querying of multiple data sources and participation in recursive-like federated searches, all using the same common OpenSearch interface. This simplicity also makes the construction of clients easy, as do existing OpenSearch client libraries in a variety of languages. Moreover, a number of clients and aggregation services already exist, and OpenSearch is already supported by a number of web browsers such as Firefox and Internet Explorer.
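As a rough illustration of how a client might turn an OpenSearch URL template into a space-time query, the sketch below fills template parameters and drops optional ones that are not supplied. The endpoint (data.example.org), the query values, and the helper name fill_template are hypothetical; only the placeholder names (searchTerms, geo:box, time:start, time:end) and the trailing "?" marker for optional parameters come from the OpenSearch conventions.

```python
import re
from urllib.parse import quote

# Hypothetical URL template, as it might appear in a provider's
# OpenSearch Description Document. A trailing "?" marks an optional parameter.
TEMPLATE = ("https://data.example.org/search?q={searchTerms}"
            "&bbox={geo:box?}&start={time:start?}&end={time:end?}")

def fill_template(template, params):
    """Substitute {name} and {name?} placeholders with URL-encoded values."""
    def substitute(match):
        name, optional = match.group(1), match.group(2) == "?"
        if name in params:
            return quote(str(params[name]), safe="")
        if optional:
            return ""  # optional parameters may be left empty
        raise KeyError(f"missing required parameter: {name}")
    return re.sub(r"\{([^{}?]+)(\??)\}", substitute, template)

# Made-up query: aerosol data inside a bounding box during one week.
url = fill_template(TEMPLATE, {
    "searchTerms": "aerosol",
    "geo:box": "-20.0,63.0,-18.0,64.0",   # west,south,east,north
    "time:start": "2010-04-14T00:00:00Z",
    "time:end": "2010-04-21T00:00:00Z",
})
print(url)
```

Because every provider exposes the same placeholder vocabulary, a federated client could in principle apply one parameter dictionary to many templates and issue plain HTTP GET requests against each resulting URL.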
Adverse Reactions Associated With Cannabis Consumption as Evident From Search Engine Queries.

PubMed

Yom-Tov, Elad; Lev-Ran, Shaul

2017-10-26

Cannabis is one of the most widely used psychoactive substances worldwide, but adverse drug reactions (ADRs) associated with its use are difficult to study because of its prohibited status in many countries. Internet search engine queries have been used to investigate ADRs in pharmaceutical drugs. In this proof-of-concept study, we tested whether these queries can be used to detect the adverse reactions of cannabis use. We analyzed anonymized queries from US-based users of Bing, a widely used search engine, made over a period of 6 months and compared the results with the prevalence of cannabis use as reported in the US National Survey on Drug Use and Health (NSDUH) and with ADRs reported in the Food and Drug Administration's Adverse Drug Reporting System. Predicted prevalence of cannabis use was estimated from the fraction of people making queries about cannabis, marijuana, and 121 additional synonyms. Predicted ADRs were estimated from queries containing layperson descriptions of 195 ICD-10 symptoms. Our results indicated that the predicted prevalence of cannabis use at the US census regional level reaches an R² of .71 with NSDUH data. Queries for ADRs made by people who also searched for cannabis reveal many of the known adverse effects of cannabis (eg, cough and psychotic symptoms), as well as plausible unknown reactions (eg, pyrexia). These results indicate that search engine queries can serve as an important tool for the study of adverse reactions of illicit drugs, which are difficult to study in other settings. ©Elad Yom-Tov, Shaul Lev-Ran. Originally published in JMIR Public Health and Surveillance (http://publichealth.jmir.org), 26.10.2017.
Examining the themes of STD-related Internet searches to increase specificity of disease forecasting using Internet search terms.

PubMed

Johnson, Amy K; Mikati, Tarek; Mehta, Supriya D

2016-11-09

US surveillance of sexually transmitted diseases (STDs) is often delayed and incomplete which creates missed opportunities to identify and respond to trends in disease. Internet search engine data has the potential to be an efficient, economical and representative enhancement to the established surveillance system. Google Trends allows the download of de-identified search engine data, which has been used to demonstrate the positive and statistically significant association between STD-related search terms and STD rates. In this study, search engine user content was identified by surveying specific exposure groups of individuals (STD clinic patients and university students) aged 18-35. Participants were asked to list the terms they use to search for STD-related information. Google Correlate was used to validate search term content. On average STD clinic participant queries were longer compared to student queries. STD clinic participants were more likely to report using search terms that were related to symptomatology such as describing symptoms of STDs, while students were more likely to report searching for general information. These differences in search terms by subpopulation have implications for STD surveillance in populations at most risk for disease acquisition.
Improving Web Search for Difficult Queries

ERIC Educational Resources Information Center

Wang, Xuanhui

2009-01-01

Search engines have now become essential tools in all aspects of our life. Although a variety of information needs can be served very successfully, there are still many queries that search engines cannot answer very effectively, and these queries often leave users frustrated. Since it is quite often that users encounter such "difficult…
Worldwide trends in fishing interest indicated by Internet search volume

USGS Publications Warehouse

Wilde, G.R.; Pope, K.L.

2013-01-01

There is a growing body of literature that shows internet search volume on a topic, such as fishing, is a viable measure of salience. Herein, internet search volume for 'fishing' and 'angling' is used as a measure of public interest in fishing, in particular, recreational fishing. An online tool, Google Insights for Search, which allows one to study internet search terms and their volume since 2004, is used to examine trends in interest in fishing for 50 countries. Trends in normalised fishing search volume, during 2004 through 2011, varied from a 72.6% decrease (Russian Federation) to a 133.7% increase (Hungary). Normalised fishing search volume declined in 40 (80%) of the countries studied. The decline has been relatively large in English-speaking countries, but also has been large in Central and South American, and European countries. Analyses of search queries provide a low-cost means of gaining insight into angler interests and, possibly, behaviour in countries around the world.
Google Search Queries About Neurosurgical Topics: Are They a Suitable Guide for Neurosurgeons?

PubMed

Lawson McLean, Anna C; Lawson McLean, Aaron; Kalff, Rolf; Walter, Jan

2016-06-01

Google is the most popular search engine, with about 100 billion searches per month. Google Trends is an integrated tool that allows users to obtain Google's search popularity statistics from the last decade. Our aim was to evaluate whether Google Trends is a useful tool to assess the public's interest in specific neurosurgical topics. We evaluated Google Trends statistics for the neurosurgical search topic areas "hydrocephalus," "spinal stenosis," "concussion," "vestibular schwannoma," and "cerebral arteriovenous malformation." We compared these with bibliometric data from PubMed and epidemiologic data from the German Federal Monitoring Agency. In addition, we assessed Google users' search behavior for the search terms "glioblastoma" and "meningioma." Over the last 10 years, there has been an increasing interest in the topic "concussion" from Internet users in general and scientists. "Spinal stenosis," "concussion," and "vestibular schwannoma" are topics that are of special interest in high-income countries (eg, Germany), whereas "hydrocephalus" is a popular topic in low- and middle-income countries. The Google-defined top searches within these topic areas revealed more detail about people's interests (eg, "normal pressure hydrocephalus" or "football concussion" ranked among the most popular search queries within the corresponding topics). There was a similar volume of queries for "glioblastoma" and "meningioma." Google Trends is a useful source to elicit information about general trends in people's health interests and the role of different diseases across the world. The Internet presence of neurosurgical units and surgeons can be guided by online users' interests to achieve high-quality, professional-endorsed patient education. Copyright © 2016 Elsevier Inc. All rights reserved.
Searching the Web: The Public and Their Queries.

ERIC Educational Resources Information Center

Spink, Amanda; Wolfram, Dietmar; Jansen, Major B. J.; Saracevic, Tefko

2001-01-01

Reports findings from a study of searching behavior by over 200,000 users of the Excite search engine. Analysis of over one million queries revealed most people use few search terms, few modified queries, view few Web pages, and rarely use advanced search features. Concludes that Web searching by the public differs significantly from searching of…
Accessing suicide-related information on the internet: a retrospective observational study of search behavior.

PubMed

Wong, Paul Wai-Ching; Fu, King-Wa; Yau, Rickey Sai-Pong; Ma, Helen Hei-Man; Law, Yik-Wa; Chang, Shu-Sen; Yip, Paul Siu-Fai

2013-01-11

The Internet's potential impact on suicide is of major public health interest as easy online access to pro-suicide information or specific suicide methods may increase suicide risk among vulnerable Internet users. Little is known, however, about users' actual searching and browsing behaviors of online suicide-related information. To investigate what webpages people actually clicked on after searching with suicide-related queries on a search engine and to examine what queries people used to get access to pro-suicide websites. A retrospective observational study was done. We used a web search dataset released by America Online (AOL). The dataset was randomly sampled from all AOL subscribers' web queries between March and May 2006 and generated by 657,000 service subscribers. We found 5526 search queries (0.026%, 5526/21,000,000) that included the keyword "suicide". The 5526 search queries included 1586 different search terms and were generated by 1625 unique subscribers (0.25%, 1625/657,000). Of these queries, 61.38% (3392/5526) were followed by users clicking on a search result. Of these 3392 queries, 1344 (39.62%) webpages were clicked on by 930 unique users but only 1314 of those webpages were accessible during the study period. Each clicked-through webpage was classified into 11 categories. The categories of the most visited webpages were: entertainment (30.13%; 396/1314), scientific information (18.31%; 240/1314), and community resources (14.53%; 191/1314). Among the 1314 accessed webpages, we could identify only two pro-suicide websites. We found that the search terms used to access these sites included "commiting suicide with a gas oven", "hairless goat", "pictures of murder by strangulation", and "photo of a severe burn". A limitation of our study is that the database may be dated and confined to mainly English webpages. Searching or browsing suicide-related or pro-suicide webpages was uncommon, although a small group of users did access websites that contain
Forecasting influenza outbreak dynamics in Melbourne from Internet search query surveillance data.

PubMed

Moss, Robert; Zarebski, Alexander; Dawson, Peter; McCaw, James M

2016-07-01

Accurate forecasting of seasonal influenza epidemics is of great concern to healthcare providers in temperate climates, as these epidemics vary substantially in their size, timing and duration from year to year, making it a challenge to deliver timely and proportionate responses. Previous studies have shown that Bayesian estimation techniques can accurately predict when an influenza epidemic will peak many weeks in advance, using existing surveillance data, but these methods must be tailored both to the target population and to the surveillance system. Our aim was to evaluate whether forecasts of similar accuracy could be obtained for metropolitan Melbourne (Australia). We used the bootstrap particle filter and a mechanistic infection model to generate epidemic forecasts for metropolitan Melbourne (Australia) from weekly Internet search query surveillance data reported by Google Flu Trends for 2006-14. Optimal observation models were selected from hundreds of candidates using a novel approach that treats forecasts akin to receiver operating characteristic (ROC) curves. We show that the timing of the epidemic peak can be accurately predicted 4-6 weeks in advance, but that the magnitude of the epidemic peak and the overall burden are much harder to predict. We then discuss how the infection and observation models and the filtering process may be refined to improve forecast robustness, thereby improving the utility of these methods for healthcare decision support. © 2016 The Authors. Influenza and Other Respiratory Viruses Published by John Wiley & Sons Ltd.
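To make the forecasting approach in this abstract more concrete, the following is a minimal sketch of a bootstrap particle filter wrapped around a simple infection model. It is not the authors' implementation: a discrete-time SIR model stands in for their mechanistic model, the Gaussian observation model, the uniform prior on the transmission rate, the recovery rate, and all function names are assumptions chosen for illustration, and the observations are synthetic.

```python
import math
import random

def sir_step(s, i, beta, gamma=0.5):
    """Advance an assumed discrete-time SIR model by one week."""
    new_inf = min(s, beta * s * i)          # weekly incidence (fraction of population)
    return s - new_inf, i + new_inf - gamma * i, new_inf

def simulate_incidence(beta, weeks, s=0.999, i=0.001):
    """Generate synthetic weekly incidence for a known transmission rate."""
    out = []
    for _ in range(weeks):
        s, i, inc = sir_step(s, i, beta)
        out.append(inc)
    return out

def bootstrap_filter(observations, n_particles=500, obs_sd=0.002, seed=1):
    """Propagate, weight, and resample particles (S, I, beta) for each observation."""
    rng = random.Random(seed)
    # Assumed prior: beta uniform on [0.6, 1.5]; all particles share the initial state.
    particles = [(0.999, 0.001, rng.uniform(0.6, 1.5)) for _ in range(n_particles)]
    for y in observations:
        proposed, weights = [], []
        for s, i, beta in particles:
            beta = max(0.0, beta + rng.gauss(0.0, 0.01))   # small jitter keeps diversity
            s, i, inc = sir_step(s, i, beta)
            proposed.append((s, i, beta))
            # Assumed Gaussian observation model around the simulated incidence.
            weights.append(math.exp(-0.5 * ((y - inc) / obs_sd) ** 2))
        if sum(weights) == 0.0:
            weights = [1.0] * n_particles                  # fall back to uniform weights
        particles = rng.choices(proposed, weights=weights, k=n_particles)
    return particles

def forecast_peak_week(particles, start_week, horizon=30):
    """Run each particle forward and return the median week of peak incidence."""
    peaks = []
    for s, i, beta in particles:
        best_week, best_inc = start_week, -1.0
        for week in range(start_week, start_week + horizon):
            s, i, inc = sir_step(s, i, beta)
            if inc > best_inc:
                best_week, best_inc = week, inc
        peaks.append(best_week)
    return sorted(peaks)[len(peaks) // 2]

observed = simulate_incidence(beta=1.0, weeks=10)   # synthetic stand-in for surveillance data
posterior = bootstrap_filter(observed)
print("median forecast peak week:", forecast_peak_week(posterior, start_week=len(observed)))
```

The spread of peak weeks across particles, rather than the median alone, is what would convey forecast uncertainty in practice.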
Using Search Engine Query Data to Explore the Epidemiology of Common Gastrointestinal Symptoms.

PubMed

Hassid, Benjamin G; Day, Lukejohn W; Awad, Mohannad A; Sewell, Justin L; Osterberg, E Charles; Breyer, Benjamin N

2017-03-01

Internet searches are an increasingly used tool in medical research. To date, no studies have examined Google search data in relation to common gastrointestinal symptoms. The aim of this study was to compare trends in Internet search volume with clinical datasets for common gastrointestinal symptoms. Using Google Trends, we recorded relative changes in volume of searches related to dysphagia, vomiting, and diarrhea in the USA between January 2008 and January 2011. We queried the National Inpatient Sample (NIS) and the National Hospital Ambulatory Medical Care Survey (NHAMCS) during this time period and identified cases related to these symptoms. We assessed the correlation between Google Trends and these two clinical datasets, as well as examined seasonal variation trends. Changes to Google search volume for all three symptoms correlated significantly with changes to NIS output (dysphagia: r = 0.5, P = 0.002; diarrhea: r = 0.79, P < 0.001; vomiting: r = 0.76, P < 0.001). Both Google and NIS data showed that the prevalence of all three symptoms rose during the time period studied. On the other hand, the NHAMCS data trends during this time period did not correlate well with either the NIS or the Google data for any of the three symptoms studied. Both the NIS and Google data showed modest seasonal variation. Changes to the population burden of chronic GI symptoms may be tracked by monitoring changes to Google search engine query volume over time. These data demonstrate that the prevalence of common GI symptoms is rising over time.
Detecting internet activity for erectile dysfunction using search engine query data in the Republic of Ireland.

PubMed

Davis, Niall F; Smyth, Lisa G; Flood, Hugh D

2012-12-01

What's known on the subject? and What does the study add? Despite the increasing prevalence of erectile dysfunction (ED), there is reluctance among symptomatic patients to present to healthcare providers for appropriate advice and treatment. A number of Internet campaigns have been launched by the Irish healthcare media since 2007 aiming to provide easily accessible advice on ED. Novel online technologies appear to provide a useful tool for educating the general public on the symptoms of ED because there has been a significant increase in overall Internet search activity for this term since 2007.
• To assess Internet search trends for erectile dysfunction (ED) subsequent to public awareness campaigns being launched within the Republic of Ireland.
• To assess whether the advent of such campaigns correlates with increased Internet search activity for ED.
• Google Insights for Search was utilized to examine Internet search trends for the term 'erectile dysfunction' across all categories between January 2005 and December 2011.
• Search activity was limited to users from the Republic of Ireland within this timeframe.
• Additionally, the number of Irish Internet media campaigns and Irish web pages providing information on ED was assessed between January 2005 and December 2011.
• Statistical analysis of the data was performed using analysis of variance and Student's t-tests for pairwise comparisons.
• There has been a significant increase in mean search activity for ED on an annual basis since 2007 (P < 0.001).
• The number of Irish web pages associated with information on ED has also increased significantly on an annual basis since 2007 (P < 0.001).
• There have been seven different Irish Internet media campaigns on ED since 2007 compared to two from 2005 to 2007 (P < 0.001).
• There was no significant change in mean search activity for ED from 2005 to 2007.
• The advent of recent Internet media campaigns and increasing number of Irish web pages is
Using Web Search Query Data to Monitor Dengue Epidemics: A New Model for Neglected Tropical Disease Surveillance

PubMed Central

Chan, Emily H.; Sahai, Vikram; Conrad, Corrie; Brownstein, John S.

2011-01-01

Background: A variety of obstacles including bureaucracy and lack of resources have interfered with timely detection and reporting of dengue cases in many endemic countries. Surveillance efforts have turned to modern data sources, such as Internet search queries, which have been shown to be effective for monitoring influenza-like illnesses. However, few have evaluated the utility of web search query data for other diseases, especially those of high morbidity and mortality or where a vaccine may not exist. In this study, we aimed to assess whether web search queries are a viable data source for the early detection and monitoring of dengue epidemics. Methodology/Principal Findings: Bolivia, Brazil, India, Indonesia and Singapore were chosen for analysis based on available data and adequate search volume. For each country, a univariate linear model was then built by fitting a time series of the fraction of Google search query volume for specific dengue-related queries from that country against a time series of official dengue case counts for a time-frame within 2003–2010. The specific combination of queries used was chosen to maximize model fit. Spurious spikes in the data were also removed prior to model fitting. The final models, fit using a training subset of the data, were cross-validated against both the overall dataset and a holdout subset of the data. All models were found to fit the data quite well, with validation correlations ranging from 0.82 to 0.99. Conclusions/Significance: Web search query data were found to be capable of tracking dengue activity in Bolivia, Brazil, India, Indonesia and Singapore. Whereas traditional dengue data from official sources are often not available until after some substantial delay, web search query data are available in near real-time. These data represent a valuable complement to assist with traditional dengue surveillance. PMID:21647308
Using web search query data to monitor dengue epidemics: a new model for neglected tropical disease surveillance.

PubMed

Chan, Emily H; Sahai, Vikram; Conrad, Corrie; Brownstein, John S

2011-05-01

A variety of obstacles including bureaucracy and lack of resources have interfered with timely detection and reporting of dengue cases in many endemic countries. Surveillance efforts have turned to modern data sources, such as Internet search queries, which have been shown to be effective for monitoring influenza-like illnesses. However, few have evaluated the utility of web search query data for other diseases, especially those of high morbidity and mortality or where a vaccine may not exist. In this study, we aimed to assess whether web search queries are a viable data source for the early detection and monitoring of dengue epidemics. Bolivia, Brazil, India, Indonesia and Singapore were chosen for analysis based on available data and adequate search volume. For each country, a univariate linear model was then built by fitting a time series of the fraction of Google search query volume for specific dengue-related queries from that country against a time series of official dengue case counts for a time-frame within 2003-2010. The specific combination of queries used was chosen to maximize model fit. Spurious spikes in the data were also removed prior to model fitting. The final models, fit using a training subset of the data, were cross-validated against both the overall dataset and a holdout subset of the data. All models were found to fit the data quite well, with validation correlations ranging from 0.82 to 0.99. Web search query data were found to be capable of tracking dengue activity in Bolivia, Brazil, India, Indonesia and Singapore. Whereas traditional dengue data from official sources are often not available until after some substantial delay, web search query data are available in near real-time. These data represent a valuable complement to assist with traditional dengue surveillance.
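The fit-then-validate procedure in the dengue abstracts (a univariate linear model fit on a training subset and checked by correlation on a holdout subset) can be sketched as follows. The data, the split point, and the function names are invented for illustration and do not reproduce the study's query selection or spike removal.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def correlation(u, v):
    """Pearson correlation of two equal-length sequences."""
    mu, mv = mean(u), mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cov / (sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v)) ** 0.5

# Made-up series: fraction of search volume for dengue-related queries,
# and official case counts for the same periods.
query_fraction = [0.8, 1.1, 1.9, 3.2, 4.0, 2.7, 1.5, 0.9, 1.2, 2.4, 3.6, 2.1]
case_counts = [110, 160, 300, 520, 640, 410, 230, 130, 170, 370, 590, 330]

split = 8  # first 8 points for training, remainder held out (arbitrary choice)
intercept, slope = fit_line(query_fraction[:split], case_counts[:split])
predicted = [intercept + slope * q for q in query_fraction[split:]]
holdout_r = correlation(predicted, case_counts[split:])
print(f"cases ~ {intercept:.1f} + {slope:.1f} * query_fraction; holdout r = {holdout_r:.3f}")
```

With a single predictor, the holdout correlation of predictions with case counts has the same magnitude as the correlation of the raw search series with case counts; the fitted intercept and slope matter when the goal is an estimate on the scale of case counts.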
Complex dynamics of our economic life on different scales: insights from search engine query data.

PubMed

Preis, Tobias; Reith, Daniel; Stanley, H Eugene

2010-12-28

Search engine query data deliver insight into the behaviour of individuals who are the smallest possible scale of our economic life. Individuals are submitting several hundred million search engine queries around the world each day. We study weekly search volume data for various search terms from 2004 to 2010 that are offered by the search engine Google for scientific use, providing information about our economic life on an aggregated collective level. We ask the question whether there is a link between search volume data and financial market fluctuations on a weekly time scale. Both collective 'swarm intelligence' of Internet users and the group of financial market participants can be regarded as a complex system of many interacting subunits that react quickly to external changes. We find clear evidence that weekly transaction volumes of S&P 500 companies are correlated with weekly search volume of corresponding company names. Furthermore, we apply a recently introduced method for quantifying complex correlations in time series with which we find a clear tendency that search volume time series and transaction volume time series show recurring patterns.
Do Seasons Have an Influence on the Incidence of Depression? The Use of an Internet Search Engine Query Data as a Proxy of Human Affect

PubMed Central

Yang, Albert C.; Huang, Norden E.; Peng, Chung-Kang; Tsai, Shih-Jen

2010-01-01

Background: Seasonal depression has generated considerable clinical interest in recent years. Despite a common belief that people in higher latitudes are more vulnerable to low mood during the winter, it has never been demonstrated that human moods are subject to seasonal change on a global scale. The aim of this study was to investigate large-scale seasonal patterns of depression using Internet search query data as a signature and proxy of human affect. Methodology/Principal Findings: Our study was based on a publicly available search engine database, Google Insights for Search, which provides time series data of weekly search trends from January 1, 2004 to June 30, 2009. We applied an empirical mode decomposition method to isolate seasonal components of health-related search trends of depression in 54 geographic areas worldwide. We identified a seasonal trend of depression that was opposite between the northern and southern hemispheres; this trend was significantly correlated with seasonal oscillations of temperature (USA: r = −0.872, p<0.001; Australia: r = −0.656, p<0.001). Based on analyses of search trends over 54 geographic locations worldwide, we found that the degree of correlation between searching for depression and temperature was latitude-dependent (northern hemisphere: r = −0.686; p<0.001; southern hemisphere: r = 0.871; p<0.0001). Conclusions/Significance: Our findings indicate that Internet searches for depression from people in higher latitudes are more vulnerable to seasonal change, whereas this phenomenon is obscured in tropical areas. This phenomenon exists universally across countries, regardless of language. This study provides novel, Internet-based evidence for the epidemiology of seasonal depression. PMID:21060851
Variability of patient spine education by Internet search engine.

PubMed

Ghobrial, George M; Mehdi, Angud; Maltenfort, Mitchell; Sharan, Ashwini D; Harrop, James S

2014-03-01

Patients are increasingly reliant upon the Internet as a primary source of medical information. The educational experience varies by search engine and search term, and changes daily. There are no tools for critical evaluation of spinal surgery websites. To highlight the variability between common search engines for the same search terms. To detect bias, by prevalence of specific kinds of websites for certain spinal disorders. To demonstrate a simple scoring system for spinal disorder websites for patient use, to maximize the quality of information exposed to the patient. Ten common search terms were used to query three of the most common search engines. The top fifty results of each query were tabulated. A negative binomial regression was performed to highlight the variation across each search engine. Google was more likely than Bing and Yahoo search engines to return hospital ads (P=0.002) and more likely to return scholarly sites of peer-reviewed literature (P=0.003). Educational web sites, surgical group sites, and online web communities had a significantly higher likelihood of returning on any search, regardless of search engine, or search string (P=0.007). Likewise, professional websites, including hospital run, industry sponsored, legal, and peer-reviewed web pages were less likely to be found on a search overall, regardless of engine and search string (P=0.078). The Internet is a rapidly growing body of medical information which can serve as a useful tool for patient education. High quality information is readily available, provided that the patient uses a consistent, focused metric for evaluating online spine surgery information, as there is a clear variability in the way search engines present information to the patient. Published by Elsevier B.V.
Utility of Web search query data in testing theoretical assumptions about mephedrone.

PubMed

Kapitány-Fövény, Máté; Demetrovics, Zsolt

2017-05-01

With growing access to the Internet, people who use drugs and traffickers have started to obtain information about novel psychoactive substances (NPS) via online platforms. This paper aims to analyze whether a decreasing Web interest in formerly banned substances (cocaine, heroin, and MDMA) and the legislative status of mephedrone predict Web interest in this NPS. Google Trends was used to measure changes in Web interest in cocaine, heroin, MDMA, and mephedrone. Google search results for mephedrone within the same time frame were analyzed and categorized. Web interest in classic drugs was found to be more persistent. Regarding geographical distribution, the location of Web searches for heroin and cocaine was less centralized. Illicit status of mephedrone was a negative predictor of its Web search query rates. The connection between mephedrone-related Web search rates and the legislative status of this substance was significantly mediated by ecstasy-related Web search queries, the number of documentaries, and forum/blog entries about mephedrone. The results might provide support for the hypothesis that mephedrone's popularity was highly correlated with its legal status and that it functioned as a potential substitute for MDMA. Google Trends was found to be a useful tool for testing theoretical assumptions about NPS. Copyright © 2017 John Wiley & Sons, Ltd.
Cognitive search model and a new query paradigm

NASA Astrophysics Data System (ADS)

Xu, Zhonghui

2001-06-01

This paper proposes a cognitive model in which people begin to search for pictures by using semantic content and find the right picture by judging whether its visual content is a proper visualization of the desired semantics. It is essential that human search is not just a process of matching computation on visual features but rather a process of visualization of the known semantic content. For people to search electronic images in the same way as they do manually in the model, we suggest that querying be a semantic-driven process like design. A query-by-design paradigm is proposed in the sense that what you design is what you find. Unlike query-by-example, query-by-design allows users to specify the semantic content through an iterative and incremental interaction process, so that a retrieval can start with association and identification of the given semantic content and be refined as further visual cues become available. An experimental image retrieval system, Kuafu, has been under development using the query-by-design paradigm, and an iconic language is adopted.
A study of medical and health queries to web search engines.

PubMed

Spink, Amanda; Yang, Yin; Jansen, Jim; Nykanen, Pirrko; Lorence, Daniel P; Ozmutlu, Seda; Ozmutlu, H Cenk

2004-03-01

This paper reports findings from an analysis of medical or health queries to different web search engines. We report results: (i) comparing samples of 10000 web queries taken randomly from 1.2 million query logs from the AlltheWeb.com and Excite.com commercial web search engines in 2001 for medical or health queries, (ii) comparing the 2001 findings from Excite and AlltheWeb.com users with results from a previous analysis of medical and health related queries from the Excite Web search engine for 1997 and 1999, and (iii) medical or health advice-seeking queries beginning with the word 'should'. Findings suggest: (i) a small percentage of web queries are medical or health related, (ii) the top five categories of medical or health queries were: general health, weight issues, reproductive health and puberty, pregnancy/obstetrics, and human relationships, and (iii) over time, the medical and health queries may have declined as a proportion of all web queries, as the use of specialized medical/health websites and e-commerce-related queries has increased. Findings provide insights into medical and health-related web querying and suggest some implications for the use of general web search engines when seeking medical/health information.

Query Classification and Study of University Students' Search Trends

ERIC Educational Resources Information Center

Maabreh, Majdi A.; Al-Kabi, Mohammed N.; Alsmadi, Izzat M.

2012-01-01

Purpose: This study is an attempt to develop an automatic identification method for Arabic web queries and divide them into several query types using data mining. In addition, it seeks to evaluate the impact of the academic environment on using the internet. Design/methodology/approach: The web log files were collected from one of the higher…
Seasonal trends in hypertension in Poland: evidence from Google search engine query data.

PubMed

Płatek, Anna E; Sierdziński, Janusz; Krzowski, Bartosz; Szymański, Filip M

2018-01-01

Various conditions, including arterial hypertension, exhibit seasonal trends in their occurrence and magnitude. Those trends correspond to the interest reflected in the number of Internet searches for the specific conditions per month. The aim of the study was to show that seasonal trends in hypertension prevalence in Poland relate to data from the Google Trends tool. Internet search engine query data were retrieved from Google Trends from January 2008 to November 2017. Data were calculated as a monthly normalised search volume from the nine-year period. Data were presented for specific geographic regions, including Poland, the United States of America, Australia, and worldwide for the following search terms: "arterial hypertension (pol. nadciśnienie tętnicze)", "hypertension (pol. nadciśnienie)" and "hypertension medical condition". Seasonal effects were calculated using regression models and presented graphically. In Poland the search volume is the highest between November and May, while patients exhibit the least interest in arterial hypertension during summer holidays (p < 0.05). Seasonal variations are comparable in the United States of America, representing a Northern hemisphere country, while in Australia (Southern hemisphere) they exhibit a contrary trend. In conclusion, arterial hypertension is more likely to occur during winter months, which correlates with increased interest in the search phrase "hypertension" in Google.
Locality in Search Engine Queries and Its Implications for Caching

DTIC Science & Technology

2001-05-01

in the question of whether caching might be effective for search engines as well. They study two real search engine traces by examining query...locality and its implications for caching. The two search engines studied are Vivisimo and Excite. Their trace analysis results show that queries have
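To illustrate how query locality translates into cache effectiveness, the following sketch replays a query trace through a small cache and reports the hit rate. The least-recently-used (LRU) eviction policy, the function name, and the toy trace are assumptions made for illustration; the record above does not state which policy the authors evaluated.

```python
from collections import OrderedDict

def lru_hit_rate(trace, capacity):
    """Replay a sequence of query strings through an LRU cache.

    Returns the fraction of queries that were already cached when issued.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    cache = OrderedDict()
    hits = 0
    for query in trace:
        if query in cache:
            hits += 1
            cache.move_to_end(query)       # mark as most recently used
        else:
            cache[query] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used query
    return hits / len(trace) if trace else 0.0

# Toy trace in which a few queries repeat.
trace = ["a", "b", "a", "c", "b", "a"]
small = lru_hit_rate(trace, capacity=2)
large = lru_hit_rate(trace, capacity=3)
```

The stronger the locality in a trace, the smaller the cache needed to reach a given hit rate.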
How Do Children Reformulate Their Search Queries?

ERIC Educational Resources Information Center

Rutter, Sophie; Ford, Nigel; Clough, Paul

2015-01-01

Introduction: This paper investigates techniques used by children in year 4 (age eight to nine) of a UK primary school to reformulate their queries, and how they use information retrieval systems to support query reformulation. Method: An in-depth study analysing the interactions of twelve children carrying out search tasks in a primary school…
Accessing Suicide-Related Information on the Internet: A Retrospective Observational Study of Search Behavior

PubMed Central

2013-01-01

Background: The Internet’s potential impact on suicide is of major public health interest as easy online access to pro-suicide information or specific suicide methods may increase suicide risk among vulnerable Internet users. Little is known, however, about users’ actual searching and browsing behaviors of online suicide-related information. Objective: To investigate what webpages people actually clicked on after searching with suicide-related queries on a search engine and to examine what queries people used to get access to pro-suicide websites. Methods: A retrospective observational study was done. We used a web search dataset released by America Online (AOL). The dataset was randomly sampled from all AOL subscribers’ web queries between March and May 2006 and generated by 657,000 service subscribers. Results: We found 5526 search queries (0.026%, 5526/21,000,000) that included the keyword "suicide". The 5526 search queries included 1586 different search terms and were generated by 1625 unique subscribers (0.25%, 1625/657,000). Of these queries, 61.38% (3392/5526) were followed by users clicking on a search result. Of these 3392 queries, 1344 (39.62%) webpages were clicked on by 930 unique users but only 1314 of those webpages were accessible during the study period. Each clicked-through webpage was classified into 11 categories. The categories of the most visited webpages were: entertainment (30.13%; 396/1314), scientific information (18.31%; 240/1314), and community resources (14.53%; 191/1314). Among the 1314 accessed webpages, we could identify only two pro-suicide websites. We found that the search terms used to access these sites included “commiting suicide with a gas oven”, “hairless goat”, “pictures of murder by strangulation”, and “photo of a severe burn”. A limitation of our study is that the database may be dated and confined to mainly English webpages. Conclusions: Searching or browsing suicide-related or pro-suicide webpages was
'Googling' Terrorists: Are Northern Irish Terrorists Visible on Internet Search Engines?

NASA Astrophysics Data System (ADS)

Reilly, P.

In this chapter, the analysis suggests that Northern Irish terrorists are not visible on Web search engines when net users employ conventional Internet search techniques. Editors of mass media organisations traditionally have had the ability to decide whether a terrorist atrocity is 'newsworthy,' controlling the 'oxygen' supply that sustains all forms of terrorism. This process, also known as 'gatekeeping,' is often influenced by the norms of social responsibility, or alternatively, with regard to the interests of the advertisers and corporate sponsors that sustain mass media organisations. The analysis presented in this chapter suggests that Internet search engines can also be characterised as 'gatekeepers,' albeit without the ability to shape the content of Websites before it reaches net users. Instead, Internet search engines give priority retrieval to certain Websites within their directory, pointing net users towards these Websites rather than others on the Internet. Net users are more likely to click on links to the more 'visible' Websites on Internet search engine directories, these sites invariably being the highest 'ranked' in response to a particular search query. A number of factors including the design of the Website and the number of links to external sites determine the 'visibility' of a Website on Internet search engines. The study suggests that Northern Irish terrorists and their sympathisers are unlikely to achieve a greater degree of 'visibility' online than they enjoy in the conventional mass media through the perpetration of atrocities. Although these groups may have a greater degree of freedom on the Internet to publicise their ideologies, they are still likely to be speaking to the converted or members of the press. Although it is easier to locate Northern Irish terrorist organisations on Internet search engines by linking in via ideology, ideological description searches, such as 'Irish Republican' and 'Ulster Loyalist,' are more likely to
Categorical and Specificity Differences between User-Supplied Tags and Search Query Terms for Images. An Analysis of "Flickr" Tags and Web Image Search Queries

ERIC Educational Resources Information Center

Chung, EunKyung; Yoon, JungWon

2009-01-01

Introduction: The purpose of this study is to compare characteristics and features of user supplied tags and search query terms for images on the "Flickr" Website in terms of categories of pictorial meanings and level of term specificity. Method: This study focuses on comparisons between tags and search queries using Shatford's categorization…
Web Searching: A Process-Oriented Experimental Study of Three Interactive Search Paradigms.

ERIC Educational Resources Information Center

Dennis, Simon; Bruza, Peter; McArthur, Robert

2002-01-01

Compares search effectiveness when using query-based Internet search via the Google search engine, directory-based search via Yahoo, and phrase-based query reformulation-assisted search via the Hyperindex browser by means of a controlled, user-based experimental study of undergraduates at the University of Queensland. Discusses cognitive load,…
Using Internet search behavior to assess public awareness of protected wetlands.

PubMed

Do, Yuno; Kim, Ji Yoon; Lineman, Maurice; Kim, Dong-Kyun; Joo, Gea-Jae

2015-02-01

Improving public awareness of protected wetlands facilitates sustainable wetland management, which depends on public participation. One way of gauging public interest is by tracking Internet search behavior (ISB). We assessed public awareness of issues related to protected wetland areas (PWAs) in South Korea by examining the frequencies of specific queries (PWAs, Ramsar, Upo wetland, Sunchon Bay, etc.) using relative search volumes (RSVs) obtained from an Internet search engine. RSV shows how many times a search term is used relative to a second search term during a specific period. Public awareness of PWAs changed from 2007 to 2013. Initially the majority of Internet searches were related to the most well-known tidal and inland wetlands Sunchon Bay and Upo wetlands, which are the largest existing wetlands in Korea with the greatest historical exposure. Public awareness, as reflected in RSVs, of wetlands increased significantly following PWA designation for the wetlands in 2008, which followed the Ramsar 10th Conference of Contracting Parties to the Convention on Wetlands (COP10) meeting. Public interest was strongly correlated to the number of news articles in the popular media, as evidenced by the increase in Internet searches for specific wetlands and words associated with specific wetlands. Correspondingly, the number of visitors to specific wetlands increased. To increase public interest in wetlands, wetland aspects that enhance wetland conservation should be promoted by the government and enhanced via public education. Our approach can be used to gauge public awareness and participation in a wide range of conservation efforts. © 2014 Society for Conservation Biology.
Examining the Relationship Between Past Orientation and US Suicide Rates: An Analysis Using Big Data-Driven Google Search Queries

PubMed Central

Lee, Donghyun; Lee, Hojun

2016-01-01

Background: Internet search query data reflect the attitudes of the users, using which we can measure the past orientation to commit suicide. Examinations of past orientation often highlight certain predispositions of attitude, many of which can be suicide risk factors. Objective: To investigate the relationship between past orientation and suicide rate by examining Google search queries. Methods: We measured the past orientation using Google search query data by comparing the search volumes of the past year and those of the future year, across the 50 US states and the District of Columbia during the period from 2004 to 2012. We constructed a panel dataset with independent variables as control variables; we then undertook an analysis using multiple ordinary least squares regression and methods that leverage the Akaike information criterion and the Bayesian information criterion. Results: It was found that past orientation had a positive relationship with the suicide rate (P≤.001) and that it improves the goodness-of-fit of the model regarding the suicide rate. Unemployment rate (P≤.001 in Models 3 and 4), Gini coefficient (P≤.001), and population growth rate (P≤.001) had a positive relationship with the suicide rate, whereas the gross state product (P≤.001) showed a negative relationship with the suicide rate. Conclusions: We empirically identified the positive relationship between the suicide rate and past orientation, which was measured by big data-driven Google search query. PMID:26868917
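The model comparison mentioned in this abstract relies on the Akaike and Bayesian information criteria. As a hedged sketch (not the authors' code), the standard Gaussian-error forms for an ordinary least squares fit can be computed from the residual sum of squares, up to an additive constant that is the same for all models fitted to the same data. The function names and the numbers in the example are hypothetical.

```python
import math

def aic_ols(rss, n_obs, n_params):
    """AIC for an OLS fit with Gaussian errors, up to an additive constant."""
    return n_obs * math.log(rss / n_obs) + 2 * n_params

def bic_ols(rss, n_obs, n_params):
    """BIC for an OLS fit with Gaussian errors, up to an additive constant."""
    return n_obs * math.log(rss / n_obs) + n_params * math.log(n_obs)

# Hypothetical comparison: adding one predictor lowers the residual sum of
# squares from 12.0 to 10.0 on 100 observations.
base_aic = aic_ols(12.0, 100, 3)
full_aic = aic_ols(10.0, 100, 4)
base_bic = bic_ols(12.0, 100, 3)
full_bic = bic_ols(10.0, 100, 4)
```

Lower values indicate a better trade-off between fit and complexity. BIC penalises each extra parameter by ln(n) rather than 2, so it is the stricter criterion once a dataset has more than about seven observations.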
Examining the Relationship Between Past Orientation and US Suicide Rates: An Analysis Using Big Data-Driven Google Search Queries.

PubMed

Lee, Donghyun; Lee, Hojun; Choi, Munkee

2016-02-11

Internet search query data reflect the attitudes of the users, using which we can measure the past orientation to commit suicide. Examinations of past orientation often highlight certain predispositions of attitude, many of which can be suicide risk factors. The objective was to investigate the relationship between past orientation and suicide rate by examining Google search queries. We measured the past orientation using Google search query data by comparing the search volumes of the past year and those of the future year, across the 50 US states and the District of Columbia during the period from 2004 to 2012. We constructed a panel dataset with independent variables as control variables; we then undertook an analysis using multiple ordinary least squares regression and methods that leverage the Akaike information criterion and the Bayesian information criterion. It was found that past orientation had a positive relationship with the suicide rate (P ≤ .001) and that it improves the goodness-of-fit of the model regarding the suicide rate. Unemployment rate (P ≤ .001 in Models 3 and 4), Gini coefficient (P ≤ .001), and population growth rate (P ≤ .001) had a positive relationship with the suicide rate, whereas the gross state product (P ≤ .001) showed a negative relationship with the suicide rate. We empirically identified the positive relationship between the suicide rate and past orientation, which was measured by big data-driven Google search query.
Querying archetype-based EHRs by search ontology-based XPath engineering.

PubMed

Kropf, Stefan; Uciteli, Alexandr; Schierle, Katrin; Krücken, Peter; Denecke, Kerstin; Herre, Heinrich

2018-05-11

Legacy data and new structured data can be stored in a standardized format as XML-based EHRs on XML databases. Querying documents on these databases is crucial for answering research questions. Instead of using free-text searches, which lead to false positive results, the precision can be increased by constraining the search to certain parts of documents. A search ontology-based specification of queries on XML documents defines search concepts and relates them to parts in the XML document structure. Such a query specification method is practically introduced and evaluated by applying concrete research questions formulated in natural language on a data collection for information retrieval purposes. The search is performed by search ontology-based XPath engineering that reuses ontologies and XML-related W3C standards. The key result is that the specification of research questions can be supported by the usage of search ontology-based XPath engineering. A deeper recognition of entities and a semantic understanding of the content are necessary for a further improvement of precision and recall. A key limitation is that the application of the introduced process requires skills in ontology and software development. In the future, the time-consuming ontology development could be overcome by implementing a new clinical role: the clinical ontologist. The introduced Search Ontology XML extension connects Search Terms to certain parts in XML documents and enables an ontology-based definition of queries. Search ontology-based XPath engineering can support research question answering by the specification of complex XPath expressions without deep syntax knowledge about XPaths.
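The precision gain from constraining a search to parts of a document can be shown with a small example. The sketch below uses the limited XPath subset of Python's standard xml.etree.ElementTree module; the record structure, element names, and helper functions are hypothetical, and it does not reproduce the ontology-based query generation described in the abstract.

```python
import xml.etree.ElementTree as ET

# Hypothetical, highly simplified EHR-like documents.
SAMPLE = """<records>
  <record id="r1">
    <diagnosis>adenocarcinoma of the colon</diagnosis>
    <history>none</history>
  </record>
  <record id="r2">
    <diagnosis>benign polyp</diagnosis>
    <history>family history of carcinoma</history>
  </record>
</records>"""

def free_text_hits(root, term):
    """Return ids of records whose full text mentions the term anywhere."""
    return [rec.get("id") for rec in root.findall("./record")
            if term in "".join(rec.itertext()).lower()]

def constrained_hits(root, path, term):
    """Return ids of records where the term occurs in elements matched by path."""
    return [rec.get("id") for rec in root.findall("./record")
            if any(term in (el.text or "").lower() for el in rec.findall(path))]

root = ET.fromstring(SAMPLE)
everywhere = free_text_hits(root, "carcinoma")
diagnosis_only = constrained_hits(root, "./diagnosis", "carcinoma")
```

Here the free-text search also returns record r2, where the term appears only in the family history, while the search restricted to the diagnosis element returns r1 alone.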
Towards computational improvement of DNA database indexing and short DNA query searching.

PubMed

Stojanov, Done; Koceski, Sašo; Mileva, Aleksandra; Koceska, Nataša; Bande, Cveta Martinovska

2014-09-03

In order to facilitate and speed up the search of massive DNA databases, the database is indexed at the beginning, employing a mapping function. By searching through the indexed data structure, exact query hits can be identified. If the database is searched against an annotated DNA query, such as a known promoter consensus sequence, then the starting locations and the number of potential genes can be determined. This is particularly relevant if unannotated DNA sequences have to be functionally annotated. However, indexing a massive DNA database and searching an indexed data structure with millions of entries is a time-demanding process. In this paper, we propose a fast DNA database indexing and searching approach, identifying all query hits in the database, without having to examine all entries in the indexed data structure, limiting the maximum length of a query that can be searched against the database. By applying the proposed indexing equation, the whole human genome could be indexed in 10 hours on a personal computer, under the assumption that there is enough RAM to store the indexed data structure. Analysing the methodology proposed by Reneker, we observed that hits at starting positions [Formula: see text] are not reported, if the database is searched against a query shorter than [Formula: see text] nucleotides, such that [Formula: see text] is the length of the DNA database words being mapped and [Formula: see text] is the length of the query. A solution of this drawback is also presented.
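The general idea of indexing a DNA database once and then answering exact-match queries from the index can be sketched with a plain hash index over fixed-length words (k-mers). This is a generic illustration under stated assumptions, not the indexing equation proposed in the paper or Reneker's method; the function names and the toy sequence are hypothetical. Queries shorter than the word length are simply rejected here.

```python
from collections import defaultdict

def build_kmer_index(sequence, k):
    """Map every length-k word of the sequence to its start positions."""
    index = defaultdict(list)
    for start in range(len(sequence) - k + 1):
        index[sequence[start:start + k]].append(start)
    return index

def find_exact(sequence, index, k, query):
    """Return all start positions where the query occurs exactly.

    Uses the first k characters of the query as a seed, then verifies the
    remainder against the sequence, so only candidate positions are examined.
    """
    if len(query) < k:
        raise ValueError("query must be at least k characters long")
    seed = query[:k]
    return [pos for pos in index.get(seed, [])
            if sequence[pos:pos + len(query)] == query]

database = "ACGTACGTTACG"
k = 3
index = build_kmer_index(database, k)
hits_short = find_exact(database, index, k, "ACG")
hits_long = find_exact(database, index, k, "ACGT")
```

The index trades memory for speed: it is built in one pass over the database, and each query then touches only the positions that share its seed word.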
Query Log Analysis of an Electronic Health Record Search Engine

PubMed Central

Yang, Lei; Mei, Qiaozhu; Zheng, Kai; Hanauer, David A.

2011-01-01

We analyzed a longitudinal collection of query logs of a full-text search engine designed to facilitate information retrieval in electronic health records (EHR). The collection, 202,905 queries and 35,928 user sessions recorded over a course of 4 years, represents the information-seeking behavior of 533 medical professionals, including frontline practitioners, coding personnel, patient safety officers, and biomedical researchers for patient data stored in EHR systems. In this paper, we present descriptive statistics of the queries, a categorization of information needs manifested through the queries, as well as temporal patterns of the users’ information-seeking behavior. The results suggest that information needs in the medical domain are substantially more sophisticated than those that general-purpose web search engines need to accommodate. Therefore, we envision there exists a significant challenge, along with significant opportunities, to provide intelligent query recommendations to facilitate information retrieval in EHR. PMID:22195150
Project Lefty: More Bang for the Search Query

ERIC Educational Resources Information Center

Varnum, Ken

2010-01-01

This article describes Project Lefty, a search system that, at a minimum, adds a layer on top of traditional federated search tools that will make the wait for results more worthwhile for researchers. At best, Project Lefty improves search queries and relevance rankings for web-scale discovery tools to make the results themselves more relevant…
They’re heating up: Internet search query trends reveal significant public interest in heat-not-burn tobacco products

PubMed Central

Caputi, Theodore L.; Leas, Eric; Dredze, Mark; Cohen, Joanna E.; Ayers, John W.

2017-01-01

Heat-not-burn tobacco products, battery powered devices that heat leaf tobacco to approximately 500 degrees Fahrenheit to produce an inhalable aerosol, are being introduced in markets around the world. Japan, where manufacturers have marketed several heat-not-burn brands since 2014, has been the focal national test market, with the intention of developing global marketing strategies. We used Google search query data to estimate, for the first time, the scale and growth potential of heat-not-burn tobacco products. Average monthly searches for heat-not-burn products rose 1,426% (95% CI: 746, 3574) between their first (2015) and second (2016) complete years on the market and an additional 100% (95% CI: 60, 173) between the products’ second (2016) and third years on the market (Jan-Sep 2017). There are now between 5.9 and 7.5 million heat-not-burn related Google searches in Japan each month based on September 2017 estimates. Moreover, forecasts relying on the historical trends suggest heat-not-burn searches will increase an additional 32% (95% CI: -4 to 79) during 2018, compared to current estimates for 2017 (Jan-Sep), with continued growth thereafter expected. Contrasting heat-not-burn’s rise in Japan to electronic cigarettes’ rise in the United States, we find searches for heat-not-burn eclipsed electronic cigarette searches during April 2016. Moreover, the change in average monthly queries for heat-not-burn in Japan between 2015 and 2017 was 399 (95% CI: 184, 1490) times larger than the change in average monthly queries for electronic cigarettes in the United States over the same time period, increasing by 2,956% (95% CI: 1729, 7304) compared to only 7% (95% CI: 3, 13). Our findings are a clarion call for tobacco control leaders to ready themselves as heat-not-burn tobacco products will likely garner substantial interest as they are introduced into new markets. Public health practitioners should expand heat-not-burn tobacco product surveillance, adjust existing tobacco
They're heating up: Internet search query trends reveal significant public interest in heat-not-burn tobacco products.

PubMed

Caputi, Theodore L; Leas, Eric; Dredze, Mark; Cohen, Joanna E; Ayers, John W

2017-01-01

Heat-not-burn tobacco products, battery powered devices that heat leaf tobacco to approximately 500 degrees Fahrenheit to produce an inhalable aerosol, are being introduced in markets around the world. Japan, where manufacturers have marketed several heat-not-burn brands since 2014, has been the focal national test market, with the intention of developing global marketing strategies. We used Google search query data to estimate, for the first time, the scale and growth potential of heat-not-burn tobacco products. Average monthly searches for heat-not-burn products rose 1,426% (95% CI: 746, 3574) between their first (2015) and second (2016) complete years on the market and an additional 100% (95% CI: 60, 173) between the products' second (2016) and third years on the market (Jan-Sep 2017). There are now between 5.9 and 7.5 million heat-not-burn related Google searches in Japan each month based on September 2017 estimates. Moreover, forecasts relying on the historical trends suggest heat-not-burn searches will increase an additional 32% (95% CI: -4 to 79) during 2018, compared to current estimates for 2017 (Jan-Sep), with continued growth thereafter expected. Contrasting heat-not-burn's rise in Japan to electronic cigarettes' rise in the United States, we find searches for heat-not-burn eclipsed electronic cigarette searches during April 2016. Moreover, the change in average monthly queries for heat-not-burn in Japan between 2015 and 2017 was 399 (95% CI: 184, 1490) times larger than the change in average monthly queries for electronic cigarettes in the United States over the same time period, increasing by 2,956% (95% CI: 1729, 7304) compared to only 7% (95% CI: 3, 13). Our findings are a clarion call for tobacco control leaders to ready themselves as heat-not-burn tobacco products will likely garner substantial interest as they are introduced into new markets. Public health practitioners should expand heat-not-burn tobacco product surveillance, adjust existing tobacco
An Analysis of Web Image Queries for Search.

ERIC Educational Resources Information Center

Pu, Hsiao-Tieh

2003-01-01

Examines the differences between Web image and textual queries, and attempts to develop an analytic model to investigate their implications for Web image retrieval systems. Provides results that give insight into Web image searching behavior and suggests implications for improvement of current Web image search engines. (AEF)
Assessing Ebola-related web search behaviour: insights and implications from an analytical study of Google Trends-based query volumes.

PubMed

Alicino, Cristiano; Bragazzi, Nicola Luigi; Faccio, Valeria; Amicizia, Daniela; Panatto, Donatella; Gasparini, Roberto; Icardi, Giancarlo; Orsi, Andrea

2015-12-10

The 2014 Ebola epidemic in West Africa has attracted public interest worldwide, leading to millions of Ebola-related Internet searches being performed during the period of the epidemic. This study aimed to evaluate and interpret Google search queries for terms related to the Ebola outbreak both at the global level and in all countries where primary cases of Ebola occurred. The study also endeavoured to look at the correlation between the number of overall and weekly web searches and the number of overall and weekly new cases of Ebola. Google Trends (GT) was used to explore Internet activity related to Ebola. The study period was from 29 December 2013 to 14 June 2015. Pearson's correlation was performed to correlate Ebola-related relative search volumes (RSVs) with the number of weekly and overall Ebola cases. Multivariate regression was performed using Ebola-related RSV as a dependent variable, and the overall number of Ebola cases and the Human Development Index were used as predictor variables. The greatest RSV was registered in the three West African countries mainly affected by the Ebola epidemic. The queries varied in the different countries. Both quantitative and qualitative differences between the affected African countries and other Western countries with primary cases were noted, in relation to the different flux volumes and different time courses. In the affected African countries, web query search volumes were mostly concentrated in the capital areas. However, in Western countries, web queries were uniformly distributed over the national territory. In terms of the three countries mainly affected by the Ebola epidemic, the correlation between the number of new weekly cases of Ebola and the weekly GT index varied from weak to moderate. The correlation between the number of Ebola cases registered in all countries during the study period and the GT index was very high. Google Trends showed a coarse-grained nature, strongly correlating with global
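The correlation step described in this abstract can be illustrated with a small, self-contained Pearson correlation between a weekly relative search volume series and weekly case counts. The function name and the numbers are invented for illustration and are not taken from the study.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical weekly series: relative search volume (0-100) and new cases.
weekly_rsv = [5, 12, 40, 100, 63, 30, 14]
weekly_cases = [20, 45, 130, 310, 280, 150, 60]
r = pearson_r(weekly_rsv, weekly_cases)
```

Because relative search volumes are normalised indices rather than absolute counts, the coefficient describes how closely the two curves move together, not the number of searches per case.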
Using search engine query data to track pharmaceutical utilization: a study of statins.

PubMed

Schuster, Nathaniel M; Rogers, Mary A M; McMahon, Laurence F

2010-08-01

To examine temporal and geographic associations between Google queries for health information and healthcare utilization benchmarks. Retrospective longitudinal study. Using Google Trends and Google Insights for Search data, the search terms Lipitor (atorvastatin calcium; Pfizer, Ann Arbor, MI) and simvastatin were evaluated for change over time and for association with Lipitor revenues. The relationship between query data and community-based resource use per Medicare beneficiary was assessed for 35 US metropolitan areas. Google queries for Lipitor significantly decreased from January 2004 through June 2009 and queries for simvastatin significantly increased (P <.001 for both), particularly after Lipitor came off patent (P <.001 for change in slope). The mean number of Google queries for Lipitor correlated (r = 0.98) with the percentage change in Lipitor global revenues from 2004 to 2008 (P <.001). Query preference for Lipitor over simvastatin was positively associated (r = 0.40) with a community's use of Medicare services. For every 1% increase in utilization of Medicare services in a community, there was a 0.2-unit increase in the ratio of Lipitor queries to simvastatin queries in that community (P = .02). Specific search engine queries for medical information correlate with pharmaceutical revenue and with overall healthcare utilization in a community. This suggests that search query data can track community-wide characteristics in healthcare utilization and have the potential for informing payers and policy makers regarding trends in utilization.

Web search queries can predict stock market volumes.

PubMed

Bordino, Ilaria; Battiston, Stefano; Caldarelli, Guido; Cristelli, Matthieu; Ukkonen, Antti; Weber, Ingmar

2012-01-01

We live in a computerized and networked society where many of our actions leave a digital trace and affect other people's actions. This has led to the emergence of a new data-driven research field: mathematical methods of computer science, statistical physics and sociometry provide insights on a wide range of disciplines ranging from social science to human mobility. A recent important discovery is that search engine traffic (i.e., the number of requests submitted by users to search engines on the www) can be used to track and, in some cases, to anticipate the dynamics of social phenomena. Successful examples include unemployment levels, car and home sales, and the spread of epidemics. A few recent works have applied this approach to stock prices and market sentiment. However, it remains unclear if trends in financial markets can be anticipated by the collective wisdom of on-line users on the web. Here we show that daily trading volumes of stocks traded in NASDAQ-100 are correlated with daily volumes of queries related to the same stocks. In particular, query volumes anticipate in many cases peaks of trading by one day or more. Our analysis is carried out on a unique dataset of queries, submitted to an important web search engine, which enables us to also investigate user behavior. We show that the query volume dynamics emerge from the collective but seemingly uncoordinated activity of many users. These findings contribute to the debate on the identification of early warnings of financial systemic risk, based on the activity of users of the www.
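The claim that query volumes anticipate trading peaks by a day or more can be made concrete with a lagged correlation: query volume on day t is compared with trading volume on day t + lag, and the lag with the highest correlation is reported. The sketch below is an illustration under that assumption, with hypothetical function names and synthetic data; it is not the authors' analysis pipeline.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def best_lead(query_volume, trading_volume, max_lag):
    """Find the lag (in days) at which queries best anticipate trading.

    For each lag, query_volume[t] is paired with trading_volume[t + lag].
    Returns (lag, correlation) for the lag with the highest correlation.
    """
    results = []
    for lag in range(max_lag + 1):
        xs = query_volume[:len(query_volume) - lag]
        ys = trading_volume[lag:]
        results.append((pearson_r(xs, ys), lag))
    r, lag = max(results)
    return lag, r

# Synthetic data: trading volume repeats the query volume one day later.
queries = [1, 3, 2, 5, 4, 6, 2, 7]
trading = [0] + queries[:-1]
lag, r = best_lead(queries, trading, max_lag=3)
```

A positive best lag means that changes in query volume tend to precede changes in trading volume; a best lag of zero would indicate that the two series move together on the same day.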
Web Search Queries Can Predict Stock Market Volumes

PubMed Central

Bordino, Ilaria; Battiston, Stefano; Caldarelli, Guido; Cristelli, Matthieu; Ukkonen, Antti; Weber, Ingmar

2012-01-01

We live in a computerized and networked society where many of our actions leave a digital trace and affect other people’s actions. This has led to the emergence of a new data-driven research field: mathematical methods of computer science, statistical physics and sociometry provide insights on a wide range of disciplines ranging from social science to human mobility. A recent important discovery is that search engine traffic (i.e., the number of requests submitted by users to search engines on the www) can be used to track and, in some cases, to anticipate the dynamics of social phenomena. Successful examples include unemployment levels, car and home sales, and the spread of epidemics. A few recent works have applied this approach to stock prices and market sentiment. However, it remains unclear if trends in financial markets can be anticipated by the collective wisdom of on-line users on the web. Here we show that daily trading volumes of stocks traded in NASDAQ-100 are correlated with daily volumes of queries related to the same stocks. In particular, query volumes anticipate in many cases peaks of trading by one day or more. Our analysis is carried out on a unique dataset of queries, submitted to an important web search engine, which enables us to also investigate user behavior. We show that the query volume dynamics emerge from the collective but seemingly uncoordinated activity of many users. These findings contribute to the debate on the identification of early warnings of financial systemic risk, based on the activity of users of the www. PMID:22829871
RadSearch: a RIS/PACS integrated query tool

NASA Astrophysics Data System (ADS)

Tsao, Sinchai; Documet, Jorge; Moin, Paymann; Wang, Kevin; Liu, Brent J.

2008-03-01

Radiology Information Systems (RIS) contain a wealth of information that can be used for research, education, and practice management. However, the sheer amount of information available makes querying specific data difficult and time consuming. Previous work has shown that a clinical RIS database and its RIS text reports can be extracted, duplicated and indexed for searches while complying with HIPAA and IRB requirements. This project's intent is to provide a software tool, the RadSearch Toolkit, to allow intelligent indexing and parsing of RIS reports for easy yet powerful searches. In addition, the project aims to seamlessly query and retrieve associated images from the Picture Archiving and Communication System (PACS) in situations where an integrated RIS/PACS is in place - even subselecting individual series, such as in an MRI study. RadSearch's application of simple text parsing techniques to index text-based radiology reports will allow the search engine to quickly return relevant results. This powerful combination will be useful in both private practice and academic settings; administrators can easily obtain complex practice management information such as referral patterns; researchers can conduct retrospective studies with specific, multiple criteria; teaching institutions can quickly and effectively create thorough teaching files.
Information is in the eye of the beholder: Seeking information on the MMR vaccine through an Internet search engine.

PubMed

Yom-Tov, Elad; Fernandez-Luque, Luis

2014-01-01

Vaccination campaigns are one of the most important and successful public health programs ever undertaken. People who want to learn about vaccines in order to make an informed decision on whether to vaccinate are faced with a wealth of information on the Internet, both for and against vaccinations. In this paper we develop an automated way to score Internet search queries and web pages as to the likelihood that a person making these queries or reading those pages would decide to vaccinate. We apply this method to data from a major Internet search engine, while people seek information about the Measles, Mumps and Rubella (MMR) vaccine. We show that our method is accurate, and use it to learn about the information acquisition process of people. Our results show that people who are pro-vaccination as well as people who are anti-vaccination seek similar information, but browsing this information has differing effect on their future browsing. These findings demonstrate the need for health authorities to tailor their information according to the current stance of users.
Information is in the eye of the beholder: Seeking information on the MMR vaccine through an Internet search engine

PubMed Central

Yom-Tov, Elad; Fernandez-Luque, Luis

2014-01-01

Vaccination campaigns are one of the most important and successful public health programs ever undertaken. People who want to learn about vaccines in order to make an informed decision on whether to vaccinate are faced with a wealth of information on the Internet, both for and against vaccinations. In this paper we develop an automated way to score Internet search queries and web pages as to the likelihood that a person making these queries or reading those pages would decide to vaccinate. We apply this method to data from a major Internet search engine, while people seek information about the Measles, Mumps and Rubella (MMR) vaccine. We show that our method is accurate, and use it to learn about the information acquisition process of people. Our results show that people who are pro-vaccination as well as people who are anti-vaccination seek similar information, but browsing this information has differing effect on their future browsing. These findings demonstrate the need for health authorities to tailor their information according to the current stance of users. PMID:25954435
Searching Databases without Query-Building Aids: Implications for Dyslexic Users

ERIC Educational Resources Information Center

Berget, Gerd; Sandnes, Frode Eika

2015-01-01

Introduction: Few studies document the information searching behaviour of users with cognitive impairments. This paper therefore addresses the effect of dyslexia on information searching in a database with no tolerance for spelling errors and no query-building aids. The purpose was to identify effective search interface design guidelines that…
Respiratory syncytial virus tracking using internet search engine data.

PubMed

Oren, Eyal; Frere, Justin; Yom-Tov, Eran; Yom-Tov, Elad

2018-04-03

Respiratory Syncytial Virus (RSV) is the leading cause of hospitalization in children less than 1 year of age in the United States. Internet search engine queries may provide high resolution temporal and spatial data to estimate and predict disease activity. After filtering an initial list of 613 symptoms using high-resolution Bing search logs, we used Google Trends data between 2004 and 2016 for a smaller list of 50 terms to build predictive models of RSV incidence for five states where long-term surveillance data was available. We then used domain adaptation to model RSV incidence for the 45 remaining US states. Surveillance data sources (hospitalization and laboratory reports) were highly correlated, as were laboratory reports with search engine data. The four terms which were most often statistically significantly correlated as time series with the surveillance data in the five state models were RSV, flu, pneumonia, and bronchiolitis. Using our models, we tracked the spread of RSV by observing the time of peak use of the search term in different states. In general, the RSV peak moved from south-east (Florida) to the north-west US. Our study represents the first time that RSV has been tracked using Internet data results and highlights successful use of search filters and domain adaptation techniques, using data at multiple resolutions. Our approach may assist in identifying spread of both local and more widespread RSV transmission and may be applicable to other seasonal conditions where comprehensive epidemiological data is difficult to collect or obtain.
Query-Adaptive Reciprocal Hash Tables for Nearest Neighbor Search.

PubMed

Liu, Xianglong; Deng, Cheng; Lang, Bo; Tao, Dacheng; Li, Xuelong

2016-02-01

Recent years have witnessed the success of binary hashing techniques in approximate nearest neighbor search. In practice, multiple hash tables are usually built using hashing to cover more desired results in the hit buckets of each table. However, little work has studied a unified approach to constructing multiple informative hash tables using any type of hashing algorithm. Meanwhile, multiple table search also lacks a generic query-adaptive and fine-grained ranking scheme that can alleviate the binary quantization loss suffered in the standard hashing techniques. To solve the above problems, in this paper, we first regard the table construction as a selection problem over a set of candidate hash functions. With the graph representation of the function set, we propose an efficient solution that sequentially applies normalized dominant set to finding the most informative and independent hash functions for each table. To further reduce the redundancy between tables, we explore the reciprocal hash tables in a boosting manner, where the hash function graph is updated with high weights emphasized on the misclassified neighbor pairs of previous hash tables. To refine the ranking of the retrieved buckets within a certain Hamming radius from the query, we propose a query-adaptive bitwise weighting scheme to enable fine-grained bucket ranking in each hash table, exploiting the discriminative power of its hash functions and their complement for nearest neighbor search. Moreover, we integrate such a scheme into the multiple table search using a fast, yet reciprocal table lookup algorithm within the adaptive weighted Hamming radius. In this paper, both the construction method and the query-adaptive search method are general and compatible with different types of hashing algorithms using different feature spaces and/or parameter settings. Our extensive experiments on several large-scale benchmarks demonstrate that the proposed techniques can significantly outperform both
Does query expansion limit our learning? A comparison of social-based expansion to content-based expansion for medical queries on the internet.

PubMed

Pentoney, Christopher; Harwell, Jeff; Leroy, Gondy

2014-01-01

Searching for medical information online is a common activity. While it has been shown that forming good queries is difficult, Google's query suggestion tool, a type of query expansion, aims to facilitate query formation. However, it is unknown how this expansion, which is based on what others searched for, affects the information gathering of the online community. To measure the impact of social-based query expansion, this study compared it with content-based expansion, i.e., what is really in the text. We used 138,906 medical queries from the AOL User Session Collection and expanded them using Google's Autocomplete method (social-based) and the content of the Google Web Corpus (content-based). We evaluated the specificity and ambiguity of the expansion terms for trigram queries. We also looked at the impact on the actual results using domain diversity and expansion edit distance. Results showed that the social-based method provided more precise expansion terms as well as terms that were less ambiguous. Expanded queries do not differ significantly in diversity when expanded using the social-based method (6.72 different domains returned in the first ten results, on average) vs. content-based method (6.73 different domains, on average).
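The two result-level measures named above, domain diversity and expansion edit distance, can be sketched as follows. This assumes that domain diversity means the number of distinct hosts among the first ten result URLs and that edit distance is character-level Levenshtein distance; the study's exact definitions are not given in the abstract, and the function names and URLs are hypothetical.

```python
from urllib.parse import urlparse


def domain_diversity(result_urls, top_n=10):
    """Count distinct host names among the first top_n result URLs."""
    domains = {urlparse(url).netloc.lower() for url in result_urls[:top_n]}
    domains.discard("")  # ignore entries that had no parseable host
    return len(domains)


def edit_distance(a, b):
    """Levenshtein distance between strings a and b (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


# Hypothetical result list for one expanded query.
urls = [
    "https://example.org/diabetes",
    "https://example.org/diabetes/diet",
    "https://example.com/health/diabetes",
    "https://example.net/a",
]
print(domain_diversity(urls))
print(edit_distance("diabetes diet", "diabetes diet plan"))
```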
Privacy-Aware Relevant Data Access with Semantically Enriched Search Queries for Untrusted Cloud Storage Services.

PubMed

Pervez, Zeeshan; Ahmad, Mahmood; Khattak, Asad Masood; Lee, Sungyoung; Chung, Tae Choong

2016-01-01

Privacy-aware search of outsourced data ensures relevant data access in the untrusted domain of a public cloud service provider. A subscriber of a public cloud storage service can determine the presence or absence of a particular keyword by submitting a search query in the form of a trapdoor. However, these trapdoor-based search queries are limited in functionality and cannot be used to identify secure outsourced data which contains semantically equivalent information. In addition, trapdoor-based methodologies are confined to pre-defined trapdoors and prevent subscribers from searching outsourced data with arbitrarily defined search criteria. To solve the problem of relevant data access, we have proposed an index-based privacy-aware search methodology that ensures semantic retrieval of data from an untrusted domain. This method ensures oblivious execution of a search query and allows authorized subscribers to model conjunctive search queries without relying on predefined trapdoors. A security analysis of our proposed methodology shows that, in a conspired attack, unauthorized subscribers and untrusted cloud service providers cannot deduce any information that can lead to the potential loss of data privacy. A computational time analysis on commodity hardware demonstrates that our proposed methodology requires moderate computational resources to model a privacy-aware search query and for its oblivious evaluation on a cloud service provider.
Privacy-Aware Relevant Data Access with Semantically Enriched Search Queries for Untrusted Cloud Storage Services

PubMed Central

Pervez, Zeeshan; Ahmad, Mahmood; Khattak, Asad Masood; Lee, Sungyoung; Chung, Tae Choong

2016-01-01

Privacy-aware search of outsourced data ensures relevant data access in the untrusted domain of a public cloud service provider. A subscriber of a public cloud storage service can determine the presence or absence of a particular keyword by submitting a search query in the form of a trapdoor. However, these trapdoor-based search queries are limited in functionality and cannot be used to identify secure outsourced data which contains semantically equivalent information. In addition, trapdoor-based methodologies are confined to pre-defined trapdoors and prevent subscribers from searching outsourced data with arbitrarily defined search criteria. To solve the problem of relevant data access, we have proposed an index-based privacy-aware search methodology that ensures semantic retrieval of data from an untrusted domain. This method ensures oblivious execution of a search query and allows authorized subscribers to model conjunctive search queries without relying on predefined trapdoors. A security analysis of our proposed methodology shows that, in a conspired attack, unauthorized subscribers and untrusted cloud service providers cannot deduce any information that can lead to the potential loss of data privacy. A computational time analysis on commodity hardware demonstrates that our proposed methodology requires moderate computational resources to model a privacy-aware search query and for its oblivious evaluation on a cloud service provider. PMID:27571421
Estimating Influenza Outbreaks Using Both Search Engine Query Data and Social Media Data in South Korea.

PubMed

Woo, Hyekyung; Cho, Youngtae; Shim, Eunyoung; Lee, Jong-Koo; Lee, Chang-Gun; Kim, Seong Hwan

2016-07-04

As suggested as early as in 2006, logs of queries submitted to search engines seeking information could be a source for detection of emerging influenza epidemics if changes in the volume of search queries are monitored (infodemiology). However, selecting queries that are most likely to be associated with influenza epidemics is a particular challenge when it comes to generating better predictions. In this study, we describe a methodological extension for detecting influenza outbreaks using search query data; we provide a new approach for query selection through the exploration of contextual information gleaned from social media data. Additionally, we evaluate whether it is possible to use these queries for monitoring and predicting influenza epidemics in South Korea. Our study was based on freely available weekly influenza incidence data and query data originating from the search engine on the Korean website Daum between April 3, 2011 and April 5, 2014. To select queries related to influenza epidemics, several approaches were applied: (1) exploring influenza-related words in social media data, (2) identifying the chief concerns related to influenza, and (3) using Web query recommendations. Optimal feature selection by least absolute shrinkage and selection operator (Lasso) and support vector machine for regression (SVR) were used to construct a model predicting influenza epidemics. In total, 146 queries related to influenza were generated through our initial query selection approach. A considerable proportion of optimal features for final models were derived from queries with reference to the social media data. The SVR model performed well: the prediction values were highly correlated with the recent observed influenza-like illness (r=.956; P<.001) and virological incidence rate (r=.963; P<.001). These results demonstrate the feasibility of using search queries to enhance influenza surveillance in South Korea. In addition, an approach for query selection using
Estimating Influenza Outbreaks Using Both Search Engine Query Data and Social Media Data in South Korea

PubMed Central

Woo, Hyekyung; Shim, Eunyoung; Lee, Jong-Koo; Lee, Chang-Gun; Kim, Seong Hwan

2016-01-01

Background: As suggested as early as in 2006, logs of queries submitted to search engines seeking information could be a source for detection of emerging influenza epidemics if changes in the volume of search queries are monitored (infodemiology). However, selecting queries that are most likely to be associated with influenza epidemics is a particular challenge when it comes to generating better predictions. Objective: In this study, we describe a methodological extension for detecting influenza outbreaks using search query data; we provide a new approach for query selection through the exploration of contextual information gleaned from social media data. Additionally, we evaluate whether it is possible to use these queries for monitoring and predicting influenza epidemics in South Korea. Methods: Our study was based on freely available weekly influenza incidence data and query data originating from the search engine on the Korean website Daum between April 3, 2011 and April 5, 2014. To select queries related to influenza epidemics, several approaches were applied: (1) exploring influenza-related words in social media data, (2) identifying the chief concerns related to influenza, and (3) using Web query recommendations. Optimal feature selection by least absolute shrinkage and selection operator (Lasso) and support vector machine for regression (SVR) were used to construct a model predicting influenza epidemics. Results: In total, 146 queries related to influenza were generated through our initial query selection approach. A considerable proportion of optimal features for final models were derived from queries with reference to the social media data. The SVR model performed well: the prediction values were highly correlated with the recent observed influenza-like illness (r=.956; P<.001) and virological incidence rate (r=.963; P<.001). Conclusions: These results demonstrate the feasibility of using search queries to enhance influenza surveillance in South Korea. In
A novel adaptive Cuckoo search for optimal query plan generation.

PubMed

Gomathi, Ramalingam; Sharmila, Dhandapani

2014-01-01

The daily growth in the number of web pages has driven the development of semantic web technology. A World Wide Web Consortium (W3C) standard for storing semantic web data is the resource description framework (RDF). To reduce the execution time for querying large RDF graphs, evolving metaheuristic algorithms have become an alternative to the traditional query optimization methods. This paper focuses on the problem of query optimization of semantic web data. An efficient algorithm called adaptive Cuckoo search (ACS) for querying and generating an optimal query plan for large RDF graphs is designed in this research. Experiments were conducted on different datasets with varying numbers of predicates. The experimental results showed that the proposed approach provided significant results in terms of query execution time. The extent to which the algorithm is efficient is tested and the results are documented.
Raising the IQ in full-text searching via intelligent querying

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kero, R.; Russell, L.; Swietlik, C.

1994-11-01

Current Information Retrieval (IR) technologies allow for efficient access to relevant information, provided that user selected query terms coincide with the specific linguistical choices made by the authors whose works constitute the text-base. Therefore, the challenge is to enhance the limited searching capability of state-of-the-practice IR. This can be done either with augmented clients that overcome current server searching deficiencies, or with added capabilities that can augment searching algorithms on the servers. The technology being investigated is that of deductive databases, with a set of new techniques called cooperative answering. This technology utilizes semantic networks to allow for navigation between possible query search term alternatives. The augmented search terms are passed to an IR engine and the results can be compared. The project utilizes the OSTI Environment, Safety and Health Thesaurus to populate the domain specific semantic network and the text base of ES&H related documents from the Facility Profile Information Management System as the domain specific search space.
BEAUTY-X: enhanced BLAST searches for DNA queries.

PubMed

Worley, K C; Culpepper, P; Wiese, B A; Smith, R F

1998-01-01

BEAUTY (BLAST Enhanced Alignment Utility) is an enhanced version of the BLAST database search tool that facilitates identification of the functions of matched sequences. Three recent improvements to the BEAUTY program described here make the enhanced output (1) available for DNA queries, (2) available for searches of any protein database, and (3) more up-to-date, with periodic updates of the domain information. BEAUTY searches of the NCBI and EMBL non-redundant protein sequence databases are available from the BCM Search Launcher Web pages (http://gc.bcm.tmc.edu:8088/search-launcher/launcher.html). BEAUTY Post-Processing of submitted search results is available using the BCM Search Launcher Batch Client (version 2.6) (ftp://gc.bcm.tmc.edu/pub/software/search-launcher/). Example figures are available at http://dot.bcm.tmc.edu:9331/papers/beautypp.html (kworley,culpep)@bcm.tmc.edu
Monitoring hand, foot and mouth disease by combining search engine query data and meteorological factors.

PubMed

Huang, Da-Cang; Wang, Jin-Feng

2018-01-15

Hand, foot and mouth disease (HFMD) has been recognized as a significant public health threat and poses a tremendous challenge to disease control departments. To date, the relationship between meteorological factors and HFMD has been documented, and public interest of disease has been proven to be trackable from the Internet. However, no study has explored the combination of these two factors in the monitoring of HFMD. Therefore, the main aim of this study was to develop an effective monitoring model of HFMD in Guangzhou, China by utilizing historical HFMD cases, Internet-based search engine query data and meteorological factors. To this end, a case study was conducted in Guangzhou, using a network-based generalized additive model (GAM) including all factors related to HFMD. Three other models were also constructed using some of the variables for comparison. The results suggested that the model showed the best estimating ability when considering all of the related factors. Copyright © 2017 Elsevier B.V. All rights reserved.
Bat-Inspired Algorithm Based Query Expansion for Medical Web Information Retrieval.

PubMed

Khennak, Ilyes; Drias, Habiba

2017-02-01

With the increasing amount of medical data available on the Web, looking for health information has become one of the most widely searched topics on the Internet. Patients and people of several backgrounds are now using Web search engines to acquire medical information, including information about a specific disease, medical treatment or professional advice. Nonetheless, due to a lack of medical knowledge, many laypeople have difficulties in forming appropriate queries to articulate their inquiries, which renders their search queries imprecise due to the use of unclear keywords. The use of these ambiguous and vague queries to describe the patients' needs has resulted in a failure of Web search engines to retrieve accurate and relevant information. One of the most natural and promising methods to overcome this drawback is Query Expansion. In this paper, an original approach based on Bat Algorithm is proposed to improve the retrieval effectiveness of query expansion in the medical field. In contrast to the existing literature, the proposed approach uses Bat Algorithm to find the best expanded query among a set of expanded query candidates, while maintaining low computational complexity. Moreover, this new approach allows the determination of the length of the expanded query empirically. Numerical results on MEDLINE, the on-line medical information database, show that the proposed approach is more effective and efficient compared to the baseline.
A Study on Pubmed Search Tag Usage Pattern: Association Rule Mining of a Full-day Pubmed Query Log

PubMed Central

2013-01-01

Background: The practice of evidence-based medicine requires efficient biomedical literature search such as PubMed/MEDLINE. Retrieval performance relies highly on the efficient use of search field tags. The purpose of this study was to analyze PubMed log data in order to understand the usage pattern of search tags by the end user in PubMed/MEDLINE search. Methods: A PubMed query log file was obtained from the National Library of Medicine containing anonymous user identification, timestamp, and query text. Inconsistent records were removed from the dataset and the search tags were extracted from the query texts. A total of 2,917,159 queries were selected for this study issued by a total of 613,061 users. The analysis of frequent co-occurrences and usage patterns of the search tags was conducted using an association mining algorithm. Results: The percentage of search tag usage was low (11.38% of the total queries) and only 2.95% of queries contained two or more tags. Three out of four users used no search tag and about two-thirds of them issued fewer than four queries. Among the queries containing at least one tagged search term, the average number of search tags was almost half of the number of total search terms. Navigational search tags are more frequently used than informational search tags. While no strong association was observed between informational and navigational tags, six (out of 19) informational tags and six (out of 29) navigational tags showed strong associations in PubMed searches. Conclusions: The low percentage of search tag usage implies that PubMed/MEDLINE users do not utilize the features of PubMed/MEDLINE widely or they are not aware of such features or solely depend on the high recall focused query translation by the PubMed’s Automatic Term Mapping. The users need further education and interactive search application for effective use of the search tags in order to fulfill their biomedical information needs from PubMed/MEDLINE. PMID:23302604
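As a rough illustration of the co-occurrence analysis described above, the sketch below computes support and confidence for pairs of search tags across a query log. The abstract does not name the association mining algorithm used, so this is only a brute-force pairwise version under that assumption; the function name, the threshold, and the sample log are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(queries, min_support=0.2):
    """Return (antecedent, consequent, support, confidence) for tag pairs.

    queries is a list of tag lists, one per query. Support is the share of
    all queries containing both tags; confidence is the share of queries
    with the antecedent that also contain the consequent.
    """
    n = len(queries)
    item_counts = Counter()
    pair_counts = Counter()
    for tags in queries:
        unique = sorted(set(tags))  # count each tag once per query
        item_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    return rules


# Hypothetical log: each entry lists the field tags found in one query.
log = [["au", "dp"], ["au", "dp"], ["ti"], ["au"], ["mh", "ti"]]
for a, b, support, confidence in pair_rules(log, min_support=0.4):
    print(f"[{a}] -> [{b}] support={support:.2f} confidence={confidence:.2f}")
```

In this toy log, every query that uses the `dp` tag also uses `au`, so the rule from `dp` to `au` has a confidence of 1.0, while the reverse rule is weaker.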
Manually Classifying User Search Queries on an Academic Library Web Site

ERIC Educational Resources Information Center

Chapman, Suzanne; Desai, Shevon; Hagedorn, Kat; Varnum, Ken; Mishra, Sonali; Piacentine, Julie

2013-01-01

The University of Michigan Library wanted to learn more about the kinds of searches its users were conducting through the "one search" search box on the Library Web site. Library staff conducted two investigations. A preliminary investigation in 2011 involved the manual review of the 100 most frequently occurring queries conducted…

System for Performing Single Query Searches of Heterogeneous and Dispersed Databases

NASA Technical Reports Server (NTRS)

Maluf, David A. (Inventor); Okimura, Takeshi (Inventor); Gurram, Mohana M. (Inventor); Tran, Vu Hoang (Inventor); Knight, Christopher D. (Inventor); Trinh, Anh Ngoc (Inventor)

2017-01-01

The present invention is a distributed computer system of heterogeneous databases joined in an information grid and configured with an Application Programming Interface hardware which includes a search engine component for performing user-structured queries on multiple heterogeneous databases in real time. This invention reduces overhead associated with the impedance mismatch that commonly occurs in heterogeneous database queries.
Influence of legislations and news on Indian internet search query patterns of e-cigarettes.

PubMed

Thavarajah, Rooban; Mohandoss, Anusa Arunachalam; Ranganathan, Kannan; Kondalsamy-Chennakesavan, Srinivas

2017-01-01

There is a paucity of data on the use of electronic nicotine delivery systems (ENDS) in India. In addition, the Indian internet search pattern for ENDS has not been studied. We aimed to address this lacuna. Moreover, the influence of the tobacco legislations and news pieces on such search volume is not known. Given the fact that ENDS could cause oral lesions, these data are pertinent to dentists. Using a time series analysis, we examined the effect of tobacco-related legislations and news pieces on total search volume (TSV) from September 1, 2012, to August 31, 2016. TSV data were seasonally adjusted and analyzed using time series modeling. The TSV clocked during the month of legislations and news pieces were analyzed for their influence on search pattern of ENDS. The overall mean ± standard deviation (range) TSV was 22273.75 ± 6784.01 (12310-40510) during the study with seasonal variations. Individually, the best model for TSV-legislation and news pieces was autoregressive integrated moving average model, and when influence of legislations and news events were combined, it was the Winters' additive model. In the legislation alone model, the pre-event, event and post-event month TSV was not a better indicator of the effect, except for the post-event month of the 2nd legislation, which involved pictorial warnings on packages in the study period. Similarly, a news piece on Pan-India ban on ENDS influenced the model in the news piece model. When combined, no "events" emerged significant. These findings suggest that search for information on ENDS is increasing and that these tobacco control policies and news items, targeting tobacco usage reduction, have only a short-term effect on the rate of searching for information on ENDS.
News trends and web search query of HIV/AIDS in Hong Kong

PubMed Central

Chiu, Alice P. Y.; Lin, Qianying

2017-01-01

Background: The HIV epidemic in Hong Kong has worsened in recent years, with major contributions from high-risk subgroup of men who have sex with men (MSM). Internet use is prevalent among the majority of the local population, where they sought health information online. This study examines the impacts of HIV/AIDS and MSM news coverage on web search query in Hong Kong. Methods: Relevant news coverage about HIV/AIDS and MSM from January 1st, 2004 to December 31st, 2014 was obtained from the WiseNews database. News trends were created by computing the number of relevant articles by type, topic, place of origin and sub-populations. We then obtained relevant search volumes from Google and analysed causality between news trends and Google Trends using Granger Causality test and orthogonal impulse function. Results: We found that editorial news has an impact on “HIV” Google searches on HIV, with the search term popularity peaking at an average of two weeks after the news are published. Similarly, editorial news has an impact on the frequency of “AIDS” searches two weeks after. MSM-related news trends have a more fluctuating impact on “MSM” Google searches, although the time lag varies anywhere from one week later to ten weeks later. Conclusions: This infodemiological study shows that there is a positive impact of news trends on the online search behavior of HIV/AIDS or MSM-related issues for up to ten weeks after. Health promotional professionals could make use of this brief time window to tailor the timing of HIV awareness campaigns and public health interventions to maximise its reach and effectiveness. PMID:28922376
News trends and web search query of HIV/AIDS in Hong Kong.

PubMed

Chiu, Alice P Y; Lin, Qianying; He, Daihai

2017-01-01

The HIV epidemic in Hong Kong has worsened in recent years, with major contributions from high-risk subgroup of men who have sex with men (MSM). Internet use is prevalent among the majority of the local population, where they sought health information online. This study examines the impacts of HIV/AIDS and MSM news coverage on web search query in Hong Kong. Relevant news coverage about HIV/AIDS and MSM from January 1st, 2004 to December 31st, 2014 was obtained from the WiseNews database. News trends were created by computing the number of relevant articles by type, topic, place of origin and sub-populations. We then obtained relevant search volumes from Google and analysed causality between news trends and Google Trends using Granger Causality test and orthogonal impulse function. We found that editorial news has an impact on "HIV" Google searches on HIV, with the search term popularity peaking at an average of two weeks after the news are published. Similarly, editorial news has an impact on the frequency of "AIDS" searches two weeks after. MSM-related news trends have a more fluctuating impact on "MSM" Google searches, although the time lag varies anywhere from one week later to ten weeks later. This infodemiological study shows that there is a positive impact of news trends on the online search behavior of HIV/AIDS or MSM-related issues for up to ten weeks after. Health promotional professionals could make use of this brief time window to tailor the timing of HIV awareness campaigns and public health interventions to maximise its reach and effectiveness.
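The idea of a time lag between news coverage and search volume can be illustrated with a much simpler tool than the one the study used. The sketch below is not a Granger causality test or an impulse response analysis; it only scans lagged Pearson correlations between two weekly series and reports the lag with the highest correlation. The function names and both series are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def best_lag(news, searches, max_lag):
    """Correlate news[t] with searches[t + lag] for each lag in 0..max_lag."""
    scores = {}
    for lag in range(max_lag + 1):
        x = news[:len(news) - lag]   # news counts, trimmed at the end
        y = searches[lag:]           # search volume, shifted by lag weeks
        scores[lag] = pearson(x, y)
    return max(scores, key=scores.get), scores


# Hypothetical weekly counts: searches echo the news two weeks later.
news = [0, 5, 0, 0, 1, 0, 4, 0, 0, 0, 2, 0]
searches = [0, 0, 0, 5, 0, 0, 1, 0, 4, 0, 0, 0]
lag, scores = best_lag(news, searches, max_lag=4)
print(lag)
```

A high lagged correlation only suggests that one series tends to follow the other; unlike a Granger test, it does not control for the search series' own history.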
Development and empirical user-centered evaluation of semantically-based query recommendation for an electronic health record search engine.

PubMed

Hanauer, David A; Wu, Danny T Y; Yang, Lei; Mei, Qiaozhu; Murkowski-Steffy, Katherine B; Vydiswaran, V G Vinod; Zheng, Kai

2017-03-01

The utility of biomedical information retrieval environments can be severely limited when users lack expertise in constructing effective search queries. To address this issue, we developed a computer-based query recommendation algorithm that suggests semantically interchangeable terms based on an initial user-entered query. In this study, we assessed the value of this approach, which has broad applicability in biomedical information retrieval, by demonstrating its application as part of a search engine that facilitates retrieval of information from electronic health records (EHRs). The query recommendation algorithm utilizes MetaMap to identify medical concepts from search queries and indexed EHR documents. Synonym variants from UMLS are used to expand the concepts along with a synonym set curated from historical EHR search logs. The empirical study involved 33 clinicians and staff who evaluated the system through a set of simulated EHR search tasks. User acceptance was assessed using the widely used technology acceptance model. The search engine's performance was rated consistently higher with the query recommendation feature turned on vs. off. The relevance of computer-recommended search terms was also rated high, and in most cases the participants had not thought of these terms on their own. The questions on perceived usefulness and perceived ease of use received overwhelmingly positive responses. A vast majority of the participants wanted the query recommendation feature to be available to assist in their day-to-day EHR search tasks. Challenges persist for users to construct effective search queries when retrieving information from biomedical documents including those from EHRs. This study demonstrates that semantically-based query recommendation is a viable solution to addressing this challenge. Published by Elsevier Inc.
[On the seasonality of dermatoses: a retrospective analysis of search engine query data depending on the season].

PubMed

Köhler, M J; Springer, S; Kaatz, M

2014-09-01

The volume of search engine queries about disease-relevant items reflects public interest and correlates with disease prevalence, as shown by the example of flu (influenza). Other influences include media attention or holidays. The present work investigates whether the seasonality of prevalence or symptom severity of dermatoses correlates with search engine query data. The relative weekly volume of dermatologically relevant search terms was assessed with the online tool Google Trends for the years 2009-2013. For each item, the degree of seasonality was calculated via frequency analysis and a geometric approach. Many dermatoses show a marked seasonality, reflected by search engine query volumes. Unexpected seasonal variations of these queries suggest a previously unknown variability of the respective disease prevalence. Furthermore, using the example of allergic rhinitis, a close correlation of search engine query data with actual pollen count can be demonstrated. In many cases, search engine query data are appropriate to estimate seasonal variability in the prevalence of common dermatoses. This finding may be useful for real-time analysis and formation of hypotheses concerning pathogenetic or symptom-aggravating mechanisms and may thus contribute to improvement of diagnostics and prevention of skin diseases.
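The abstract does not say how the frequency analysis was carried out. As a minimal sketch of one common way to quantify seasonality, the Python below measures the share of a weekly series' variance that sits at the one-cycle-per-year frequency of a discrete Fourier transform. The function name, the 52-week period, and the synthetic series are assumptions for illustration, not the authors' method.

```python
import cmath
import math

def annual_power_share(weekly_volume, period=52):
    """Share of variance at the annual frequency (0 = none, 1 = pure annual cycle).

    Assumes the series length is a whole number of periods so that the
    annual frequency falls exactly on one DFT bin.
    """
    n = len(weekly_volume)
    if n == 0 or n % period != 0:
        raise ValueError("series length must be a positive multiple of the period")
    mean = sum(weekly_volume) / n
    centered = [v - mean for v in weekly_volume]
    total = sum(v * v for v in centered)
    if total == 0:
        return 0.0  # a flat series has no seasonal component
    k = n // period  # DFT bin for one cycle per period
    coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                for t, v in enumerate(centered))
    # By Parseval's theorem, bins k and n-k together carry 2*|X_k|^2 / n.
    return 2 * abs(coeff) ** 2 / (n * total)

# Synthetic two-year series with a pure annual cycle (hypothetical data).
seasonal = [50 + 20 * math.cos(2 * math.pi * t / 52) for t in range(104)]
flat = [50] * 104
seasonal_share = annual_power_share(seasonal)
flat_share = annual_power_share(flat)
```

A share near 1 indicates that almost all week-to-week variation follows a yearly rhythm. Real Google Trends series would also contain trend and noise, which lower the share.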
What is the prevalence of health-related searches on the World Wide Web? Qualitative and quantitative analysis of search engine queries on the Internet

PubMed Central

Eysenbach, G.; Kohler, Ch.

2003-01-01

While health information is often said to be the most sought after information on the web, empirical data on the actual frequency of health-related searches on the web are missing. In the present study we aimed to determine the prevalence of health-related searches on the web by analyzing search terms entered by people into popular search engines. We also made some preliminary attempts to qualitatively describe and classify these searches. Occasional difficulties in determining what constitutes a “health-related” search led us to propose and validate a simple method to automatically classify a search string as “health-related”. This method is based on determining the proportion of pages on the web containing the search string and the word “health”, as a proportion of the total number of pages with the search string alone. Using human codings as gold standard we plotted a ROC curve and determined empirically that if this “co-occurrence rate” is larger than 35%, the search string can be said to be health-related (sensitivity: 85.2%, specificity 80.4%). The results of our “human” codings of search queries determined that about 4.5% of all searches are “health-related”. We estimate that globally a minimum of 6.75 million health-related searches are being conducted on the web every day, which is roughly the same number of searches that were conducted on the NLM Medlars system over the full year of 1996. PMID:14728167
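The classification rule described above can be sketched in a few lines of Python. The 35% threshold comes from the abstract. The function names and the page counts in the example are hypothetical, and obtaining real page counts from a search engine is left out.

```python
def co_occurrence_rate(pages_with_health: int, pages_total: int) -> float:
    """Pages containing the search string AND "health", divided by
    pages containing the search string alone."""
    if pages_total <= 0:
        return 0.0
    return pages_with_health / pages_total

def is_health_related(pages_with_health: int, pages_total: int,
                      threshold: float = 0.35) -> bool:
    """Apply the abstract's rule: health-related if the rate is larger than 35%."""
    return co_occurrence_rate(pages_with_health, pages_total) > threshold

# Hypothetical page counts: (pages with string and "health", pages with string).
example_counts = {
    "asthma inhaler": (4200, 10000),
    "cheap flights": (300, 10000),
}
labels = {query: is_health_related(with_health, total)
          for query, (with_health, total) in example_counts.items()}
```

The comparison is strict, so a rate of exactly 35% is not labeled health-related, matching the "larger than 35%" wording.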
Query-seeded iterative sequence similarity searching improves selectivity 5–20-fold

PubMed Central

Li, Weizhong; Lopez, Rodrigo

2017-01-01

Iterative similarity search programs, like psiblast, jackhmmer, and psisearch, are much more sensitive than pairwise similarity search methods like blast and ssearch because they build a position specific scoring model (a PSSM or HMM) that captures the pattern of sequence conservation characteristic of a protein family. But models are subject to contamination; once an unrelated sequence has been added to the model, homologs of the unrelated sequence will also produce high scores, and the model can diverge from the original protein family. Examination of alignment errors during psiblast PSSM contamination suggested a simple strategy for dramatically reducing PSSM contamination. psiblast PSSMs are built from the query-based multiple sequence alignment (MSA) implied by the pairwise alignments between the query model (PSSM, HMM) and the subject sequences in the library. When the original query sequence residues are inserted into gapped positions in the aligned subject sequence, the resulting PSSM rarely produces alignment over-extensions or alignments to unrelated sequences. This simple step, which tends to anchor the PSSM to the original query sequence and slightly increase target percent identity, can reduce the frequency of false-positive alignments more than 20-fold compared with psiblast and jackhmmer, with little loss in search sensitivity. PMID:27923999
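The query-seeding step can be illustrated with a minimal Python sketch that works on one pairwise alignment given as two equal-length gapped strings. The function name, the `-` gap character, the choice to drop columns where the query itself has a gap, and the example sequences are all assumptions for illustration. This is not the code used in psisearch.

```python
def seed_with_query(aligned_query: str, aligned_subject: str, gap: str = "-") -> str:
    """Return the subject row of a query-based MSA, with query residues
    filled into positions where the subject is gapped."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    seeded = []
    for q_res, s_res in zip(aligned_query, aligned_subject):
        if q_res == gap:
            # Assumption: subject insertions relative to the query have
            # no column in a query-based MSA, so they are skipped.
            continue
        # Where the subject has a gap, insert the original query residue.
        seeded.append(q_res if s_res == gap else s_res)
    return "".join(seeded)

# Hypothetical alignment: the subject has one gap and one insertion.
seeded_row = seed_with_query("MKT-LLV", "MR-ALLV")
```

In this example the subject's gap opposite `T` is filled with `T` and the inserted `A` is dropped, so the row has one residue per query position.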
Searching for Images: The Analysis of Users' Queries for Image Retrieval in American History.

ERIC Educational Resources Information Center

Choi, Youngok; Rasmussen, Edie M.

2003-01-01

Studied users' queries for visual information in American history to identify the image attributes important for retrieval and the characteristics of users' queries for digital images, based on queries from 38 faculty and graduate students. Results of pre- and post-test questionnaires and interviews suggest principal categories of search terms.…
Sexual information seeking on web search engines.

PubMed

Spink, Amanda; Koricich, Andrew; Jansen, B J; Cole, Charles

2004-02-01

Sexual information seeking is an important element within human information behavior. Seeking sexually related information on the Internet takes many forms and channels, including chat room discussions, accessing Websites, or searching Web search engines for sexual materials. The study of sexual Web queries provides insight into sexually related information-seeking behavior, of value to Web users and providers alike. We qualitatively analyzed queries from logs of 1,025,910 Alta Vista and AlltheWeb.com Web user queries from 2001. We compared the differences in sexually related Web searching between Alta Vista and AlltheWeb.com users. Differences were found in session duration, query outcomes, and search term choices. Implications of the findings for sexual information seeking are discussed.
Searching for rare diseases in PubMed: a blind comparison of Orphanet expert query and query based on terminological knowledge.

PubMed

Griffon, N; Schuers, M; Dhombres, F; Merabti, T; Kerdelhué, G; Rollin, L; Darmoni, S J

2016-08-02

Despite international initiatives like Orphanet, it remains difficult to find up-to-date information about rare diseases. The aim of this study is to propose an exhaustive set of queries for PubMed based on terminological knowledge and to evaluate it versus the queries based on expertise provided by the most frequently used resource in Europe: Orphanet. Four rare disease terminologies (MeSH, OMIM, HPO and HRDO) were manually mapped to each other, permitting the automatic creation of expanded terminological queries for rare diseases. For 30 rare diseases, 30 citations retrieved by Orphanet expert query and/or query based on terminological knowledge were assessed for relevance by two independent reviewers unaware of the query's origin. An adjudication procedure was used to resolve any discrepancy. Precision, relative recall and F-measure were all computed. For each Orphanet rare disease (n = 8982), there was a corresponding terminological query, in contrast with only 2284 queries provided by Orphanet. Only 553 citations were evaluated due to queries with 0 or only a few hits. There was no significant difference in precision between the Orpha query and the terminological query (0.61 vs 0.52; p = 0.13). Nevertheless, terminological queries retrieved more citations more often than Orpha queries (0.57 vs. 0.33; p = 0.01). Interestingly, Orpha queries seemed to retrieve older citations than terminological queries (p < 0.0001). The terminological queries proposed in this study are now available for all rare diseases. They may be a useful tool for both precision- and recall-oriented literature searches.
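The three evaluation measures named above can be sketched as follows in Python. The abstract does not define relative recall. This sketch assumes the usual definition: the relevant citations a query retrieves, divided by the relevant citations retrieved by either query. The citation identifiers are invented for illustration.

```python
def precision(retrieved: set, relevant: set) -> float:
    """Fraction of retrieved citations that were judged relevant."""
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0

def relative_recall(retrieved: set, relevant: set, pooled: set) -> float:
    """Relevant citations found by this query, relative to the relevant
    citations found by any of the compared queries (assumed definition)."""
    pooled_relevant = pooled & relevant
    if not pooled_relevant:
        return 0.0
    return len(retrieved & relevant) / len(pooled_relevant)

def f_measure(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0

# Hypothetical citation IDs for one disease.
expert_hits = {1, 2, 3, 4}
terminological_hits = {3, 4, 5, 6, 7}
judged_relevant = {1, 3, 4, 5, 7}
pool = expert_hits | terminological_hits

expert_precision = precision(expert_hits, judged_relevant)
expert_recall = relative_recall(expert_hits, judged_relevant, pool)
expert_f = f_measure(expert_precision, expert_recall)
```

Relative recall is used here because the full set of relevant PubMed citations for a disease is unknown, so recall can only be measured against what the compared queries found together.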
A study on PubMed search tag usage pattern: association rule mining of a full-day PubMed query log.

PubMed

Mosa, Abu Saleh Mohammad; Yoo, Illhoi

2013-01-09

The practice of evidence-based medicine requires efficient biomedical literature search such as PubMed/MEDLINE. Retrieval performance relies highly on the efficient use of search field tags. The purpose of this study was to analyze PubMed log data in order to understand the usage pattern of search tags by the end user in PubMed/MEDLINE search. A PubMed query log file was obtained from the National Library of Medicine containing anonymous user identification, timestamp, and query text. Inconsistent records were removed from the dataset and the search tags were extracted from the query texts. A total of 2,917,159 queries, issued by a total of 613,061 users, were selected for this study. The analysis of frequent co-occurrences and usage patterns of the search tags was conducted using an association mining algorithm. The percentage of search tag usage was low (11.38% of the total queries) and only 2.95% of queries contained two or more tags. Three out of four users used no search tag and about two-thirds of them issued fewer than four queries. Among the queries containing at least one tagged search term, the average number of search tags was almost half of the number of total search terms. Navigational search tags are more frequently used than informational search tags. While no strong association was observed between informational and navigational tags, six (out of 19) informational tags and six (out of 29) navigational tags showed strong associations in PubMed searches. The low percentage of search tag usage implies that PubMed/MEDLINE users do not use the features of PubMed/MEDLINE widely, are not aware of such features, or depend solely on the high-recall-focused query translation by PubMed's Automatic Term Mapping. Users need further education and interactive search applications for effective use of the search tags in order to fulfill their biomedical information needs from PubMed/MEDLINE.
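The tag-extraction and usage-share steps can be sketched in Python as below. This covers only the counting stage, not the association rule mining the study applied afterwards. The regular expression, function names, and sample queries are assumptions. The bracketed tags `[au]`, `[ti]`, and `[mh]` follow PubMed's field-tag syntax.

```python
import re
from collections import Counter

# A search tag is taken to be any bracketed token, e.g. "asthma[ti]".
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")

def extract_tags(query: str) -> list:
    """Return the lower-cased search tags found in one query string."""
    return [tag.strip().lower() for tag in TAG_PATTERN.findall(query)]

def tag_usage(queries: list) -> dict:
    """Share of queries with at least one tag, with two or more tags,
    and the frequency of each tag."""
    if not queries:
        raise ValueError("no queries given")
    tag_counts = Counter()
    tagged = multi_tagged = 0
    for query in queries:
        tags = extract_tags(query)
        tag_counts.update(tags)
        tagged += bool(tags)
        multi_tagged += len(tags) >= 2
    total = len(queries)
    return {
        "tagged_share": tagged / total,
        "multi_tag_share": multi_tagged / total,
        "tag_counts": tag_counts,
    }

# Hypothetical log excerpt.
sample_queries = [
    "smith j[au] AND asthma[ti]",
    "breast cancer",
    "diabetes[MH]",
    "heart failure treatment",
]
usage = tag_usage(sample_queries)
```

On a real log, the per-query tag lists produced by `extract_tags` would be the transactions fed to an association mining algorithm.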
Search strategies on the Internet: general and specific.

PubMed

Bottrill, Krys

2004-06-01

Some of the most up-to-date information on scientific activity is to be found on the Internet; for example, on the websites of academic and other research institutions and in databases of currently funded research studies provided on the websites of funding bodies. Such information can be valuable in suggesting new approaches and techniques that could be applicable in a Three Rs context. However, the Internet is a chaotic medium, not subject to the meticulous classification and organisation of classical information resources. At the same time, Internet search engines do not match the sophistication of search systems used by database hosts. Also, although some offer relatively advanced features, user awareness of these tends to be low. Furthermore, much of the information on the Internet is not accessible to conventional search engines, giving rise to the concept of the "Invisible Web". General strategies and techniques for Internet searching are presented, together with a comparative survey of selected search engines. The question of how the Invisible Web can be accessed is discussed, as well as how to keep up-to-date with Internet content and improve searching skills.
Behavioural and brain responses related to Internet search and memory.

PubMed

Dong, Guangheng; Potenza, Marc N

2015-10-01

The ready availability of data via searches on the Internet has changed how many people seek and perhaps store and recall information, although the brain mechanisms underlying these processes are not well understood. This study investigated brain mechanisms underlying Internet-based vs. non-Internet-based searching. The results showed that Internet searching was associated with lower accuracy in recalling information as compared with traditional book searching. During functional magnetic resonance imaging, Internet searching was associated with less regional brain activation in the left ventral stream, the association area of the temporal-parietal-occipital cortices, and the middle frontal cortex. When comparing novel items with remembered trials, Internet-based searching was associated with higher brain activation in the right orbitofrontal cortex and lower brain activation in the right middle temporal gyrus when facing those novel trials. Brain activations in the middle temporal gyrus were inversely correlated with response times, and brain activations in the orbitofrontal cortex were positively correlated with self-reported search impulses. Taken together, the results suggest that, although Internet-based searching may have facilitated the information-acquisition process, this process may have been performed more hastily and be more prone to difficulties in recollection. In addition, people appear less confident in recalling information learned through Internet searching, and recent Internet searching may promote motivation to use the Internet. © 2015 Federation of European Neuroscience Societies and John Wiley & Sons Ltd.
A novel methodology for querying web images

NASA Astrophysics Data System (ADS)

Prabhakara, Rashmi; Lee, Ching Cheng

2005-01-01

Ever since the advent of the Internet, there has been an immense growth in the amount of image data that is available on the World Wide Web. With such a magnitude of image availability, an efficient and effective image retrieval system is required to make use of this information. This research presents an effective image matching and indexing technique that improves on existing integrated image retrieval methods. The proposed technique follows a two-phase approach, integrating query by topic and query by example specification methods. The first phase consists of topic-based image retrieval using an improved text information retrieval (IR) technique that makes use of the structured format of HTML documents. It consists of a focused crawler that allows the user to enter not only the keyword for the topic-based search but also the scope in which the user wants to find the images. The second phase uses the query by example specification to perform a low-level content-based image match that retrieves a smaller set of results closer to the example image. Information related to the image feature is automatically extracted from the query image by the image processing system. A technique based on color features that is not computationally intensive is used to perform content-based matching of images. The main goal is to develop a functional image search and indexing system and to demonstrate that better retrieval results can be achieved with this proposed hybrid search technique.
A novel methodology for querying web images

NASA Astrophysics Data System (ADS)

Prabhakara, Rashmi; Lee, Ching Cheng

2004-12-01

Ever since the advent of the Internet, there has been an immense growth in the amount of image data that is available on the World Wide Web. With such a magnitude of image availability, an efficient and effective image retrieval system is required to make use of this information. This research presents an effective image matching and indexing technique that improves on existing integrated image retrieval methods. The proposed technique follows a two-phase approach, integrating query by topic and query by example specification methods. The first phase consists of topic-based image retrieval using an improved text information retrieval (IR) technique that makes use of the structured format of HTML documents. It consists of a focused crawler that allows the user to enter not only the keyword for the topic-based search but also the scope in which the user wants to find the images. The second phase uses the query by example specification to perform a low-level content-based image match that retrieves a smaller set of results closer to the example image. Information related to the image feature is automatically extracted from the query image by the image processing system. A technique based on color features that is not computationally intensive is used to perform content-based matching of images. The main goal is to develop a functional image search and indexing system and to demonstrate that better retrieval results can be achieved with this proposed hybrid search technique.
Seasonal variation in Internet searches for vitamin D.

PubMed

Moon, Rebecca J; Curtis, Elizabeth M; Davies, Justin H; Cooper, Cyrus; Harvey, Nicholas C

2017-12-01

Internet search rates for "vitamin D" were explored using Google Trends. Search rates increased from 2004 until 2010 and thereafter displayed a seasonal pattern peaking in late winter. This knowledge could help guide the timing of public health interventions aimed at managing vitamin D deficiency. The Internet is an important source of health information. Analysis of Internet search activity rates can provide information on disease epidemiology, health-related behaviors and public interest. We explored Internet search rates for vitamin D to determine whether this reflects the increasing scientific interest in this topic. Google Trends is a publicly available tool that provides data on Internet searches using Google. Search activity for the term "vitamin D" from 1st January 2004 until 31st October 2016 was obtained. Comparison was made to other bone and nutrition related terms. Worldwide, searches for "vitamin D" increased from 2004 until 2010 and thereafter a statistically significant (p < 0.001) seasonal pattern with a peak in February and nadir in August was observed. This seasonal pattern was evident for searches originating from both the USA (peak in February) and Australia (peak in August); p < 0.001 for both. Searches for the terms "osteoporosis", "rickets", "back pain" or "folic acid" did not display the increase observed for vitamin D or evidence of seasonal variation. Public interest in vitamin D, as assessed by Internet search activity, did increase from 2004 to 2010, likely reflecting the growing scientific interest, but now displays a seasonal pattern with peak interest during late winter. This information could be used to guide public health approaches to managing vitamin D deficiency.
Index Compression and Efficient Query Processing in Large Web Search Engines

ERIC Educational Resources Information Center

Ding, Shuai

2013-01-01

The inverted index is the main data structure used by all the major search engines. Search engines build an inverted index on their collection to speed up query processing. As the size of the web grows, the length of the inverted list structures, which can easily grow to hundreds of MBs or even GBs for common terms (roughly linear in the size of…
Parallel multi-join query optimization algorithm for distributed sensor network in the internet of things

NASA Astrophysics Data System (ADS)

Zheng, Yan

2015-03-01

The Internet of Things (IoT), which focuses on providing users with information exchange and intelligent control, has attracted considerable attention from researchers worldwide since the beginning of this century. The IoT consists of large numbers of sensor nodes and data processing units, and its most important features are constrained energy, efficient communication, and high redundancy. As the number of sensor nodes grows, communication efficiency and available communication bandwidth become bottlenecks. Much existing research assumes that the number of joins is small, an assumption that does not hold for the growing number of multi-join queries across the IoT. To improve communication efficiency between parallel units in a distributed sensor network, this paper proposes a parallel query optimization algorithm based on a distribution attributes cost graph. The algorithm considers the storage information relations and the network communication cost, and establishes an optimized information changing rule. The experimental results show that the algorithm performs well and makes effective use of the resources of each node in the distributed sensor network. The execution efficiency of multi-join queries between different nodes can therefore be improved.
Determination of geographic variance in stroke prevalence using Internet search engine analytics.

PubMed

Walcott, Brian P; Nahed, Brian V; Kahle, Kristopher T; Redjal, Navid; Coumans, Jean-Valery

2011-06-01

Previous methods to determine stroke prevalence, such as nationwide surveys, are labor-intensive endeavors. Recent advances in search engine query analytics have led to a new metric for disease surveillance to evaluate symptomatic phenomena, such as influenza. The authors hypothesized that the use of search engine query data can determine the prevalence of stroke. The Google Insights for Search database was accessed to analyze anonymized search engine query data. The authors' search strategy utilized common search queries used when attempting either to identify the signs and symptoms of a stroke or to perform stroke education. The search logic was as follows: (stroke signs + stroke symptoms + mini stroke--heat) from January 1, 2005, to December 31, 2010. The relative number of searches performed (the interest level) for this search logic was established for all 50 states and the District of Columbia. A Pearson product-moment correlation coefficient was calculated against the state-specific stroke prevalence data previously reported. Web search engine interest level was available for all 50 states and the District of Columbia over the time period January 1, 2005-December 31, 2010. The interest level was highest in Alabama and Tennessee (100 and 96, respectively) and lowest in California and Virginia (58 and 53, respectively). The Pearson correlation coefficient (r) was calculated to be 0.47 (p = 0.0005, 2-tailed). Search engine query data analysis allows for the determination of relative stroke prevalence. Further investigation will reveal the reliability of this metric to determine temporal pattern analysis and prevalence in this and other symptomatic diseases.
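The Pearson product-moment correlation used in this kind of analysis can be computed as in the Python sketch below. The function name is an assumption. The four interest levels are those reported above for Alabama, Tennessee, California, and Virginia, while the prevalence figures paired with them are invented placeholders, not the study's data.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)

# Interest levels from the abstract; prevalence values are hypothetical.
interest_level = [100, 96, 58, 53]
stroke_prevalence_pct = [3.8, 3.5, 2.4, 2.6]
r = pearson_r(interest_level, stroke_prevalence_pct)
```

The coefficient ranges from -1 to 1. The study's reported r of 0.47 across 51 jurisdictions indicates a moderate positive association between search interest and prevalence.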

How natural hazards influence Internet searches

NASA Astrophysics Data System (ADS)

Geyer, Adelina; Martí, Joan; Villaseñor, Antonio

2017-04-01

Effective dissemination of correct and easy-to-understand scientific information is one of the most important tasks of natural hazard assessment and risk management, and the media and the general population are its two fundamental audiences. During hazardous natural phenomena, the media and the population urgently seek information through all possible channels. Traditionally, these have been radio and television, but over the past decades the Internet has also become a significant information resource. Nevertheless, how Internet search behavior changes during natural phenomena of significant societal impact (i.e. involving important human and/or economic losses) has not previously been analyzed. Focusing mainly on volcanism, we use here for the first time Internet search data provided by Google Trends to examine the search patterns of volcanology-related terms and how these may change during unrest periods or volcanic crises. The results allow us to evaluate, at global and local scales, society's interest in volcanological phenomena and its potential background knowledge of Earth Sciences. We show that Internet search data are a promising tool for the global and local monitoring of society's awareness and educational background regarding natural phenomena in general, and volcanic hazards in particular.
Design of an On-Line Query Language for Full Text Patent Search.

ERIC Educational Resources Information Center

Glantz, Richard S.

The design of an English-like query language and an interactive computer environment for searching the full text of the U.S. patent collection are discussed. Special attention is paid to achieving a transparent user interface, to providing extremely broad search capabilities (including nested substitution classes, Kleene star events, and domain…
Query-Adaptive Hash Code Ranking for Large-Scale Multi-View Visual Search.

PubMed

Liu, Xianglong; Huang, Lei; Deng, Cheng; Lang, Bo; Tao, Dacheng

2016-10-01

Hash-based nearest neighbor search has become attractive in many applications. However, the quantization in hashing usually degrades the discriminative power when using Hamming distance ranking. Moreover, for large-scale visual search, existing hashing methods cannot directly support efficient search over data with multiple sources, even though the literature has shown that adaptively incorporating complementary information from diverse sources or views can significantly boost search performance. To address these problems, this paper proposes a novel and generic approach to building multiple hash tables with multiple views and generating fine-grained ranking results at bitwise and tablewise levels. For each hash table, a query-adaptive bitwise weighting is introduced to alleviate the quantization loss by simultaneously exploiting the quality of hash functions and their complement for nearest neighbor search. From the tablewise aspect, multiple hash tables are built for different data views as a joint index, over which a query-specific rank fusion is proposed to rerank all results from the bitwise ranking by diffusing in a graph. Comprehensive experiments on image search over three well-known benchmarks show that the proposed method achieves up to 17.11% and 20.28% performance gains on single and multiple table search over the state-of-the-art methods.
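To show why bitwise weighting gives a finer ranking than plain Hamming distance, here is a minimal Python sketch of weighted Hamming ranking. It is not the paper's method: the paper derives query-adaptive weights from hash function quality, whereas the weights here are supplied by hand. The function names, codes, and weights are hypothetical.

```python
def weighted_hamming(query_bits, item_bits, weights):
    """Sum the weights of the bit positions where two hash codes differ."""
    if not (len(query_bits) == len(item_bits) == len(weights)):
        raise ValueError("codes and weights must have the same length")
    return sum(w for q, b, w in zip(query_bits, item_bits, weights) if q != b)

def rank_items(query_bits, database, weights):
    """Return database indices sorted by increasing weighted distance."""
    distances = [weighted_hamming(query_bits, item, weights) for item in database]
    return sorted(range(len(database)), key=lambda i: distances[i])

# Hypothetical 4-bit codes; the per-bit weights stand in for
# query-adaptive values that a real system would compute.
query_code = [1, 0, 1, 1]
database_codes = [
    [1, 0, 1, 0],  # differs in a low-weight bit
    [0, 0, 1, 1],  # differs in a high-weight bit
    [1, 0, 1, 1],  # identical to the query
]
bit_weights = [0.9, 0.2, 0.5, 0.1]
ranking = rank_items(query_code, database_codes, bit_weights)
```

With plain Hamming distance the first two items tie at distance 1. The weights break the tie, which is the fine-grained ranking effect the abstract refers to.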
Who Searches the Internet for Health Information?

PubMed Central

Bundorf, M Kate; Wagner, Todd H; Singer, Sara J; Baker, Laurence C

2006-01-01

Objective: To determine what types of consumers use the Internet as a source of health information. Data Sources: A survey of consumer use of the Internet for health information conducted during December 2001 and January 2002. Study Design: We estimated multivariate regression models to test hypotheses regarding the characteristics of consumers that affect information seeking behavior. Data Collection: Respondents were randomly sampled from an Internet-enabled panel of over 60,000 households. Our survey was sent to 12,878 panel members, and 69.4 percent of surveyed panel members responded. We collected information about respondents' use of the Internet to search for health information and to communicate about health care with others using the Internet or e-mail within the last year. Principal Findings: Individuals with reported chronic conditions were more likely than those without to search for health information on the Internet. The uninsured, particularly those with a reported chronic condition, were more likely than the privately insured to search. Individuals with longer travel times for their usual source of care were more likely to use the Internet for health-related communication than those with shorter travel times. Conclusions: Populations with serious health needs and those facing significant barriers in accessing health care in traditional settings turn to the Internet for health information. PMID:16704514
Federated Space-Time Query for Earth Science Data Using OpenSearch Conventions

NASA Technical Reports Server (NTRS)

Lynnes, Chris; Beaumont, Bruce; Duerr, Ruth; Hua, Hook

2009-01-01

This slide presentation reviews a Space-Time Query system that has been developed to assist users in finding Earth science data that fulfills researchers' needs. It reviews the reasons why finding Earth science data can be so difficult, and explains the workings of the Space-Time Query with OpenSearch and how this system can assist researchers in finding the required data. It also reviews the developments with client-server systems.
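The abstract gives no interface details. As a rough illustration of how a space-time query can be expressed with OpenSearch conventions, the Python sketch below fills a URL template that uses the `{geo:box}`, `{time:start}`, and `{time:end}` parameters from the OpenSearch Geo and Time extensions. The endpoint `example.org`, the query parameter names, and the helper function are hypothetical and do not describe the presented system.

```python
from urllib.parse import quote

# Hypothetical OpenSearch URL template of the kind a server might
# advertise in its description document.
TEMPLATE = ("https://example.org/opensearch?q={searchTerms}"
            "&bbox={geo:box}&start={time:start}&end={time:end}")

def fill_template(template: str, params: dict) -> str:
    """Substitute each {name} placeholder with its URL-encoded value."""
    url = template
    for name, value in params.items():
        url = url.replace("{" + name + "}", quote(str(value), safe=",:"))
    return url

query_url = fill_template(TEMPLATE, {
    "searchTerms": "sea surface temperature",
    "geo:box": "-180,-90,180,90",          # west,south,east,north
    "time:start": "2009-01-01T00:00:00Z",
    "time:end": "2009-01-31T23:59:59Z",
})
```

Because every participating server accepts the same placeholders, one client can send the same space and time constraints to several data providers, which is what makes a federated query possible.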
Using Internet search engines to estimate word frequency.

PubMed

Blair, Irene V; Urland, Geoffrey R; Ma, Jennifer E

2002-05-01

The present research investigated Internet search engines as a rapid, cost-effective alternative for estimating word frequencies. Frequency estimates for 382 words were obtained and compared across four methods: (1) Internet search engines, (2) the Kucera and Francis (1967) analysis of a traditional linguistic corpus, (3) the CELEX English linguistic database (Baayen, Piepenbrock, & Gulikers, 1995), and (4) participant ratings of familiarity. The results showed that Internet search engines produced frequency estimates that were highly consistent with those reported by Kucera and Francis and those calculated from CELEX, highly consistent across search engines, and very reliable over a 6-month period of time. Additional results suggested that Internet search engines are an excellent option when traditional word frequency analyses do not contain the necessary data (e.g., estimates for forenames and slang). In contrast, participants' familiarity judgments did not correspond well with the more objective estimates of word frequency. Researchers are advised to use search engines with large databases (e.g., AltaVista) to ensure the greatest representativeness of the frequency estimates.
An end user evaluation of query formulation and results review tools in three medical meta-search engines.

PubMed

Leroy, Gondy; Xu, Jennifer; Chung, Wingyan; Eggers, Shauna; Chen, Hsinchun

2007-01-01

Retrieving sufficient relevant information online is difficult for many people because they use too few keywords to search and search engines do not provide many support tools. To further complicate the search, users often ignore support tools when available. Our goal is to evaluate in a realistic setting when users use support tools and how they perceive these tools. We compared three medical search engines with support tools that require more or less effort from users to form a query and evaluate results. We carried out an end user study with 23 users who were asked to find information, i.e., subtopics and supporting abstracts, for a given theme. We used a balanced within-subjects design and report on the effectiveness, efficiency and usability of the support tools from the end user perspective. We found significant differences in efficiency but did not find significant differences in effectiveness between the three search engines. Dynamic user support tools requiring less effort led to higher efficiency. Fewer searches were needed and more documents were found per search when both query reformulation and result review tools dynamically adjust to the user query. The query reformulation tool that provided a long list of keywords, dynamically adjusted to the user query, was used most often and led to more subtopics. As hypothesized, the dynamic result review tools were used more often and led to more subtopics than static ones. These results were corroborated by the usability questionnaires, which showed that support tools that dynamically optimize output were preferred.
Advances in nowcasting influenza-like illness rates using search query logs

NASA Astrophysics Data System (ADS)

Lampos, Vasileios; Miller, Andrew C.; Crossan, Steve; Stefansen, Christian

2015-08-01

User-generated content can assist epidemiological surveillance in the early detection and prevalence estimation of infectious diseases, such as influenza. Google Flu Trends embodies the first public platform for transforming search queries to indications about the current state of flu in various places all over the world. However, the original model significantly mispredicted influenza-like illness rates in the US during the 2012-13 flu season. In this work, we build on the previous modeling attempt, proposing substantial improvements. Firstly, we investigate the performance of a widely used linear regularized regression solver, known as the Elastic Net. Then, we expand on this model by incorporating the queries selected by the Elastic Net into a nonlinear regression framework, based on a composite Gaussian Process. Finally, we augment the query-only predictions with an autoregressive model, injecting prior knowledge about the disease. We assess predictive performance using five consecutive flu seasons spanning from 2008 to 2013 and qualitatively explain certain shortcomings of the previous approach. Our results indicate that a nonlinear query modeling approach delivers the lowest cumulative nowcasting error, and also suggest that query information significantly improves autoregressive inferences, obtaining state-of-the-art performance.
Advances in nowcasting influenza-like illness rates using search query logs.

PubMed

Lampos, Vasileios; Miller, Andrew C; Crossan, Steve; Stefansen, Christian

2015-08-03

User-generated content can assist epidemiological surveillance in the early detection and prevalence estimation of infectious diseases, such as influenza. Google Flu Trends embodies the first public platform for transforming search queries to indications about the current state of flu in various places all over the world. However, the original model significantly mispredicted influenza-like illness rates in the US during the 2012-13 flu season. In this work, we build on the previous modeling attempt, proposing substantial improvements. Firstly, we investigate the performance of a widely used linear regularized regression solver, known as the Elastic Net. Then, we expand on this model by incorporating the queries selected by the Elastic Net into a nonlinear regression framework, based on a composite Gaussian Process. Finally, we augment the query-only predictions with an autoregressive model, injecting prior knowledge about the disease. We assess predictive performance using five consecutive flu seasons spanning from 2008 to 2013 and qualitatively explain certain shortcomings of the previous approach. Our results indicate that a nonlinear query modeling approach delivers the lowest cumulative nowcasting error, and also suggest that query information significantly improves autoregressive inferences, obtaining state-of-the-art performance.
Short-term Internet search using makes people rely on search engines when facing unknown issues.

PubMed

Wang, Yifan; Wu, Lingdan; Luo, Liang; Zhang, Yifen; Dong, Guangheng

2017-01-01

Internet search engines, which offer powerful search/sort functions and ease-of-use features, have become an indispensable tool for many individuals. The current study tests whether short-term Internet search training can make people more dependent on them. Thirty-one of forty subjects completed the search training study, which included a pre-test, six days of Internet search training, and a post-test. During the pre- and post-tests, subjects were asked to search online for the answers to 40 unusual questions, remember the answers, and recall them in the scanner. Unlearned questions were randomly presented at the recall stage in order to elicit a search impulse. Compared with the pre-test, subjects in the post-test reported a higher impulse to use search engines to answer unlearned questions. Consistently, subjects showed higher brain activations in the dorsolateral prefrontal cortex and anterior cingulate cortex in the post-test than in the pre-test. In addition, there were significant positive correlations between self-reported search impulse and brain responses in the frontal areas. The results suggest that a simple six-day Internet search training can make people dependent on search tools when facing unknown issues. People easily become dependent on Internet search engines.
Short-term Internet search using makes people rely on search engines when facing unknown issues

PubMed Central

Wang, Yifan; Wu, Lingdan; Luo, Liang; Zhang, Yifen

2017-01-01

Internet search engines, which offer powerful search/sort functions and ease-of-use features, have become an indispensable tool for many individuals. The current study tests whether short-term Internet search training can make people more dependent on them. Thirty-one of forty subjects completed the search training study, which included a pre-test, six days of Internet search training, and a post-test. During the pre- and post-tests, subjects were asked to search online for the answers to 40 unusual questions, remember the answers, and recall them in the scanner. Unlearned questions were randomly presented at the recall stage in order to elicit a search impulse. Compared with the pre-test, subjects in the post-test reported a higher impulse to use search engines to answer unlearned questions. Consistently, subjects showed higher brain activations in the dorsolateral prefrontal cortex and anterior cingulate cortex in the post-test than in the pre-test. In addition, there were significant positive correlations between self-reported search impulse and brain responses in the frontal areas. The results suggest that a simple six-day Internet search training can make people dependent on search tools when facing unknown issues. People easily become dependent on Internet search engines. PMID:28441408
Search query data to monitor interest in behavior change: application for public health.

PubMed

Carr, Lucas J; Dunsiger, Shira I

2012-01-01

There is a need for effective interventions and policies that target the leading preventable causes of death in the U.S. (e.g., smoking, overweight/obesity, physical inactivity). Such efforts could be aided by the use of publicly available, real-time search query data that illustrate times and locations of high and low public interest in behaviors related to preventable causes of death. This study explored patterns of search query activity for the terms 'weight', 'diet', 'fitness', and 'smoking' using Google Insights for Search. Search activity for 'weight', 'diet', 'fitness', and 'smoking' conducted within the United States via Google between January 4th, 2004 (first date data were available) and November 28th, 2011 (date of data download and analysis) was analyzed. Using a generalized linear model, we explored the effects of time (month) on mean relative search volume for all four terms. Models suggest a significant effect of month on mean search volume for all four terms. Search activity for all four terms was highest in January with observable declines throughout the remainder of the year. These findings demonstrate discernible temporal patterns of search activity for four areas of behavior change. These findings could be used to inform the timing, location and messaging of interventions, campaigns and policies targeting these behaviors.
Using search query surveillance to monitor tax avoidance and smoking cessation following the United States' 2009 "SCHIP" cigarette tax increase.

PubMed

Ayers, John W; Ribisl, Kurt; Brownstein, John S

2011-03-16

Smokers can use the web to continue or quit their habit. Online vendors sell reduced or tax-free cigarettes lowering smoking costs, while health advocates use the web to promote cessation. We examined how smokers' tax avoidance and smoking cessation Internet search queries were motivated by the United States' (US) 2009 State Children's Health Insurance Program (SCHIP) federal cigarette excise tax increase and two other state specific tax increases. Google keyword searches among residents in a taxed geography (US or US state) were compared to an untaxed geography (Canada) for two years around each tax increase. Search data were normalized to a relative search volume (RSV) scale, where the highest search proportion was labeled 100 with lesser proportions scaled by how they relatively compared to the highest proportion. Changes in RSV were estimated by comparing means during and after the tax increase to means before the tax increase, across taxed and untaxed geographies. The SCHIP tax was associated with an 11.8% (95% confidence interval [95%CI], 5.7 to 17.9; p<.001) immediate increase in cessation searches; however, searches quickly abated and approximated differences from pre-tax levels in Canada during the months after the tax. Tax avoidance searches increased 27.9% (95%CI, 15.9 to 39.9; p<.001) and 5.3% (95%CI, 3.6 to 7.1; p<.001) during and in the months after the tax compared to Canada, respectively, suggesting avoidance is the more pronounced and durable response. Trends were similar for state-specific tax increases but suggest strong interactive processes across taxes. When the SCHIP tax followed Florida's tax, versus not, it promoted more cessation and avoidance searches. Efforts to combat tax avoidance and increase cessation may be enhanced by using interventions targeted and tailored to smokers' searches. Search query surveillance is a valuable real-time, free and public method, that may be generalized to other behavioral, biological, informational or
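The relative search volume (RSV) scaling described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the sample proportions are hypothetical, and the simple percent change in means shown here stands in for, but does not reproduce, the study's comparison across taxed and untaxed geographies.

```python
def to_rsv(proportions):
    """Scale search proportions so that the largest one equals 100."""
    peak = max(proportions)
    if peak <= 0:
        raise ValueError("at least one proportion must be positive")
    return [100.0 * p / peak for p in proportions]


def percent_change(before, after):
    """Percent change of the mean of `after` relative to the mean of `before`."""
    mean_before = sum(before) / len(before)
    mean_after = sum(after) / len(after)
    return 100.0 * (mean_after - mean_before) / mean_before


# Hypothetical weekly search proportions (not data from the study).
weekly = [0.002, 0.004, 0.003, 0.005]
rsv = to_rsv(weekly)

# Treat the first two weeks as "before" and the last two as "after".
change = percent_change(rsv[:2], rsv[2:])
print(rsv, change)
```

Because every value is divided by the same peak, RSV preserves ratios between periods but discards absolute query counts, which is why comparisons are made on relative changes.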
Exploration of Web Users' Search Interests through Automatic Subject Categorization of Query Terms.

ERIC Educational Resources Information Center

Pu, Hsiao-tieh; Yang, Chyan; Chuang, Shui-Lung

2001-01-01

Proposes a mechanism that carefully integrates human and machine efforts to explore Web users' search interests. The approach consists of a four-step process: extraction of core terms; construction of subject taxonomy; automatic subject categorization of query terms; and observation of users' search interests. Research findings are proved valuable…
Smart internet search engine through 6W

NASA Astrophysics Data System (ADS)

Goehler, Stephen; Cader, Masud; Szu, Harold

2006-04-01

Current Internet search engine technology is limited in its ability to display necessary relevant information to the user. Yahoo, Google and Microsoft use lookup tables or indexes, which limit the ability of users to find their desired information. While these companies have improved their results over the years by enhancing their existing technology and algorithms with specialized heuristics such as PageRank, there is a need for a next generation smart search engine that can effectively interpret the relevance of user searches and provide the actual information requested. This paper explores whether a smarter Internet search engine can effectively fulfill a user's needs through the use of 6W representations.
GOOSE: semantic search on internet connected sensors

NASA Astrophysics Data System (ADS)

Schutte, Klamer; Bomhof, Freek; Burghouts, Gertjan; van Diggelen, Jurriaan; Hiemstra, Peter; van't Hof, Jaap; Kraaij, Wessel; Pasman, Huib; Smith, Arthur; Versloot, Corne; de Wit, Joost

2013-05-01

More and more sensors are getting Internet connected. Examples are cameras on cell phones, CCTV cameras for traffic control as well as dedicated security and defense sensor systems. Due to the steadily increasing data volume, human exploitation of all this sensor data is impossible for effective mission execution. Smart access to all sensor data acts as an enabler for questions such as "Is there a person behind this building" or "Alert me when a vehicle approaches". The GOOSE concept has the ambition to provide the capability to search semantically for any relevant information within "all" (including imaging) sensor streams in the entire Internet of sensors. This is similar to the capability provided by presently available Internet search engines which enable the retrieval of information on "all" web pages on the Internet. In line with current Internet search engines any indexing services shall be utilized cross-domain. The two main challenges for GOOSE are the Semantic Gap and Scalability. The GOOSE architecture consists of five elements: (1) an online extraction of primitives on each sensor stream; (2) an indexing and search mechanism for these primitives; (3) an ontology-based semantic matching module; (4) a top-down hypothesis verification mechanism and (5) a controlling man-machine interface. This paper reports on the initial GOOSE demonstrator, which consists of the MES multimedia analysis platform and the CORTEX action recognition module. It also provides an outlook into future GOOSE development.
Visual Exploratory Search of Relationship Graphs on Smartphones

PubMed Central

Ouyang, Jianquan; Zheng, Hao; Kong, Fanbin; Liu, Tianming

2013-01-01

This paper presents a novel framework for Visual Exploratory Search of Relationship Graphs on Smartphones (VESRGS) that is composed of three major components: inference and representation of semantic relationship graphs on the Web via meta-search, visual exploratory search of relationship graphs through both querying and browsing strategies, and human-computer interactions via the multi-touch interface and mobile Internet on smartphones. In comparison with traditional lookup search methodologies, the proposed VESRGS system is characterized with the following perceived advantages. 1) It infers rich semantic relationships between the querying keywords and other related concepts from large-scale meta-search results from Google, Yahoo! and Bing search engines, and represents semantic relationships via graphs; 2) the exploratory search approach empowers users to naturally and effectively explore, adventure and discover knowledge in a rich information world of interlinked relationship graphs in a personalized fashion; 3) it effectively takes the advantages of smartphones’ user-friendly interfaces and ubiquitous Internet connection and portability. Our extensive experimental results have demonstrated that the VESRGS framework can significantly improve the users’ capability of seeking the most relevant relationship information to their own specific needs. We envision that the VESRGS framework can be a starting point for future exploration of novel, effective search strategies in the mobile Internet era. PMID:24223936
Search Query Data to Monitor Interest in Behavior Change: Application for Public Health

PubMed Central

Carr, Lucas J.; Dunsiger, Shira I.

2012-01-01

There is a need for effective interventions and policies that target the leading preventable causes of death in the U.S. (e.g., smoking, overweight/obesity, physical inactivity). Such efforts could be aided by the use of publicly available, real-time search query data that illustrate times and locations of high and low public interest in behaviors related to preventable causes of death. Objectives: This study explored patterns of search query activity for the terms ‘weight’, ‘diet’, ‘fitness’, and ‘smoking’ using Google Insights for Search. Methods: Search activity for ‘weight’, ‘diet’, ‘fitness’, and ‘smoking’ conducted within the United States via Google between January 4th, 2004 (first date data were available) and November 28th, 2011 (date of data download and analysis) was analyzed. Using a generalized linear model, we explored the effects of time (month) on mean relative search volume for all four terms. Results: Models suggest a significant effect of month on mean search volume for all four terms. Search activity for all four terms was highest in January with observable declines throughout the remainder of the year. Conclusions: These findings demonstrate discernible temporal patterns of search activity for four areas of behavior change. These findings could be used to inform the timing, location and messaging of interventions, campaigns and policies targeting these behaviors. PMID:23110198
The effective use of search engines on the Internet.

PubMed

Younger, P

This article explains how nurses can get the most out of researching information on the internet using the search engine Google. It also explores some of the other types of search engines that are available. Internet users are shown how to find text, images and reports and search within sites. Copyright issues are also discussed.
Comparative analysis of online health queries originating from personal computers and smart devices on a consumer health information portal.

PubMed

Jadhav, Ashutosh; Andrews, Donna; Fiksdal, Alexander; Kumbamu, Ashok; McCormick, Jennifer B; Misitano, Andrew; Nelsen, Laurie; Ryu, Euijung; Sheth, Amit; Wu, Stephen; Pathak, Jyotishman

2014-07-04

The number of people using the Internet and mobile/smart devices for health information seeking is increasing rapidly. Although the user experience for online health information seeking varies with the device used, for example, smart devices (SDs) like smartphones/tablets versus personal computers (PCs) like desktops/laptops, very few studies have investigated how online health information seeking behavior (OHISB) may differ by device. The objective of this study is to examine differences in OHISB between PCs and SDs through a comparative analysis of large-scale health search queries submitted through Web search engines from both types of devices. Using the Web analytics tool, IBM NetInsight OnDemand, and based on the type of devices used (PCs or SDs), we obtained the most frequent health search queries between June 2011 and May 2013 that were submitted on Web search engines and directed users to the Mayo Clinic's consumer health information website. We performed analyses on "Queries with considering repetition counts (QwR)" and "Queries without considering repetition counts (QwoR)". The dataset contains (1) 2.74 million and 3.94 million QwoR, respectively for PCs and SDs, and (2) more than 100 million QwR for both PCs and SDs. We analyzed structural properties of the queries (length of the search queries, usage of query operators and special characters in health queries), types of search queries (keyword-based, wh-questions, yes/no questions), categorization of the queries based on health categories and information mentioned in the queries (gender, age-groups, temporal references), misspellings in the health queries, and the linguistic structure of the health queries. Query strings used for health information searching via PCs and SDs differ by almost 50%. The most searched health categories are "Symptoms" (1 in 3 search queries), "Causes", and "Treatments & Drugs". The distribution of search queries for different health categories differs with the device used for

Using Search Query Surveillance to Monitor Tax Avoidance and Smoking Cessation following the United States' 2009 “SCHIP” Cigarette Tax Increase

PubMed Central

Ayers, John W.; Ribisl, Kurt; Brownstein, John S.

2011-01-01

Smokers can use the web to continue or quit their habit. Online vendors sell reduced or tax-free cigarettes lowering smoking costs, while health advocates use the web to promote cessation. We examined how smokers' tax avoidance and smoking cessation Internet search queries were motivated by the United States' (US) 2009 State Children's Health Insurance Program (SCHIP) federal cigarette excise tax increase and two other state specific tax increases. Google keyword searches among residents in a taxed geography (US or US state) were compared to an untaxed geography (Canada) for two years around each tax increase. Search data were normalized to a relative search volume (RSV) scale, where the highest search proportion was labeled 100 with lesser proportions scaled by how they relatively compared to the highest proportion. Changes in RSV were estimated by comparing means during and after the tax increase to means before the tax increase, across taxed and untaxed geographies. The SCHIP tax was associated with an 11.8% (95% confidence interval [95%CI], 5.7 to 17.9; p<.001) immediate increase in cessation searches; however, searches quickly abated and approximated differences from pre-tax levels in Canada during the months after the tax. Tax avoidance searches increased 27.9% (95%CI, 15.9 to 39.9; p<.001) and 5.3% (95%CI, 3.6 to 7.1; p<.001) during and in the months after the tax compared to Canada, respectively, suggesting avoidance is the more pronounced and durable response. Trends were similar for state-specific tax increases but suggest strong interactive processes across taxes. When the SCHIP tax followed Florida's tax, versus not, it promoted more cessation and avoidance searches. Efforts to combat tax avoidance and increase cessation may be enhanced by using interventions targeted and tailored to smokers' searches. Search query surveillance is a valuable real-time, free and public method, that may be generalized to other behavioral, biological, informational or
An Examination of Natural Language as a Query Formation Tool for Retrieving Information on E-Health from Pub Med.

ERIC Educational Resources Information Center

Peterson, Gabriel M.; Su, Kuichun; Ries, James E.; Sievert, Mary Ellen C.

2002-01-01

Discussion of Internet use for information searches on health-related topics focuses on a study that examined complexity and variability of natural language in using search terms that express the concept of electronic health (e-health). Highlights include precision of retrieved information; shift in terminology; and queries using the Pub Med…
Querying Event Sequences by Exact Match or Similarity Search: Design and Empirical Evaluation

PubMed Central

Wongsuphasawat, Krist; Plaisant, Catherine; Taieb-Maimon, Meirav; Shneiderman, Ben

2012-01-01

Specifying event sequence queries is challenging even for skilled computer professionals familiar with SQL. Most graphical user interfaces for database search use an exact match approach, which is often effective, but near misses may also be of interest. We describe a new similarity search interface, in which users specify a query by simply placing events on a blank timeline and retrieve a similarity-ranked list of results. Behind this user interface is a new similarity measure for event sequences which the users can customize by four decision criteria, enabling them to adjust the impact of missing, extra, or swapped events or the impact of time shifts. We describe a use case with Electronic Health Records based on our ongoing collaboration with hospital physicians. A controlled experiment with 18 participants compared exact match and similarity search interfaces. We report on the advantages and disadvantages of each interface and suggest a hybrid interface combining the best of both. PMID:22379286
Analysis of queries sent to PubMed at the point of care: Observation of search behaviour in a medical teaching hospital

PubMed Central

Hoogendam, Arjen; Stalenhoef, Anton FH; Robbé, Pieter F de Vries; Overbeke, A John PM

2008-01-01

Background: The use of PubMed to answer daily medical care questions is limited because it is challenging to retrieve a small set of relevant articles and time is restricted. Knowing what aspects of queries are likely to retrieve relevant articles can increase the effectiveness of PubMed searches. The objectives of our study were to identify queries that are likely to retrieve relevant articles by relating PubMed search techniques and tools to the number of articles retrieved and the selection of articles for further reading. Methods: This was a prospective observational study of queries regarding patient-related problems sent to PubMed by residents and internists in internal medicine working in an Academic Medical Centre. We analyzed queries, search results, query tools (Mesh, Limits, wildcards, operators), selection of abstract and full-text for further reading, using a portal that mimics PubMed. Results: PubMed was used to solve 1121 patient-related problems, resulting in 3205 distinct queries. Abstracts were viewed in 999 (31%) of these queries, and in 126 (39%) of 321 queries using query tools. The average term count per query was 2.5. Abstracts were selected in more than 40% of queries using four or five terms, increasing to 63% if the use of four or five terms yielded 2–161 articles. Conclusion: Queries sent to PubMed by physicians at our hospital during daily medical care contain fewer than three terms. Queries using four to five terms, retrieving fewer than 161 article titles, are most likely to result in abstract viewing. PubMed search tools are used infrequently by our population and are less effective than the use of four or five terms. Methods to facilitate the formulation of precise queries, using more relevant terms, should be the focus of education and research. PMID:18816391
Can Google Trends search queries contribute to risk diversification?

PubMed

Kristoufek, Ladislav

2013-01-01

Portfolio diversification and active risk management are essential parts of financial analysis which became even more crucial (and questioned) during and after the years of the Global Financial Crisis. We propose a novel approach to portfolio diversification using the information of searched items on Google Trends. The diversification is based on an idea that popularity of a stock measured by search queries is correlated with the stock riskiness. We penalize the popular stocks by assigning them lower portfolio weights and we bring forward the less popular, or peripheral, stocks to decrease the total riskiness of the portfolio. Our results indicate that such strategy dominates both the benchmark index and the uniformly weighted portfolio both in-sample and out-of-sample.
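The weighting idea in this abstract (lower weights for heavily searched stocks) can be illustrated with a small Python sketch. The abstract does not state the weighting formula, so the inverse-power rule used here, the `power` parameter, and the ticker names and volumes are all assumptions chosen for illustration.

```python
def popularity_penalized_weights(search_volumes, power=1.0):
    """Return weights proportional to volume**(-power), normalized to sum to 1.

    power=0 gives the uniformly weighted portfolio; larger values penalize
    popular (heavily searched) stocks more strongly.
    """
    if any(v <= 0 for v in search_volumes.values()):
        raise ValueError("search volumes must be positive")
    raw = {ticker: v ** (-power) for ticker, v in search_volumes.items()}
    total = sum(raw.values())
    return {ticker: r / total for ticker, r in raw.items()}


# Hypothetical search volumes for three made-up tickers.
volumes = {"AAA": 100.0, "BBB": 50.0, "CCC": 25.0}
weights = popularity_penalized_weights(volumes)
uniform = popularity_penalized_weights(volumes, power=0.0)
print(weights)
print(uniform)
```

With these inputs the least searched ticker receives the largest weight, and setting `power` to zero recovers the uniformly weighted portfolio that the abstract uses as one of its benchmarks.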
Can Google Trends search queries contribute to risk diversification?

PubMed Central

Kristoufek, Ladislav

2013-01-01

Portfolio diversification and active risk management are essential parts of financial analysis which became even more crucial (and questioned) during and after the years of the Global Financial Crisis. We propose a novel approach to portfolio diversification using the information of searched items on Google Trends. The diversification is based on an idea that popularity of a stock measured by search queries is correlated with the stock riskiness. We penalize the popular stocks by assigning them lower portfolio weights and we bring forward the less popular, or peripheral, stocks to decrease the total riskiness of the portfolio. Our results indicate that such strategy dominates both the benchmark index and the uniformly weighted portfolio both in-sample and out-of-sample. PMID:24048448
Comparative Analysis of Online Health Queries Originating From Personal Computers and Smart Devices on a Consumer Health Information Portal

PubMed Central

Jadhav, Ashutosh; Andrews, Donna; Fiksdal, Alexander; Kumbamu, Ashok; McCormick, Jennifer B; Misitano, Andrew; Nelsen, Laurie; Ryu, Euijung; Sheth, Amit; Wu, Stephen

2014-01-01

Background: The number of people using the Internet and mobile/smart devices for health information seeking is increasing rapidly. Although the user experience for online health information seeking varies with the device used, for example, smart devices (SDs) like smartphones/tablets versus personal computers (PCs) like desktops/laptops, very few studies have investigated how online health information seeking behavior (OHISB) may differ by device. Objective: The objective of this study is to examine differences in OHISB between PCs and SDs through a comparative analysis of large-scale health search queries submitted through Web search engines from both types of devices. Methods: Using the Web analytics tool, IBM NetInsight OnDemand, and based on the type of devices used (PCs or SDs), we obtained the most frequent health search queries between June 2011 and May 2013 that were submitted on Web search engines and directed users to the Mayo Clinic’s consumer health information website. We performed analyses on “Queries with considering repetition counts (QwR)” and “Queries without considering repetition counts (QwoR)”. The dataset contains (1) 2.74 million and 3.94 million QwoR, respectively for PCs and SDs, and (2) more than 100 million QwR for both PCs and SDs. We analyzed structural properties of the queries (length of the search queries, usage of query operators and special characters in health queries), types of search queries (keyword-based, wh-questions, yes/no questions), categorization of the queries based on health categories and information mentioned in the queries (gender, age-groups, temporal references), misspellings in the health queries, and the linguistic structure of the health queries. Results: Query strings used for health information searching via PCs and SDs differ by almost 50%. The most searched health categories are “Symptoms” (1 in 3 search queries), “Causes”, and “Treatments & Drugs”. The distribution of search queries for
Personalized query suggestion based on user behavior

NASA Astrophysics Data System (ADS)

Chen, Wanyu; Hao, Zepeng; Shao, Taihua; Chen, Honghui

Query suggestions help users refine their queries after they input an initial query. Previous work mainly concentrated on similarity-based and context-based query suggestion approaches. However, models that focus on adapting to a specific user (personalization) can help to improve the probability of the user being satisfied. In this paper, we propose a personalized query suggestion model based on users’ search behavior (UB model), where we inject relevance between queries and users’ search behavior into a basic probabilistic model. For the relevance between queries, we consider their semantical similarity and co-occurrence which indicates the behavior information from other users in web search. Regarding the current user’s preference to a query, we combine the user’s short-term and long-term search behavior in a linear fashion and deal with the data sparse problem with Bayesian probabilistic matrix factorization (BPMF). In particular, we also investigate the impact of different personalization strategies (the combination of the user’s short-term and long-term search behavior) on the performance of query suggestion reranking. We quantify the improvement of our proposed UB model against a state-of-the-art baseline using the public AOL query logs and show that it beats the baseline in terms of metrics used in query suggestion reranking. The experimental results show that: (i) for personalized ranking, users’ behavioral information helps to improve query suggestion effectiveness; and (ii) given a query, merging information inferred from the short-term and long-term search behavior of a particular user can result in a better performance than both plain approaches.
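The linear combination of short-term and long-term search behavior mentioned in this abstract might look like the following Python sketch. It covers only the blending and reranking step under assumed inputs: the function names, the `mix` parameter, and the preference scores are hypothetical, and the Bayesian probabilistic matrix factorization step is not modeled.

```python
def combined_preference(short_term, long_term, mix=0.5):
    """Blend per-query preference scores: mix*short + (1 - mix)*long."""
    if not 0.0 <= mix <= 1.0:
        raise ValueError("mix must lie in [0, 1]")
    queries = set(short_term) | set(long_term)
    return {
        q: mix * short_term.get(q, 0.0) + (1.0 - mix) * long_term.get(q, 0.0)
        for q in queries
    }


def rerank(candidates, short_term, long_term, mix=0.5):
    """Order candidate suggestions by blended preference, highest first."""
    scores = combined_preference(short_term, long_term, mix)
    return sorted(candidates, key=lambda q: scores.get(q, 0.0), reverse=True)


# Hypothetical preference scores for one user.
short_term = {"flu symptoms": 0.9, "flu shot": 0.2}
long_term = {"flu shot": 0.8, "flu map": 0.4}
candidates = ["flu map", "flu shot", "flu symptoms"]

balanced = rerank(candidates, short_term, long_term, mix=0.5)
session_only = rerank(candidates, short_term, long_term, mix=1.0)
print(balanced)
print(session_only)
```

Varying `mix` between 0 and 1 is one simple way to compare the different personalization strategies the abstract refers to, from purely long-term to purely short-term behavior.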
Analysis of Online Information Searching for Cardiovascular Diseases on a Consumer Health Information Portal

PubMed Central

Jadhav, Ashutosh; Sheth, Amit; Pathak, Jyotishman

2014-01-01

Since the early 2000s, Internet usage for health information searching has increased significantly. Studying search queries can help us to understand users’ “information need” and how they formulate search queries (“expression of information need”). Although cardiovascular diseases (CVD) affect a large percentage of the population, few studies have investigated how and what users search for CVD. We address this knowledge gap in the community by analyzing a large corpus of 10 million CVD related search queries from MayoClinic.com. Using UMLS MetaMap and UMLS semantic types/concepts, we developed a rule-based approach to categorize the queries into 14 health categories. We analyzed structural properties, types (keyword-based/Wh-questions/Yes-No questions) and linguistic structure of the queries. Our results show that the most searched health categories are ‘Diseases/Conditions’, ‘Vital-Signs’, ‘Symptoms’ and ‘Living-with’. CVD queries are longer and are predominantly keyword-based. This study extends our knowledge about online health information searching and provides useful insights for Web search engines and health websites. PMID:25954380
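The three query types named in this abstract (keyword-based, Wh-questions, Yes-No questions) can be told apart with a very simple rule-based sketch in Python. The word lists and the first-token rule below are assumptions for illustration only; they are not the authors' rules, and the UMLS MetaMap categorization into health categories is not attempted here.

```python
# Assumed word lists; a real classifier would need broader coverage.
WH_WORDS = {"what", "why", "how", "when", "where", "which", "who"}
AUX_WORDS = {"is", "are", "can", "does", "do", "should", "will", "could"}


def query_type(query):
    """Classify a search query by its first token."""
    tokens = query.lower().split()
    if not tokens:
        return "empty"
    if tokens[0] in WH_WORDS:
        return "wh-question"
    if tokens[0] in AUX_WORDS:
        return "yes/no question"
    return "keyword-based"


# Made-up example queries.
examples = [
    "what causes high blood pressure",
    "is chest pain a sign of heart attack",
    "heart attack symptoms",
]
for q in examples:
    print(q, "->", query_type(q))
```

A first-token rule misses questions phrased without a leading question word, so it should be read as a lower bound on the share of question-style queries.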
Development and Validation of a Self-reported Questionnaire for Measuring Internet Search Dependence

PubMed Central

Wang, Yifan; Wu, Lingdan; Zhou, Hongli; Xu, Jiaojing; Dong, Guangheng

2016-01-01

Internet search has become the most common way that people deal with issues and problems in everyday life. The wide use of Internet search has largely changed the way people search for and store information. There is a growing interest in the impact of Internet search on users’ affect, cognition, and behavior. Thus, it is essential to develop a tool to measure the changes in psychological characteristics as a result of long-term use of Internet search. The aim of this study is to develop a Questionnaire on Internet Search Dependence (QISD) and test its reliability and validity. We first proposed a preliminary structure and items of the QISD based on literature review, supplemental investigations, and interviews. And then, we assessed the psychometric properties and explored the factor structure of the initial version via exploratory factor analysis (EFA). The EFA results indicated that four dimensions of the QISD were very reliable, i.e., habitual use of Internet search, withdrawal reaction, Internet search trust, and external storage under Internet search. Finally, we tested the factor solution obtained from EFA through confirmatory factor analysis (CFA). The results of CFA confirmed that the four dimensions model fits the data well. In all, this study suggests that the 12-item QISD is of high reliability and validity and can serve as a preliminary tool to measure the features of Internet search dependence. PMID:28066753
Know your market: use of online query tools to quantify trends in patient information-seeking behavior for varicose vein treatment.

PubMed

Harsha, Asheesh K; Schmitt, J Eric; Stavropoulos, S William

2014-01-01

To analyze Internet search data to characterize the temporal and geographic interest of Internet users in the United States in varicose vein treatment. From January 1, 2004, to September 1, 2012, the Google Trends tool was used to analyze query data for "varicose vein treatment" to identify individuals seeking treatment information for varicose veins. The term "varicose vein treatment" returned a search volume index (SVI), representing the search frequency relative to the total search volume during a specific time interval and region. Linear regression analysis and Kruskal-Wallis one-way analysis of variance were employed to characterize search results. Search traffic for varicose vein treatment increased by 520% over the 104-month study period. There was an annual mean increase of 28% (range, -18% to 100%; standard deviation [SD], 35%), with a statistically significant linear increase in average yearly SVI over time (R² = 0.94, P < .0001). All years showed positive growth in mean SVI except for 2008 (18% decrease). There were statistically significant differences in SVI by month (Kruskal-Wallis, P < .0001) with significantly higher mean SVI compared with other months in May (190% increase; range, 26%-670%; SD, 15%) and June (209% increase; range, 35%-700%; SD, 20%). The southern United States showed significantly higher search traffic than all other regions (Tukey-Kramer, P < .00001). There have been significant increases in Internet search traffic related to varicose vein treatment in the past 8 years. Reflected in this trend is an annual peak in search traffic in the late spring months with an overall geographic bias toward southern states. Rigorous analysis of Internet search queries for medical procedures may prove useful to guide the efficient use of limited resources and marketing dollars. © 2013 The Society of Interventional Radiology Published by SIR All rights reserved.
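The linear trend reported above (yearly mean SVI regressed on time, summarized by R²) can be sketched with ordinary least squares in plain Python. The function name `linear_fit` and the SVI values in the usage example are invented for illustration and are not the study's data.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    years = [2004, 2005, 2006, 2007, 2008, 2009]
    mean_svi = [10.0, 14.0, 19.0, 25.0, 22.0, 31.0]  # made-up values, not from the paper
    slope, intercept, r2 = linear_fit(years, mean_svi)
    print(f"slope per year = {slope:.2f}, R^2 = {r2:.2f}")
```

The significance test (the P value) and the Kruskal-Wallis comparison across months are outside this sketch; a statistics library would normally be used for those.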
SymDex: increasing the efficiency of chemical fingerprint similarity searches for comparing large chemical libraries by using query set indexing.

PubMed

Tai, David; Fang, Jianwen

2012-08-27

The large sizes of today's chemical databases require efficient algorithms to perform similarity searches. It can be very time consuming to compare two large chemical databases. This paper seeks to build upon existing research efforts by describing a novel strategy for accelerating existing search algorithms for comparing large chemical collections. The quest for efficiency has focused on developing better indexing algorithms by creating heuristics for searching individual chemical against a chemical library by detecting and eliminating needless similarity calculations. For comparing two chemical collections, these algorithms simply execute searches for each chemical in the query set sequentially. The strategy presented in this paper achieves a speedup upon these algorithms by indexing the set of all query chemicals so redundant calculations that arise in the case of sequential searches are eliminated. We implement this novel algorithm by developing a similarity search program called Symmetric inDexing or SymDex. SymDex shows over a 232% maximum speedup compared to the state-of-the-art single query search algorithm over real data for various fingerprint lengths. Considerable speedup is even seen for batch searches where query set sizes are relatively small compared to typical database sizes. To the best of our knowledge, SymDex is the first search algorithm designed specifically for comparing chemical libraries. It can be adapted to most, if not all, existing indexing algorithms and shows potential for accelerating future similarity search algorithms for comparing chemical databases.
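For orientation, the sequential baseline that the abstract contrasts SymDex with (one search per query chemical, run one after another) can be sketched as follows. This is not the SymDex algorithm. The Tanimoto coefficient is assumed as the similarity measure because it is common for fingerprints, and the encoding of fingerprints as Python integers and the names `tanimoto` and `sequential_search` are illustrative assumptions.

```python
def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient of two bit-vector fingerprints stored as integers."""
    union = bin(a | b).count("1")
    if union == 0:
        return 1.0  # assumed convention: two empty fingerprints count as identical
    return bin(a & b).count("1") / union


def sequential_search(queries, library, threshold):
    """Naive all-pairs comparison: every query is checked against every library entry.

    Returns (query_index, library_index) pairs whose similarity meets the threshold.
    """
    hits = []
    for qi, q in enumerate(queries):
        for li, m in enumerate(library):
            if tanimoto(q, m) >= threshold:
                hits.append((qi, li))
    return hits


if __name__ == "__main__":
    library = [0b11110000, 0b00001111, 0b11100001]
    queries = [0b11110001, 0b00000111]
    print(sequential_search(queries, library, threshold=0.6))
```

The cost of this loop grows with the product of the two collection sizes, which is the redundancy that indexing the query set is meant to reduce.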
SAM: String-based sequence search algorithm for mitochondrial DNA database queries

PubMed Central

Röck, Alexander; Irwin, Jodi; Dür, Arne; Parsons, Thomas; Parson, Walther

2011-01-01

The analysis of the haploid mitochondrial (mt) genome has numerous applications in forensic and population genetics, as well as in disease studies. Although mtDNA haplotypes are usually determined by sequencing, they are rarely reported as a nucleotide string. Traditionally they are presented in a difference-coded position-based format relative to the corrected version of the first sequenced mtDNA. This convention requires recommendations for standardized sequence alignment that is known to vary between scientific disciplines, even between laboratories. As a consequence, database searches that are vital for the interpretation of mtDNA data can suffer from biased results when query and database haplotypes are annotated differently. In the forensic context that would usually lead to underestimation of the absolute and relative frequencies. To address this issue we introduce SAM, a string-based search algorithm that converts query and database sequences to position-free nucleotide strings and thus eliminates the possibility that identical sequences will be missed in a database query. The mere application of a BLAST algorithm would not be a sufficient remedy as it uses a heuristic approach and does not address properties specific to mtDNA, such as phylogenetically stable but also rapidly evolving insertion and deletion events. The software presented here provides additional flexibility to incorporate phylogenetic data, site-specific mutation rates, and other biologically relevant information that would refine the interpretation of mitochondrial DNA data. The manuscript is accompanied by freeware and example data sets that can be used to evaluate the new software (http://stringvalidation.org). PMID:21056022
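The annotation problem described above can be made concrete with a toy sketch: the same sequence may be written with different difference codes, yet converting each to a plain nucleotide string makes the two identical. This is not the SAM software. The notation (`4C` for a substitution, `3DEL` for a deletion, `2.1C` for an insertion after position 2), the short reference string, and the function name `to_nucleotide_string` are assumptions for illustration.

```python
import re

# Assumed toy notation: "<pos><base>", "<pos>DEL", or "<pos>.<k><base>" (1-based positions).
_DIFF = re.compile(r"^(\d+)(?:\.(\d+))?([ACGT]|DEL)$")


def to_nucleotide_string(reference: str, differences: list) -> str:
    """Apply difference-coded variants to a reference and return the resulting string."""
    slots = list(reference)
    inserts = {}  # position -> list of (insert_order, base)
    for diff in differences:
        match = _DIFF.match(diff.upper())
        if match is None:
            raise ValueError(f"unrecognized difference code: {diff!r}")
        pos, ins, sym = int(match.group(1)), match.group(2), match.group(3)
        if not 1 <= pos <= len(slots):
            raise ValueError(f"position out of range: {diff!r}")
        if ins is not None:
            if sym == "DEL":
                raise ValueError(f"an insertion needs a base: {diff!r}")
            inserts.setdefault(pos, []).append((int(ins), sym))
        elif sym == "DEL":
            slots[pos - 1] = ""
        else:
            slots[pos - 1] = sym
    out = []
    for i, base in enumerate(slots, start=1):
        out.append(base)
        out.extend(sym for _, sym in sorted(inserts.get(i, [])))
    return "".join(out)


if __name__ == "__main__":
    ref = "GACCCT"
    # Two laboratories may place the same one-base deletion at different positions.
    print(to_nucleotide_string(ref, ["3DEL"]), to_nucleotide_string(ref, ["5DEL"]))
```

Comparing the converted strings rather than the difference codes is what removes the dependence on alignment conventions in this toy setting.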
Searching for American Indian Resources on the Internet.

ERIC Educational Resources Information Center

Pollack, Ira; Derby, Amy

This paper provides basic information on searching the Internet and lists World Wide Web sites containing resources for American Indian education. Comprehensive and topical Web directories, search engines, and meta-search engines are briefly described. Search strategies are discussed, and seven Web sites are listed that provide more advanced…
Privacy-Preserving Location-Based Query Using Location Indexes and Parallel Searching in Distributed Networks

PubMed Central

Liu, Lei; Zhao, Jing

2014-01-01

An efficient location-based query algorithm that protects user privacy in distributed networks is presented. The algorithm uses the users' location indexes and multiple parallel threads to quickly search for and select all candidate anonymous sets that contain more users and whose location information is more uniformly distributed, which accelerates the temporal-spatial anonymization operations. It also allows users to configure custom privacy-preserving location query requests. Simulation results show that the proposed algorithm can serve location queries for more users simultaneously, improve the performance of the anonymization server, and satisfy the users' anonymous location requests. PMID:24790579
Privacy-preserving location-based query using location indexes and parallel searching in distributed networks.

PubMed

Zhong, Cheng; Liu, Lei; Zhao, Jing

2014-01-01

An efficient location-based query algorithm that protects user privacy in distributed networks is presented. The algorithm uses the users' location indexes and multiple parallel threads to quickly search for and select all candidate anonymous sets that contain more users and whose location information is more uniformly distributed, which accelerates the temporal-spatial anonymization operations. It also allows users to configure custom privacy-preserving location query requests. Simulation results show that the proposed algorithm can serve location queries for more users simultaneously, improve the performance of the anonymization server, and satisfy the users' anonymous location requests.
Computer use, internet access, and online health searching among Harlem adults.

PubMed

Cohall, Alwyn T; Nye, Andrea; Moon-Howard, Joyce; Kukafka, Rita; Dye, Bonnie; Vaughan, Roger D; Northridge, Mary E

2011-01-01

Computer use, Internet access, and online searching for health information were assessed toward enhancing Internet use for health promotion. Cross-sectional random digit dial landline phone survey. Eight zip codes that comprised Central Harlem/Hamilton Heights and East Harlem in New York City. Adults 18 years and older (N=646). Demographic characteristics, computer use, Internet access, and online searching for health information. Frequencies for categorical variables and means and standard deviations for continuous variables were calculated and compared with analogous findings reported in national surveys from similar time periods. Among Harlem adults, ever computer use and current Internet use were 77% and 52%, respectively. High-speed home Internet connections were somewhat lower for Harlem adults than for U.S. adults overall (43% vs. 68%). Current Internet users in Harlem were more likely to be younger, white vs. black or Hispanic, better educated, and in better self-reported health than non-current users (p<.01). Of those who reported searching online for health information, 74% sought information on medical problems and thought that information found on the Internet affected the way they eat (47%) or exercise (44%). Many Harlem adults currently use the Internet to search for health information. High-speed connections and culturally relevant materials may facilitate health information searching for underserved groups. Copyright © 2011 by American Journal of Health Promotion, Inc.
SPLICE: A program to assemble partial query solutions from three-dimensional database searches into novel ligands

NASA Astrophysics Data System (ADS)

Ho, Chris M. W.; Marshall, Garland R.

1993-12-01

SPLICE is a program that processes partial query solutions retrieved from 3D structural databases to generate novel, aggregate ligands. It is designed to interface with the database searching program FOUNDATION, which retrieves fragments containing any combination of a user-specified minimum number of matching query elements. SPLICE eliminates aspects of structures that are physically incapable of binding within the active site. Then, a systematic rule-based procedure is performed upon the remaining fragments to ensure receptor complementarity. All modifications are automated and remain transparent to the user. Ligands are then assembled by linking components into composite structures through overlapping bonds. As a control experiment, FOUNDATION and SPLICE were used to reconstruct a known HIV-1 protease inhibitor after it had been fragmented, reoriented, and added to a sham database of fifty different small molecules. To illustrate the capabilities of this program, a 3D search query containing the pharmacophoric elements of an aspartic proteinase-inhibitor crystal complex was searched using FOUNDATION against a subset of the Cambridge Structural Database. One hundred thirty-one compounds were retrieved, each containing any combination of at least four query elements. Compounds were automatically screened and edited for receptor complementarity. Numerous combinations of fragments were discovered that could be linked to form novel structures, containing a greater number of pharmacophoric elements than any single retrieved fragment.
Google search behavior for status epilepticus.

PubMed

Brigo, Francesco; Trinka, Eugen

2015-08-01

Millions of people surf the Internet every day as a source of health-care information looking for materials about symptoms, diagnosis, treatments and their possible adverse effects, or diagnostic procedures. Google is the most popular search engine and is used by patients and physicians to search for online health-related information. This study aimed to evaluate changes in Google search behavior occurring in English-speaking countries over time for the term "status epilepticus" (SE). Using Google Trends, data on global search queries for the term SE between the 1st of January 2004 and 31st of December 2014 were analyzed. Search volume numbers over time (downloaded as CSV datasets) were analyzed by applying the "health" category filter. The research trends for the term SE remained fairly constant over time. The greatest search volume for the term SE was reported in the United States, followed by India, Australia, the United Kingdom, Canada, the Netherlands, Thailand, and Germany. Most terms associated with the search queries were related to SE definition, symptoms, subtypes, and treatment. The volume of searches for some queries (nonconvulsive, focal, and refractory SE; SE definition; SE guidelines; SE symptoms; SE management; SE treatment) was enormously increased over time (search popularity has exceeded a 5000% growth since 2004). Most people use search engines to look for the term SE to obtain information on its definition, subtypes, and management. The greatest search volume occurred not only in developed countries but also in developing countries where raising awareness about SE still remains a challenging task and where there is reduced public knowledge of epilepsy. Health information seeking (the extent to which people search for health information online) reflects the health-related information needs of Internet users for a specific disease. Google Trends shows that Internet users have a great demand for information concerning some aspects of SE

From health search to healthcare: explorations of intention and utilization via query logs and user surveys

PubMed Central

White, Ryen W; Horvitz, Eric

2014-01-01

Objective To better understand the relationship between online health-seeking behaviors and in-world healthcare utilization (HU) by studies of online search and access activities before and after queries that pursue medical professionals and facilities. Materials and methods We analyzed data collected from logs of online searches gathered from consenting users of a browser toolbar from Microsoft (N=9740). We employed a complementary survey (N=489) to seek a deeper understanding of information-gathering, reflection, and action on the pursuit of professional healthcare. Results We provide insights about HU through the survey, breaking out its findings by different respondent marginalizations as appropriate. Observations made from search logs may be explained by trends observed in our survey responses, even though the user populations differ. Discussion The results provide insights about how users decide if and when to utilize healthcare resources, and how online health information seeking transitions to in-world HU. The findings from both the survey and the logs reveal behavioral patterns and suggest a strong relationship between search behavior and HU. Although the diversity of our survey respondents is limited and we cannot be certain that users visited medical facilities, we demonstrate that it may be possible to infer HU from long-term search behavior by the apparent influence that health concerns and professional advice have on search activity. Conclusions Our findings highlight different phases of online activities around queries pursuing professional healthcare facilities and services. We also show that it may be possible to infer HU from logs without tracking people's physical location, based on the effect of HU on pre- and post-HU search behavior. This allows search providers and others to develop more robust models of interests and preferences by modeling utilization rather than simply the intention to utilize that is expressed in search queries. PMID
Search Engines: Gateway to a New ``Panopticon''?

NASA Astrophysics Data System (ADS)

Kosta, Eleni; Kalloniatis, Christos; Mitrou, Lilian; Kavakli, Evangelia

Nowadays, Internet users are depending on various search engines in order to be able to find requested information on the Web. Although most users feel that they are and remain anonymous when they place their search queries, reality proves otherwise. The increasing importance of search engines for the location of the desired information on the Internet usually leads to considerable inroads into the privacy of users. The scope of this paper is to study the main privacy issues with regard to search engines, such as the anonymisation of search logs and their retention period, and to examine the applicability of the European data protection legislation to non-EU search engine providers. Ixquick, a privacy-friendly meta search engine will be presented as an alternative to privacy intrusive existing practices of search engines.
Matching health information seekers' queries to medical terms

PubMed Central

2012-01-01

Background The Internet is a major source of health information but most seekers are not familiar with medical vocabularies. Hence, their searches fail due to bad query formulation. Several methods have been proposed to improve information retrieval: query expansion, syntactic and semantic techniques or knowledge-based methods. However, it would be useful to clean those queries which are misspelled. In this paper, we propose a simple yet efficient method in order to correct misspellings of queries submitted by health information seekers to a medical online search tool. Methods In addition to query normalizations and exact phonetic term matching, we tested two approximate string comparators: the similarity score function of Stoilos and the normalized Levenshtein edit distance. We propose here to combine them to increase the number of matched medical terms in French. We first took a sample of query logs to determine the thresholds and processing times. In the second run, at a greater scale we tested different combinations of query normalizations before or after misspelling correction with the retained thresholds in the first run. Results According to the total number of suggestions (around 163, the number of the first sample of queries), at a threshold comparator score of 0.3, the normalized Levenshtein edit distance gave the highest F-Measure (88.15%) and at a threshold comparator score of 0.7, the Stoilos function gave the highest F-Measure (84.31%). By combining Levenshtein and Stoilos, the highest F-Measure (80.28%) is obtained with 0.2 and 0.7 thresholds respectively. However, queries are composed by several terms that may be combination of medical terms. The process of query normalization and segmentation is thus required. The highest F-Measure (64.18%) is obtained when this process is realized before spelling-correction. Conclusions Despite the widely known high performance of the normalized edit distance of Levenshtein, we show in this paper that its
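The normalized Levenshtein comparator mentioned in the record above can be sketched as follows. The abstract does not state how the distance was normalized or how the threshold was applied, so dividing by the longer string's length and accepting terms with a distance at or below the threshold are assumptions, as are the function names and the small vocabulary.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insertions, deletions, substitutions), two-row DP."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_levenshtein(a: str, b: str) -> float:
    """Edit distance divided by the longer length (assumed normalization), in [0, 1]."""
    longest = max(len(a), len(b))
    return 0.0 if longest == 0 else levenshtein(a, b) / longest


def suggest(query: str, vocabulary, threshold: float = 0.3):
    """Return vocabulary terms within the threshold, closest first."""
    scored = sorted((normalized_levenshtein(query, term), term) for term in vocabulary)
    return [term for dist, term in scored if dist <= threshold]


if __name__ == "__main__":
    terms = ["diabetes", "asthma", "hypertension"]  # hypothetical vocabulary
    print(suggest("diabete", terms))
```

Combining this comparator with a second one, as the study does with the Stoilos function, would amount to requiring both scores to pass their own thresholds.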
DISPAQ: Distributed Profitable-Area Query from Big Taxi Trip Data.

PubMed

Putri, Fadhilah Kurnia; Song, Giltae; Kwon, Joonho; Rao, Praveen

2017-09-25

One of the crucial problems for taxi drivers is to efficiently locate passengers in order to increase profits. The rapid advancement and ubiquitous penetration of Internet of Things (IoT) technology into transportation industries enables us to provide taxi drivers with locations that have more potential passengers (more profitable areas) by analyzing and querying taxi trip data. In this paper, we propose a query processing system, called Distributed Profitable-Area Query (DISPAQ), which efficiently identifies profitable areas by exploiting the Apache Software Foundation's Spark framework and a MongoDB database. DISPAQ first maintains a profitable-area query index (PQ-index) by extracting area summaries and route summaries from raw taxi trip data. It then identifies candidate profitable areas by searching the PQ-index during query processing. Then, it exploits a Z-Skyline algorithm, which is an extension of skyline processing with a Z-order space filling curve, to quickly refine the candidate profitable areas. To improve the performance of distributed query processing, we also propose local Z-Skyline optimization, which reduces the number of dominant tests by distributing killer profitable areas to each cluster node. Through extensive evaluation with real datasets, we demonstrate that our DISPAQ system provides a scalable and efficient solution for processing profitable-area queries from huge amounts of big taxi trip data.
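The idea of combining skyline processing with a Z-order curve can be illustrated on a single machine. This is a simplified sketch, not the DISPAQ Z-Skyline implementation: it assumes two non-negative integer attributes where smaller is better, and the names `z_address`, `dominates` and `skyline` are hypothetical. It relies on the fact that a point which dominates another never has a larger Z-address, so after sorting by Z-address each point needs to be tested only against skyline points already found.

```python
def z_address(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of x and y (Morton code) to get a position on the Z-order curve."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def dominates(p, q) -> bool:
    """p dominates q if p is no worse in every attribute and strictly better in at least one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points):
    """Return the non-dominated points, scanning in Z-order."""
    result = []
    for p in sorted(set(points), key=lambda pt: z_address(*pt)):
        # Only earlier points can dominate p, so checking the current skyline suffices.
        if not any(dominates(s, p) for s in result):
            result.append(p)
    return result


if __name__ == "__main__":
    candidates = [(1, 9), (3, 3), (4, 4), (9, 1), (5, 5)]
    print(skyline(candidates))
```

For a profit-style attribute where larger is better, the values would first be converted to a cost (for example, a maximum minus the value) so that the smaller-is-better assumption holds.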
Predicting hospital visits from geo-tagged Internet search logs.

PubMed

Agarwal, Vibhu; Han, Lichy; Madan, Isaac; Saluja, Shaurya; Shidham, Aaditya; Shah, Nigam H

2016-01-01

The steady rise in healthcare costs has deprived over 45 million Americans of healthcare services (1, 2) and has encouraged healthcare providers to look for opportunities to improve their operational efficiency. Prior studies have shown that evidence of healthcare seeking intent in Internet searches correlates well with healthcare resource utilization. Given the ubiquitous nature of mobile Internet search, we hypothesized that analyzing geo-tagged mobile search logs could enable us to machine-learn predictors of future patient visits. Using a de-identified dataset of geo-tagged mobile Internet search logs, we mined text and location patterns that are predictors of healthcare resource utilization and built statistical models that predict the probability of a user's future visit to a medical facility. Our efforts will enable the development of innovative methods for modeling and optimizing the use of healthcare resources-a crucial prerequisite for securing healthcare access for everyone in the days to come.
SeqWare Query Engine: storing and searching sequence data in the cloud.

PubMed

O'Connor, Brian D; Merriman, Barry; Nelson, Stanley F

2010-12-21

analytical tools. The range of data types supported, the ease of querying and integrating with existing tools, and the robust scalability of the underlying cloud-based technologies make SeqWare Query Engine a natural fit for storing and searching ever-growing genome sequence datasets.
SeqWare Query Engine: storing and searching sequence data in the cloud

PubMed Central

2010-01-01

interface to simplify development of analytical tools. The range of data types supported, the ease of querying and integrating with existing tools, and the robust scalability of the underlying cloud-based technologies make SeqWare Query Engine a natural fit for storing and searching ever-growing genome sequence datasets. PMID:21210981
Characterizing Internet Searchers of Smoking Cessation Information

PubMed Central

Graham, Amanda L

2006-01-01

Background The Internet is a viable channel to deliver evidence-based smoking cessation treatment that has the potential to make a large population impact on reducing smoking prevalence. There is high demand for smoking cessation information and support on the Internet. Approximately 7% (10.2 million) of adult American Internet users have searched for information on quitting smoking. Little is known about these individuals, their smoking status, what type of cessation services they are seeking on the Internet, or how frequently these searches for cessation information are conducted. Objective The primary goal of this study was to characterize individuals who search for smoking cessation information on the Internet to determine appropriate triage and treatment strategies. The secondary goal was to estimate the incidence of searches for cessation information using publicly available search engine data. Methods We recruited individuals who clicked on a link to a leading smoking cessation website (QuitNet) from within the results of a search engine query. Individuals were “intercepted” before seeing the QuitNet home page and were invited to participate in the study. Those accepting the invitation were routed to an online survey about demographics, smoking characteristics, preferences for specific cessation services, and Internet search patterns. To determine the generalizability of our sample, national datasets on search engine usage patterns, market share, and keyword rankings were examined. These datasets were then used to estimate the number of queries for smoking cessation information each year. Results During the 10-day study period, 2265 individuals were recruited and 29% (N = 655) responded. Of these, 59% were female and overall tended to be younger than the previously characterized general Internet population. Most (76%) respondents were current smokers; 17% had quit within the last 7 days, and 7% had quit more than 7 days ago. Slightly more than half of
A Search Strategy of Level-Based Flooding for the Internet of Things

PubMed Central

Qiu, Tie; Ding, Yanhong; Xia, Feng; Ma, Honglian

2012-01-01

This paper deals with the query problem in the Internet of Things (IoT). Flooding is an important query strategy. However, original flooding is prone to cause heavy network loads. To address this problem, we propose a variant of flooding, called Level-Based Flooding (LBF). With LBF, the whole network is divided into several levels according to the distances (i.e., hops) between the sensor nodes and the sink node. The sink node knows the level information of each node. Query packets are broadcast in the network according to the levels of nodes. Upon receiving a query packet, sensor nodes decide how to process it according to the percentage of neighbors that have processed it. When the target node receives the query packet, it sends its data back to the sink node via random walk. We show by extensive simulations that the performance of LBF in terms of cost and latency is much better than that of original flooding, and LBF can be used in IoT of different scales. PMID:23112594
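The first step of Level-Based Flooding described above, dividing the network into levels by hop distance from the sink, can be sketched with a breadth-first search. Only the level assignment is shown; the forwarding rule based on the percentage of neighbors and the random-walk reply are not modeled. The adjacency-dictionary representation and the name `hop_levels` are assumptions.

```python
from collections import deque


def hop_levels(adjacency: dict, sink) -> dict:
    """Assign each reachable node its hop distance (level) from the sink via BFS."""
    levels = {sink: 0}
    queue = deque([sink])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in levels:  # first visit gives the shortest hop count
                levels[neighbor] = levels[node] + 1
                queue.append(neighbor)
    return levels


if __name__ == "__main__":
    # Hypothetical undirected sensor network; "S" is the sink node.
    network = {
        "S": ["A", "B"],
        "A": ["S", "C"],
        "B": ["S", "C"],
        "C": ["A", "B", "D"],
        "D": ["C"],
    }
    print(hop_levels(network, "S"))
```

Nodes that cannot reach the sink are simply absent from the returned dictionary, which a simulation would need to handle explicitly.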
Metadata Effectiveness in Internet Discovery: An Analysis of Digital Collection Metadata Elements and Internet Search Engine Keywords

ERIC Educational Resources Information Center

Yang, Le

2016-01-01

This study analyzed digital item metadata and keywords from Internet search engines to learn what metadata elements actually facilitate discovery of digital collections through Internet keyword searching and how significantly each metadata element affects the discovery of items in a digital repository. The study found that keywords from Internet…
A peer-to-peer music sharing system based on query-by-humming

NASA Astrophysics Data System (ADS)

Wang, Jianrong; Chang, Xinglong; Zhao, Zheng; Zhang, Yebin; Shi, Qingwei

2007-09-01

Today, most traffic in peer-to-peer (P2P) networks still consists of multimedia files, including large numbers of music files. Research on Music Information Retrieval (MIR) has produced many encouraging results in music search. Nevertheless, research on MIR-based music search in P2P networks remains insufficient. Query by Humming (QBH) is an MIR technology that has been studied for years. In this paper, we present a server-based P2P music sharing system that is built on QBH and integrated with a Hierarchical Index Structure (HIS) to strengthen the relation between surface data and potential information. The HIS evolves automatically, depending on the music-related items carried by each peer, such as MIDI files, lyrics and so forth. Instead of adding a large amount of redundancy, the system generates a small index that supports multiple kinds of search input, which substantially improves on the traditional keyword-based text search mode. When network bandwidth and speed are no longer bottlenecks for Internet services, end users become more concerned with the accessibility and accuracy of the information the Internet provides.
Internet Search Engines - Fluctuations in Document Accessibility.

ERIC Educational Resources Information Center

Mettrop, Wouter; Nieuwenhuysen, Paul

2001-01-01

Reports an empirical investigation of the consistency of retrieval through Internet search engines. Evaluates 13 engines: AltaVista, EuroFerret, Excite, HotBot, InfoSeek, Lycos, MSN, NorthernLight, Snap, WebCrawler, and three national Dutch engines: Ilse, Search.nl and Vindex. The focus is on a characteristic related to size: the degree of…
DISPAQ: Distributed Profitable-Area Query from Big Taxi Trip Data

PubMed Central

Putri, Fadhilah Kurnia; Song, Giltae; Rao, Praveen

2017-01-01

One of the crucial problems for taxi drivers is to efficiently locate passengers in order to increase profits. The rapid advancement and ubiquitous penetration of Internet of Things (IoT) technology into transportation industries enables us to provide taxi drivers with locations that have more potential passengers (more profitable areas) by analyzing and querying taxi trip data. In this paper, we propose a query processing system, called Distributed Profitable-Area Query (DISPAQ) which efficiently identifies profitable areas by exploiting the Apache Software Foundation’s Spark framework and a MongoDB database. DISPAQ first maintains a profitable-area query index (PQ-index) by extracting area summaries and route summaries from raw taxi trip data. It then identifies candidate profitable areas by searching the PQ-index during query processing. Then, it exploits a Z-Skyline algorithm, which is an extension of skyline processing with a Z-order space filling curve, to quickly refine the candidate profitable areas. To improve the performance of distributed query processing, we also propose local Z-Skyline optimization, which reduces the number of dominant tests by distributing killer profitable areas to each cluster node. Through extensive evaluation with real datasets, we demonstrate that our DISPAQ system provides a scalable and efficient solution for processing profitable-area queries from huge amounts of big taxi trip data. PMID:28946679
Postmarket Drug Surveillance Without Trial Costs: Discovery of Adverse Drug Reactions Through Large-Scale Analysis of Web Search Queries

PubMed Central

Gabrilovich, Evgeniy

2013-01-01

Background Postmarket drug safety surveillance largely depends on spontaneous reports by patients and health care providers; hence, less common adverse drug reactions—especially those caused by long-term exposure, multidrug treatments, or those specific to special populations—often elude discovery. Objective Here we propose a low cost, fully automated method for continuous monitoring of adverse drug reactions in single drugs and in combinations thereof, and demonstrate the discovery of heretofore-unknown ones. Methods We used aggregated search data of large populations of Internet users to extract information related to drugs and adverse reactions to them, and correlated these data over time. We further extended our method to identify adverse reactions to combinations of drugs. Results We validated our method by showing high correlations of our findings with known adverse drug reactions (ADRs). However, although acute early-onset drug reactions are more likely to be reported to regulatory agencies, we show that less acute later-onset ones are better captured in Web search queries. Conclusions Our method is advantageous in identifying previously unknown adverse drug reactions. These ADRs should be considered as candidates for further scrutiny by medical regulatory authorities, for example, through phase 4 trials. PMID:23778053
Evaluation of internet search trends of some common oral problems, 2004 to 2014.

PubMed

Harorli, O T; Harorli, H

2014-09-01

Internet search trend volumes can provide free, fast and pertinent information about people's online interests. No study has yet been conducted on internet search trends in dentistry. This study aims to investigate ten years' data on internet search volumes regarding some oral problems: "toothache", "tooth decay", "gum disease", "wisdom teeth" and "oral cancer". The study also aims to investigate the most common geographic search locations and to examine related searches. Worldwide internet search trend data over a period of 532 weeks (4 January 2004 to 15 March 2014) retrieved from the Google Trends web site was interrogated for each search term to identify search trends, regional interests, and related searches. The search volumes for the terms "toothache" and "wisdom teeth" increased over the decade while "tooth decay", "gum disease", and "oral cancer" showed slight changes. Each term was most commonly searched in a different country: "toothache", Philippines; "tooth decay", Singapore; "gum disease", Ireland; "wisdom teeth", United States; and "oral cancer", India. Related searches were mainly focused on symptoms and remedies of these problems. Regional and time-related variations in search volumes may provide dental professionals with readily and freely available pertinent information on populations' internet searches regarding dental complaints.
Comparing image search behaviour in the ARRS GoldMiner search engine and a clinical PACS/RIS.

PubMed

De-Arteaga, Maria; Eggel, Ivan; Do, Bao; Rubin, Daniel; Kahn, Charles E; Müller, Henning

2015-08-01

Information search has changed the way we manage knowledge and the ubiquity of information access has made search a frequent activity, whether via Internet search engines or increasingly via mobile devices. Medical information search is in this respect no different and much research has been devoted to analyzing the way in which physicians aim to access information. Medical image search is a much smaller domain but has gained much attention as it has different characteristics than search for text documents. While web search log files have been analysed many times to better understand user behaviour, the log files of hospital internal systems for search in a PACS/RIS (Picture Archival and Communication System, Radiology Information System) have rarely been analysed. Such a comparison between a hospital PACS/RIS search and a web system for searching images of the biomedical literature is the goal of this paper. Objectives are to identify similarities and differences in search behaviour of the two systems, which could then be used to optimize existing systems and build new search engines. Log files of the ARRS GoldMiner medical image search engine (freely accessible on the Internet) containing 222,005 queries, and log files of Stanford's internal PACS/RIS search called radTF containing 18,068 queries were analysed. Each query was preprocessed and all query terms were mapped to the RadLex (Radiology Lexicon) terminology, a comprehensive lexicon of radiology terms created and maintained by the Radiological Society of North America, so the semantic content in the queries and the links between terms could be analysed, and synonyms for the same concept could be detected. RadLex was mainly created for use in radiology reports, to aid structured reporting and the preparation of educational material (Langlotz, 2006) [1]. In standard medical vocabularies such as MeSH (Medical Subject Headings) and UMLS (Unified Medical Language System) specific terms of radiology are often
Approaches to Internet Searching: An Analysis of Students in Grades 2 to 12.

ERIC Educational Resources Information Center

Lien, Cynthia

2000-01-01

Examines Internet search approaches by 123 students, and analyzes search methodologies relative to search successes. Presents three findings: (1) student experience with the Internet is closely correlated with ability to explore alternative search methods; (2) student level; and (3) a collaborative work among students in a classroom setting may…
How to improve your PubMed/MEDLINE searches: 2. display settings, complex search queries and topic searching.

PubMed

Fatehi, Farhad; Gray, Leonard C; Wootton, Richard

2014-01-01

The way that PubMed results are displayed can be changed using the Display Settings drop-down menu in the result screen. There are three groups of options: Format, Items per page and Sort by, which allow a good deal of control. The results from several searches can be temporarily stored on the Clipboard. Records of interest can be selected on the results page using check boxes and can then be combined, for example to form a reference list. The Related Citations is a valuable feature of PubMed that can provide a set of similar articles when you have identified a record of interest among the results. You can easily search for RCTs or reviews using the appropriate filters or field tags. If you are interested in clinical articles, rather than basic science or health service research, then the Clinical Queries tool on the PubMed home page can be used to retrieve them.
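The field-tag approach mentioned in this abstract can be sketched as a small query builder. The helper name build_pubmed_query is hypothetical and nothing here contacts PubMed; it only assembles a search string. The sketch assumes the [tiab] (title/abstract) and [pt] (publication type) search field tags, which should be checked against the current PubMed help pages before use.

```python
from typing import Optional


def build_pubmed_query(topic: str, publication_type: Optional[str] = None) -> str:
    """Combine a title/abstract term with an optional publication-type filter."""
    query = f"{topic}[tiab]"
    if publication_type:
        # Restrict results to one publication type, e.g. RCTs or reviews.
        query += f" AND {publication_type}[pt]"
    return query


# Hypothetical topic; the resulting strings can be pasted into the search box.
rct_query = build_pubmed_query("hypertension", "randomized controlled trial")
review_query = build_pubmed_query("hypertension", "review")
```

Keeping the topic and the filter as separate arguments makes it easy to rerun the same topic search with a different publication type.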
Using digital surveillance to examine the impact of public figure pancreatic cancer announcements on media and search query outcomes.

PubMed

Noar, Seth M; Ribisl, Kurt M; Althouse, Benjamin M; Willoughby, Jessica Fitts; Ayers, John W

2013-12-01

Announcements of cancer diagnoses from public figures may stimulate cancer information seeking and media coverage about cancer. This study used digital surveillance to quantify the effects of pancreatic cancer public figure announcements on online cancer information seeking and cancer media coverage. We compiled a list of public figures (N = 25) who had been diagnosed with or had died from pancreatic cancer between 2006 and 2011. We specified interrupted time series models using data from Google Trends to examine search query shifts for pancreatic cancer and other cancers. Weekly media coverage archived on Google News was also analyzed. Most public figures' pancreatic cancer announcements corresponded with no appreciable change in pancreatic cancer search queries or media coverage. In contrast, Patrick Swayze's diagnosis was associated with a 285% (95% confidence interval [CI]: 212 to 360) increase in pancreatic cancer search queries, though it was only weakly associated with increases in pancreatic cancer media coverage. Steve Jobs's death was associated with a 197% (95% CI: 131 to 266) increase in pancreatic cancer queries and a 3517% (95% CI: 2882 to 4492) increase in pancreatic cancer media coverage. In general, a doubling in pancreatic cancer-specific media coverage corresponded with a 325% increase in pancreatic cancer queries. Digital surveillance is an important tool for future cancer control research and practice. The current application of these methods suggested that pancreatic cancer announcements (diagnosis or death) by particular public figures stimulated media coverage of and online information seeking for pancreatic cancer.
Cognitive issues in searching images with visual queries

NASA Astrophysics Data System (ADS)

Yu, ByungGu; Evens, Martha W.

1999-01-01

In this paper, we propose our image indexing technique and visual query processing technique. Our mental images are different from the actual retinal images and many things, such as personal interests, personal experiences, perceptual context, the characteristics of spatial objects, and so on, affect our spatial perception. These private differences are propagated into our mental images and so our visual queries become different from the real images that we want to find. This is a hard problem and few people have tried to work on it. In this paper, we survey the human mental imagery system, the human spatial perception, and discuss several kinds of visual queries. Also, we propose our own approach to visual query interpretation and processing.

Is Internet search better than structured instruction for web-based health education?

PubMed

Finkelstein, Joseph; Bedra, McKenzie

2013-01-01

The Internet provides access to vast amounts of comprehensive information regarding any health-related subject. Patients increasingly use this information for health education, using a search engine to identify education materials. An alternative approach to health education via the Internet is based on utilizing a verified web site which provides structured interactive education guided by adult learning theories. These two approaches have not been systematically compared in older patients. The aim of this study was to compare the efficacy of a web-based computer-assisted education (CO-ED) system versus searching the Internet for learning about hypertension. Sixty hypertensive older adults (age 45+) were randomized into control or intervention groups. The control patients spent 30 to 40 minutes searching the Internet using a search engine for information about hypertension. The intervention patients spent 30 to 40 minutes using the CO-ED system, which provided computer-assisted instruction about major hypertension topics. Analysis of pre- and post-knowledge scores indicated a significant improvement among CO-ED users (14.6%) as opposed to Internet users (2%). Additionally, patients using the CO-ED program rated their learning experience more positively than those using the Internet.
Dermatological image search engines on the Internet: do they work?

PubMed

Cutrone, M; Grimalt, R

2007-02-01

Atlases on CD-ROM first substituted the use of paediatric dermatology atlases printed on paper. This permitted a faster search and a practical comparison of differential diagnoses. The third step in the evolution of clinical atlases was the onset of the online atlas. Many doctors now use the Internet image search engines to obtain clinical images directly. The aim of this study was to test the reliability of the image search engines compared to the online atlases. We tested seven Internet image search engines with three paediatric dermatology diseases. In general, the service offered by the search engines is good, and continues to be free of charge. The coincidence between what we searched for and what we found was generally excellent, and contained no advertisements. Most Internet search engines provided similar results but some were more user friendly than others. It is not necessary to repeat the same research with Picsearch, Lycos and MSN, as the response would be the same; there is a possibility that they might share software. Image search engines are a useful, free and precise method to obtain paediatric dermatology images for teaching purposes. There is still the matter of copyright to be resolved. What are the legal uses of these 'free' images? How do we define 'teaching purposes'? New watermark methods and encrypted electronic signatures might solve these problems and answer these questions.
Infodemiology and Infoveillance: Framework for an Emerging Set of Public Health Informatics Methods to Analyze Search, Communication and Publication Behavior on the Internet

PubMed Central

2009-01-01

Infodemiology can be defined as the science of distribution and determinants of information in an electronic medium, specifically the Internet, or in a population, with the ultimate aim to inform public health and public policy. Infodemiology data can be collected and analyzed in near real time. Examples of infodemiology applications include: the analysis of queries from Internet search engines to predict disease outbreaks (e.g., influenza); monitoring people's status updates on microblogs such as Twitter for syndromic surveillance; detecting and quantifying disparities in health information availability; identifying and monitoring of public health relevant publications on the Internet (e.g., anti-vaccination sites, but also news articles or expert-curated outbreak reports); automated tools to measure information diffusion and knowledge translation, and tracking the effectiveness of health marketing campaigns. Moreover, analyzing how people search and navigate the Internet for health-related information, as well as how they communicate and share this information, can provide valuable insights into health-related behavior of populations. Seven years after the infodemiology concept was first introduced, this paper revisits the emerging fields of infodemiology and infoveillance and proposes an expanded framework, introducing some basic metrics such as information prevalence, concept occurrence ratios, and information incidence. The framework distinguishes supply-based applications (analyzing what is being published on the Internet, e.g., on Web sites, newsgroups, blogs, microblogs and social media) from demand-based methods (search and navigation behavior), and further distinguishes passive from active infoveillance methods. Infodemiology metrics follow population health relevant events or predict them. Thus, these metrics and methods are potentially useful for public health practice and research, and should be further developed and standardized. PMID:19329408
Infodemiology and infoveillance: framework for an emerging set of public health informatics methods to analyze search, communication and publication behavior on the Internet.

PubMed

Eysenbach, Gunther

2009-03-27

Infodemiology can be defined as the science of distribution and determinants of information in an electronic medium, specifically the Internet, or in a population, with the ultimate aim to inform public health and public policy. Infodemiology data can be collected and analyzed in near real time. Examples of infodemiology applications include the analysis of queries from Internet search engines to predict disease outbreaks (e.g., influenza), monitoring people's status updates on microblogs such as Twitter for syndromic surveillance, detecting and quantifying disparities in health information availability, identifying and monitoring of public health relevant publications on the Internet (e.g., anti-vaccination sites, but also news articles or expert-curated outbreak reports), automated tools to measure information diffusion and knowledge translation, and tracking the effectiveness of health marketing campaigns. Moreover, analyzing how people search and navigate the Internet for health-related information, as well as how they communicate and share this information, can provide valuable insights into health-related behavior of populations. Seven years after the infodemiology concept was first introduced, this paper revisits the emerging fields of infodemiology and infoveillance and proposes an expanded framework, introducing some basic metrics such as information prevalence, concept occurrence ratios, and information incidence. The framework distinguishes supply-based applications (analyzing what is being published on the Internet, e.g., on Web sites, newsgroups, blogs, microblogs and social media) from demand-based methods (search and navigation behavior), and further distinguishes passive from active infoveillance methods. Infodemiology metrics follow population health relevant events or predict them. Thus, these metrics and methods are potentially useful for public health practice and research, and should be further developed and standardized.
An assessment of the visibility of MeSH-indexed medical web catalogs through search engines.

PubMed Central

Zweigenbaum, P.; Darmoni, S. J.; Grabar, N.; Douyère, M.; Benichou, J.

2002-01-01

Manually indexed Internet health catalogs such as CliniWeb or CISMeF provide resources for retrieving high-quality health information. Users of these quality-controlled subject gateways are most often referred to them by general search engines such as Google, AltaVista, etc. This raises several questions, among which the following: what is the relative visibility of medical Internet catalogs through search engines? This study addresses this issue by measuring and comparing the visibility of six major, MeSH-indexed health catalogs through four different search engines (AltaVista, Google, Lycos, Northern Light) in two languages (English and French). Over half a million queries were sent to the search engines; for most of these search engines, according to our measures at the time the queries were sent, the most visible catalog for English MeSH terms was CliniWeb and the most visible one for French MeSH terms was CISMeF. PMID:12463965
Infodemiology of status epilepticus: A systematic validation of the Google Trends-based search queries.

PubMed

Bragazzi, Nicola Luigi; Bacigaluppi, Susanna; Robba, Chiara; Nardone, Raffaele; Trinka, Eugen; Brigo, Francesco

2016-02-01

People increasingly use Google looking for health-related information. We previously demonstrated that in English-speaking countries most people use this search engine to obtain information on status epilepticus (SE) definition, types/subtypes, and treatment. Now, we aimed at providing a quantitative analysis of SE-related web queries. This analysis represents an advancement, with respect to what was already previously discussed, in that the Google Trends (GT) algorithm has been further refined and correlational analyses have been carried out to validate the GT-based query volumes. Google Trends-based SE-related query volumes were well correlated with information concerning causes and pharmacological and nonpharmacological treatments. Google Trends can provide both researchers and clinicians with data on realities and contexts that are generally overlooked and underexplored by classic epidemiology. In this way, GT can foster new epidemiological studies in the field and can complement traditional epidemiological tools. Copyright © 2015 Elsevier Inc. All rights reserved.
Robust Requirements Tracing via Internet Search Technology: Improving an IV and V Technique. Phase 2

NASA Technical Reports Server (NTRS)

Hayes, Jane; Dekhtyar, Alex

2004-01-01

There are three major objectives to this phase of the work. (1) Improvement of Information Retrieval (IR) methods for Independent Verification and Validation (IV&V) requirements tracing. Information Retrieval methods are typically developed for very large (order of millions - tens of millions and more documents) document collections and therefore, most successfully used methods somewhat sacrifice precision and recall in order to achieve efficiency. At the same time typical IR systems treat all user queries as independent of each other and assume that relevance of documents to queries is subjective for each user. The IV&V requirements tracing problem has a much smaller data set to operate on, even for large software development projects; the set of queries is predetermined by the high-level specification document and individual requirements considered as query input to IR methods are not necessarily independent from each other. Namely, knowledge about the links for one requirement may be helpful in determining the links of another requirement. Finally, while the final decision on the exact form of the traceability matrix still belongs to the IV&V analyst, his/her decisions are much less arbitrary than those of an Internet search engine user. All this suggests that the information available to us in the framework of the IV&V tracing problem can be successfully leveraged to enhance standard IR techniques, which in turn would lead to increased recall and precision. We developed several new methods during Phase II; (2) IV&V requirements tracing IR toolkit. Based on the methods developed in Phase I and their improvements developed in Phase II, we built a toolkit of IR methods for IV&V requirements tracing. The toolkit has been integrated, at the data level, with SAIC's SuperTracePlus (STP) tool; (3) Toolkit testing. We tested the methods included in the IV&V requirements tracing IR toolkit on a number of projects.
Internet use by the public to search for health-related information.

PubMed

AlGhamdi, Khalid M; Moussa, Noura A

2012-06-01

The use of the Internet to search for health-related information (HRI) has become a common practice worldwide. Our literature review failed to find any evidence of previous studies on this topic from Saudi Arabia. To determine the public use of the Internet in Saudi Arabia to search for HRI and to evaluate patients' perceptions of the quality of the information available on the Internet compared to that provided by their health care providers. A self-administered questionnaire about Internet use to search for HRI was distributed randomly to male and female outpatients and visitors attending a public University Hospital in Riyadh, Saudi Arabia from January to May 2010. A Chi-squared test was used to assess the association between different categorical variables. Multiple logistic regression was used to relate the use of the Internet to search for HRI with various socio-demographic variables. The questionnaire response was 80.1%, with completion of 801 of the 1000 distributed questionnaires; 50% (400/801) of respondents were males. The mean age of respondents was 32±11 years. The majority of respondents used the Internet in general (87.8%), and 58.4% of them (363/622) used the Internet to search for HRI. The majority stated a doctor was their primary source of HRI (89.3%, 654/732). This practice was considered useful by 84.2%, and the main reason behind it was sheer curiosity (92.7%, 418/451). Other reasons included not getting enough information from their doctor (58.5%, 227/413) and not trusting the information given by their doctor (28.2%, 101/443). Forty-four percent (205/466) searched for HRI before coming to the clinic; 72.5% of those discussed the information with their doctors and 71.7% (119/166) of those who did so believed that this positively affected their relationship with their doctor. Searching the Internet for health information was observed more frequently among the 30-39 year age group (OR=2.0, 95% CI 1.1-3.7), females (OR=3.8, 95% CI 2
Tracking the rise in popularity of electronic nicotine delivery systems (electronic cigarettes) using search query surveillance.

PubMed

Ayers, John W; Ribisl, Kurt M; Brownstein, John S

2011-04-01

Public interest in electronic nicotine delivery systems (ENDS) is undocumented. By monitoring search queries, ENDS popularity and correlates of their popularity were assessed in Australia, Canada, the United Kingdom (UK), and the U.S. English-language Google searches conducted from January 2008 through September 2010 were compared to snus, nicotine replacement therapy (NRT), and Chantix® or Champix®. Searches for each week were scaled to the highest weekly search proportion (100), with lower values indicating the relative search proportion compared to the highest-proportion week (e.g., 50=50% of the highest observed proportion). Analyses were performed in 2010. From July 2008 through February 2010, ENDS searches increased in all nations studied except Australia, where an increase occurred more recently. By September 2010, ENDS searches were several-hundred-fold greater than searches for smoking alternatives in the UK and U.S., and were rivaling alternatives in Australia and Canada. Across nations, ENDS searches were highest in the U.S., followed by similar search intensity in Canada and the UK, with Australia having the fewest ENDS searches. Stronger tobacco control, created by clean indoor air laws, cigarette taxes, and anti-smoking populations, was associated with consistently higher levels of ENDS searches. The online popularity of ENDS has surpassed that of snus and NRTs, which have been on the market for far longer, and is quickly outpacing Chantix or Champix. In part, the association between ENDS's popularity and stronger tobacco control suggests ENDS are used to bypass, or quit in response to, smoking restrictions. Search query surveillance is a valuable, real-time, free, and public method to evaluate the diffusion of new health products. This method may be generalized to other behavioral, biological, informational, or psychological outcomes manifested on search engines. Copyright © 2011 American Journal of Preventive Medicine. Published by Elsevier Inc
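The scaling rule described in this abstract (each week expressed relative to the highest weekly search proportion, which is set to 100) can be sketched as follows. The function name scale_to_peak and the sample values are hypothetical and are not taken from the study's data.

```python
from typing import List, Sequence


def scale_to_peak(weekly_proportions: Sequence[float]) -> List[float]:
    """Rescale weekly search proportions so the highest week equals 100."""
    peak = max(weekly_proportions)
    if peak <= 0:
        raise ValueError("at least one week must have a positive proportion")
    return [100 * p / peak for p in weekly_proportions]


# Hypothetical proportions for three weeks; the second week is the peak.
scaled = scale_to_peak([0.5, 1.0, 0.25])
# scaled == [50.0, 100.0, 25.0]: the first week had 50% of the peak proportion.
```

A series scaled this way shows change over time within one term and region, but two series scaled to their own peaks cannot be compared in absolute volume.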
Are cannabis prevalence estimates comparable across countries and regions? A cross-cultural validation using search engine query data.

PubMed

Steppan, Martin; Kraus, Ludwig; Piontek, Daniela; Siciliano, Valeria

2013-01-01

Prevalence estimation of cannabis use is usually based on self-report data. Although there is evidence on the reliability of this data source, its cross-cultural validity is still a major concern. External objective criteria are needed for this purpose. In this study, cannabis-related search engine query data are used as an external criterion. Data on cannabis use were taken from the 2007 European School Survey Project on Alcohol and Other Drugs (ESPAD). Provincial data came from three Italian nation-wide studies using the same methodology (2006-2008; ESPAD-Italia). Information on cannabis-related search engine query data was based on Google search volume indices (GSI). (1) Reliability analysis was conducted for GSI. (2) Latent measurement models of "true" cannabis prevalence were tested using perceived availability, web-based cannabis searches and self-reported prevalence as indicators. (3) Structure models were set up to test the influences of response tendencies and geographical position (latitude, longitude). In order to test the stability of the models, analyses were conducted on country level (Europe, US) and on provincial level in Italy. Cannabis-related GSI were found to be highly reliable and constant over time. The overall measurement model was highly significant in both data sets. On country level, no significant effects of response bias indicators and geographical position on perceived availability, web-based cannabis searches and self-reported prevalence were found. On provincial level, latitude had a significant positive effect on availability indicating that perceived availability of cannabis in northern Italy was higher than expected from the other indicators. Although GSI showed weaker associations with cannabis use than perceived availability, the findings underline the external validity and usefulness of search engine query data as external criteria. The findings suggest an acceptable relative comparability of national (provincial) prevalence
Internet Search and Krokodil in the Russian Federation: An Infoveillance Study

PubMed Central

2014-01-01

Background Krokodil is an informal term for a cheap injectable illicit drug domestically prepared from codeine-containing medication (CCM). The method of krokodil preparation may produce desomorphine as well as toxic reactants that cause extensive tissue necrosis. The first confirmed report of krokodil use in Russia took place in 2004. In 2012, reports of krokodil-related injection injuries began to appear beyond Russia in Western Europe and the United States. Objective This exploratory study had two main objectives: (1) to determine if Internet search patterns could detect regularities in behavioral responses to Russian CCM policy at the population level, and (2) to determine if complementary data sources could explain the regularities we observed. Methods First, we obtained krokodil-related search pattern data for each Russian subregion (oblast) between 2011 and 2012. Second, we analyzed several complementary data sources, including krokodil-related court cases and related search terms on both Google and Yandex, to evaluate the characteristics of terms accompanying krokodil-related search queries. Results In the 6 months preceding CCM sales restrictions, 21 of Russia's 83 oblasts had search rates higher than the national average (mean) of 16.67 searches per 100,000 population for terms associated with krokodil. In the 6 months following restrictions, mean national searches dropped to 9.65 per 100,000. Further, the number of oblasts recording a higher than average search rate dropped from 30 to 16. Second, we found krokodil-related court appearances were moderately positively correlated (Spearman correlation=.506, P≤.001) with behaviors consistent with an interest in the production and use of krokodil across Russia. Finally, Google Trends and Google and Yandex related terms suggested consistent public interest in the production and use of krokodil as well as for CCM as analgesic medication during the date range covered by this study. Conclusions Illicit drug use
Internet search and krokodil in the Russian Federation: an infoveillance study.

PubMed

Zheluk, Andrey; Quinn, Casey; Meylakhs, Peter

2014-09-18

Krokodil is an informal term for a cheap injectable illicit drug domestically prepared from codeine-containing medication (CCM). The method of krokodil preparation may produce desomorphine as well as toxic reactants that cause extensive tissue necrosis. The first confirmed report of krokodil use in Russia took place in 2004. In 2012, reports of krokodil-related injection injuries began to appear beyond Russia in Western Europe and the United States. This exploratory study had two main objectives: (1) to determine if Internet search patterns could detect regularities in behavioral responses to Russian CCM policy at the population level, and (2) to determine if complementary data sources could explain the regularities we observed. First, we obtained krokodil-related search pattern data for each Russian subregion (oblast) between 2011 and 2012. Second, we analyzed several complementary data sources, including krokodil-related court cases and related search terms on both Google and Yandex, to evaluate the characteristics of terms accompanying krokodil-related search queries. In the 6 months preceding CCM sales restrictions, 21 of Russia's 83 oblasts had search rates higher than the national average (mean) of 16.67 searches per 100,000 population for terms associated with krokodil. In the 6 months following restrictions, mean national searches dropped to 9.65 per 100,000. Further, the number of oblasts recording a higher than average search rate dropped from 30 to 16. Second, we found krokodil-related court appearances were moderately positively correlated (Spearman correlation=.506, P≤.001) with behaviors consistent with an interest in the production and use of krokodil across Russia. Finally, Google Trends and Google and Yandex related terms suggested consistent public interest in the production and use of krokodil as well as for CCM as analgesic medication during the date range covered by this study. Illicit drug use data are generally regarded as difficult to obtain
Agreement between Medline searches using the Medline-CD-Rom and Internet Pubmed, BioMedNet, Medscape and Gateway search-engines.

PubMed

Caro-Rojas, Rosa Angela; Eslava-Schmalbach, Javier H

2005-01-01

To compare the information obtained from the Medline database using Internet commercial search engines with that obtained from a compact disc (Medline-CD). An agreement study was carried out based on 101 clinical scenarios provided by specialists in internal medicine, pharmacy, gynaecology-obstetrics, surgery and paediatrics. 175 search strategies were employed using the connector AND plus text within quotation marks. The search was limited to 1991-1999. Internet search-engines were selected by common criteria. Identical search strategies were independently applied to and masked from Internet search engines, as well as the Medline-CD. 3,488 articles were obtained using 129 search strategies. Agreement with the Medline-CD was 54% for PubMed, 57% for Gateway, 54% for Medscape and 65% for BioMedNet. The highest agreement rate for a given speciality (paediatrics) was 78.1% for BioMedNet, having greater -/- than +/+ agreement. Even though free access to Medline has encouraged the boom and growth of evidence-based medicine, these results must be considered within the context of which search engine was selected for doing the searches. The Internet search engines studied showed a poor agreement with the Medline-CD, the rate of agreement differing according to speciality, thus significantly affecting searches and their reproducibility. Software designed for conducting Medline database searches, including the Medline-CD, must be standardised and validated.
Do economic equality and generalized trust inhibit academic dishonesty? Evidence from state-level search-engine queries.

PubMed

Neville, Lukas

2012-04-01

What effect does economic inequality have on academic integrity? Using data from search-engine queries made between 2003 and 2011 on Google and state-level measures of income inequality and generalized trust, I found that academically dishonest searches (queries seeking term-paper mills and help with cheating) were more likely to come from states with higher income inequality and lower levels of generalized trust. These relations persisted even when controlling for contextual variables, such as average income and the number of colleges per capita. The relation between income inequality and academic dishonesty was fully mediated by generalized trust. When there is higher economic inequality, people are less likely to view one another as trustworthy. This lower generalized trust, in turn, is associated with a greater prevalence of academic dishonesty. These results might explain previous findings on the effectiveness of honor codes.
Frequency and seasonal variation of ophthalmology-related internet searches.

PubMed

Leffler, Christopher T; Davenport, Byrd; Chan, Dana

2010-06-01

To use internet search activity to reveal the intensity of public interest and seasonal variation in ophthalmology-related diseases, symptoms, and treatments. Time-series analysis of internet search data. Google Trends data for ophthalmology terms for the United States, the United Kingdom, Canada, and Australia from 2004 through 2008 were studied. Mean population-weighted temperature and fraction of schools in session were estimated from databases, and relative potential sunlight intensity was calculated. Multivariable linear regression was used to predict search term frequency based on environmental variables. Relative to diabetes searches (100%), common US eye-related searches were: "glasses" (44%), "Lasik" (16%), "contact lenses" (12.4%), "pink eye" (9.5%), "glaucoma" (5.9%), "cataract" (4.1%), "dry eyes" (2.1%), "eye twitching" (1.9%), and "eye pain" (1.9%). Seasonal nature was high for "conjunctivitis" (r(2) = 0.37), "pink eye" (r(2) = 0.32), "eye floaters" (r(2) = 0.26), and "stye" (r(2) = 0.19), moderate for "glaucoma" (r(2) = 0.09) and "eye twitching" (r(2) = 0.06), and low for "uveitis" (r(2) = 0.02) and "macular degeneration" (r(2) < 0.01). Heat was associated with "stye" and cold was associated with "pink eye," "conjunctivitis," and "glaucoma" (all p < 0.002). Sunlight intensity was associated with "dry eyes" and "eye floaters" (p < 0.01). School sessions were associated positively with "eye twitching" (p >= 0.001) and negatively with "eyeglasses." "Eye allergy," "itchy eyes," and "watery eyes" were highly seasonal (r(2) = 0.75-0.38) and associated with "pollen" searches. Internet ophthalmology searches relate (in decreasing order) to refractive correction, eye diseases, and eye symptoms. Search study reveals the seasonality and environmental associations of interest in health terms.
Cancer Internet search activity on a major search engine, United States 2001-2003.

PubMed

Cooper, Crystale Purvis; Mallon, Kenneth P; Leadbetter, Steven; Pollack, Lori A; Peipins, Lucy A

2005-07-01

To locate online health information, Internet users typically use a search engine, such as Yahoo! or Google. We studied Yahoo! search activity related to the 23 most common cancers in the United States. The objective was to test three potential correlates of Yahoo! cancer search activity--estimated cancer incidence, estimated cancer mortality, and the volume of cancer news coverage--and to study the periodicity of and peaks in Yahoo! cancer search activity. Yahoo! cancer search activity was obtained from a proprietary database called the Yahoo! Buzz Index. The American Cancer Society's estimates of cancer incidence and mortality were used. News reports associated with specific cancer types were identified using the LexisNexis "US News" database, which includes more than 400 national and regional newspapers and a variety of newswire services. The Yahoo! search activity associated with specific cancers correlated with their estimated incidence (Spearman rank correlation, rho = 0.50, P = .015), estimated mortality (rho = 0.66, P = .001), and volume of related news coverage (rho = 0.88, P < .001). Yahoo! cancer search activity tended to be higher on weekdays and during national cancer awareness months but lower during summer months; cancer news coverage also tended to follow these trends. Sharp increases in Yahoo! search activity scores from one day to the next appeared to be associated with increases in relevant news coverage. Media coverage appears to play a powerful role in prompting online searches for cancer information. Internet search activity offers an innovative tool for passive surveillance of health information-seeking behavior.
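As a rough illustration of the Spearman rank correlation reported in this abstract, the following Python sketch ranks two series (averaging tied ranks) and computes the Pearson correlation of the ranks. The function names and the five-item data set are hypothetical and are not taken from the study.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks.

    Assumes both inputs have the same length and are not constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical values: a search activity score and an incidence estimate
# for five cancer types (illustrative only, not the study's data).
search_score = [80, 55, 60, 20, 35]
incidence = [210, 150, 90, 40, 60]
print(round(spearman_rho(search_score, incidence), 3))  # 0.9
```

Because only ranks enter the calculation, the result is unaffected by the scale of the search index, which is convenient when the index is a proprietary score rather than a raw count.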
Cancer Internet Search Activity on a Major Search Engine, United States 2001-2003

PubMed Central

Cooper, Crystale Purvis; Mallon, Kenneth P; Leadbetter, Steven; Peipins, Lucy A

2005-01-01

Background: To locate online health information, Internet users typically use a search engine, such as Yahoo! or Google. We studied Yahoo! search activity related to the 23 most common cancers in the United States. Objective: The objective was to test three potential correlates of Yahoo! cancer search activity--estimated cancer incidence, estimated cancer mortality, and the volume of cancer news coverage--and to study the periodicity of and peaks in Yahoo! cancer search activity. Methods: Yahoo! cancer search activity was obtained from a proprietary database called the Yahoo! Buzz Index. The American Cancer Society's estimates of cancer incidence and mortality were used. News reports associated with specific cancer types were identified using the LexisNexis “US News” database, which includes more than 400 national and regional newspapers and a variety of newswire services. Results: The Yahoo! search activity associated with specific cancers correlated with their estimated incidence (Spearman rank correlation, ρ = 0.50, P = .015), estimated mortality (ρ = 0.66, P = .001), and volume of related news coverage (ρ = 0.88, P < .001). Yahoo! cancer search activity tended to be higher on weekdays and during national cancer awareness months but lower during summer months; cancer news coverage also tended to follow these trends. Sharp increases in Yahoo! search activity scores from one day to the next appeared to be associated with increases in relevant news coverage. Conclusions: Media coverage appears to play a powerful role in prompting online searches for cancer information. Internet search activity offers an innovative tool for passive surveillance of health information-seeking behavior. PMID:15998627
Association of Internet search trends with suicide death in Taipei City, Taiwan, 2004-2009.

PubMed

Yang, Albert C; Tsai, Shi-Jen; Huang, Norden E; Peng, Chung-Kang

2011-07-01

Although the Internet has become an important source for affected people seeking suicide information, the connection between Internet searches for suicide information and suicidal death remains largely unknown. This study aims to evaluate the association between suicide and Internet search trends for 37 suicide-related terms representing major known risks of suicide. This study analyzes suicide death data in Taipei City, Taiwan and corresponding local Internet search trend data provided by Google Insights for Search during the period from January 2004 to December 2009. The investigation uses cross-correlation analysis to estimate the temporal relationship between suicide and Internet search trends and multiple linear regression analysis to identify significant factors associated with suicide from a pool of search trend data that either coincide with or precede the suicide deaths. Results show that a set of suicide-related search terms, the trends of which either temporally coincided with or preceded trends of suicide data, were associated with suicide death. These search factors varied among different suicide samples. Searches for "major depression" and "divorce" accounted for, at most, 30.2% of the variance in suicide data. When considering only leading suicide trends, searches for "divorce" and the pro-suicide term "complete guide of suicide" accounted for 22.7% of variance in suicide data. Appropriate filtering and detection of potentially harmful sources in keyword-driven search results by search engine providers may be a reasonable strategy to reduce suicide deaths. Copyright © 2011 Elsevier B.V. All rights reserved.
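The cross-correlation step described in this abstract can be sketched as follows: correlate the search series with the outcome series shifted by 0, 1, 2, ... periods and see which shift gives the strongest association. This is a minimal Python sketch under stated assumptions; the function names and the eight-point series are hypothetical, and the study's own preprocessing and regression steps are not reproduced.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def lagged_correlation(searches, outcome, lag):
    """Correlate searches at period t with the outcome at period t + lag (lag >= 0)."""
    if lag == 0:
        return pearson(searches, outcome)
    return pearson(searches[:-lag], outcome[lag:])


def best_lead(searches, outcome, max_lag):
    """Return the lag (0..max_lag) at which searches correlate most with the outcome."""
    return max(range(max_lag + 1),
               key=lambda k: lagged_correlation(searches, outcome, k))


# Hypothetical monthly series: the outcome repeats the search pattern
# two months later (illustrative only).
searches = [1, 3, 2, 5, 4, 6, 3, 2]
outcome = [0, 0, 1, 3, 2, 5, 4, 6]
print(best_lead(searches, outcome, max_lag=3))  # 2
```

A positive best lag means the search series leads the outcome series; a lag of 0 corresponds to trends that merely coincide.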
Using internet search engines and library catalogs to locate toxicology information.

PubMed

Wukovitz, L D

2001-01-12

The increasing importance of the Internet demands that toxicologists become acquainted with its resources. To find information, researchers must be able to effectively use Internet search engines, directories, subject-oriented websites, and library catalogs. The article will explain these resources, explore their benefits and weaknesses, and identify skills that help the researcher to improve search results and critically evaluate sources for their relevancy, validity, accuracy, and timeliness.

Effects of consumer motives on search behavior using internet advertising.

PubMed

Yang, Kenneth C C

2004-08-01

Past studies on uses and gratifications theory suggested that consumer motives affect how they will use media and media contents. Recent advertising research has extended the theory to study the use of Internet advertising. The current study explores the effects of consumer motives on their search behavior using Internet advertising. The study employed a 2 by 2 between-subjects factorial experiment design. A total of 120 subjects were assigned to an experiment condition that contains an Internet advertisement varying by advertising appeals (i.e., rational vs. emotional) and product involvement levels (high vs. low). Consumer search behavior (measured by the depth, breadth, total amount of search), demographics, and motives were collected by post-experiment questionnaires. Because all three dependent variables measuring search behavior were conceptually related to each other, MANCOVA procedures were employed to examine the moderating effects of consumer motives on the dependent variables in four product involvement-advertising appeal conditions. Results indicated that main effects for product involvements and advertising appeals were statistically significant. Univariate ANOVA also showed that advertising appeals and product involvement levels influenced the total amount of search. Three-way interactions among advertising appeals, product involvement levels, and information motive were also statistically significant. Implications and future research directions are discussed.
Internet Power Searching: The Advanced Manual. 2nd Edition. Neal-Schuman NetGuide Series.

ERIC Educational Resources Information Center

Bradley, Phil

This handbook provides information on how Internet search engines and related software and utilities work and how to use them in order to improve search techniques. The book begins with an introduction to the Internet. Part 1 contains the following chapters that cover mining the Internet for information: "An Introduction to Search…
Internet Medline providers.

PubMed

Vine, D L; Coady, T R

1998-01-01

Each database in this review has features that will appeal to some users. Each provides a credible interface to information available within the Medline database. The major differences are pricing and interface design. In this context, features that cost more and might seem trivial to the occasional searcher may actually save time and money when used by the professional. Internet Grateful Med is free, but Ms. Coady and I agree the availability of only three ANDable search fields is a major functional limitation. PubMed is also free but much more powerful. The command line interface that permits very sophisticated searches requires a commitment that casual users will find intimidating. Ms. Coady did not believe the feedback currently provided during a search was sufficient for sustained professional use. Paper Chase and Knowledge Finder are mature, modestly priced Medline search services. Paper Chase provides a menu-driven interface that is very easy to use, yet permits the user to search virtually all of Medline's data fields. Knowledge Finder emphasizes the use of natural language queries but fully supports more traditional search strategies. The impact of the tradeoff between fuzzy and Boolean strategies offered by Knowledge Finder is unclear and beyond the scope of this review. Additional software must be downloaded to use all of Knowledge Finder's features. Other providers required no software beyond the basic Internet browser, and this requirement prevented Ms. Coady from evaluating Knowledge Finder. Ovid and Silver Platter offer well-designed interfaces that simplify the construction of complex queries. These are clearly services designed for professional users. While pricing eliminates these for casual use, it should be emphasized that Medline citation access is only a portion of the service provided by these high-end vendors. Finally, we should comment that each of the vendors and government-sponsored services provided prompt and useful feedback to e
Automatic query formulations in information retrieval.

PubMed

Salton, G; Buckley, C; Fox, E A

1983-07-01

Modern information retrieval systems are designed to supply relevant information in response to requests received from the user population. In most retrieval environments the search requests consist of keywords, or index terms, interrelated by appropriate Boolean operators. Since it is difficult for untrained users to generate effective Boolean search requests, trained search intermediaries are normally used to translate original statements of user need into useful Boolean search formulations. Methods are introduced in this study which reduce the role of the search intermediaries by making it possible to generate Boolean search formulations completely automatically from natural language statements provided by the system patrons. Frequency considerations are used automatically to generate appropriate term combinations as well as Boolean connectives relating the terms. Methods are covered to produce automatic query formulations both in a standard Boolean logic system, as well as in an extended Boolean system in which the strict interpretation of the connectives is relaxed. Experimental results are supplied to evaluate the effectiveness of the automatic query formulation process, and methods are described for applying the automatic query formulation process in practice.
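To make the idea of frequency-driven Boolean formulation concrete, here is a small Python sketch. It is an illustrative heuristic only and not the algorithm of the study: terms that occur in a large share of the collection are treated as broad and joined with AND, while rarer terms are treated as selective and offered as OR alternatives. The function name, the cutoff, and the document-frequency table are all hypothetical.

```python
def formulate_boolean_query(statement, doc_freq, n_docs, common_cutoff=0.1):
    """Turn a natural-language statement into a Boolean query string.

    Illustrative heuristic (an assumption, not the published method):
    - words missing from the index vocabulary `doc_freq` are dropped;
    - terms found in more than `common_cutoff` of the documents are
      considered broad and are ANDed together to narrow the result;
    - the remaining terms are considered selective and are ORed.
    """
    terms = []
    for word in statement.lower().split():
        word = word.strip(".,;:?!\"'()")
        if word in doc_freq and word not in terms:
            terms.append(word)

    broad = [t for t in terms if doc_freq[t] / n_docs > common_cutoff]
    selective = [t for t in terms if doc_freq[t] / n_docs <= common_cutoff]

    clauses = list(selective)
    if len(broad) > 1:
        clauses.append("(" + " AND ".join(broad) + ")")
    elif broad:
        clauses.append(broad[0])
    return " OR ".join(clauses)


# Hypothetical document frequencies in a 1000-document collection.
doc_freq = {"retrieval": 400, "information": 600, "boolean": 30, "fuzzy": 12}
query = formulate_boolean_query(
    "Fuzzy Boolean information retrieval.", doc_freq, n_docs=1000
)
print(query)  # fuzzy OR boolean OR (information AND retrieval)
```

The cutoff controls how aggressively the query is narrowed; a real system would tune it against the number of documents the user wants back.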
SkyQuery - A Prototype Distributed Query and Cross-Matching Web Service for the Virtual Observatory

NASA Astrophysics Data System (ADS)

Thakar, A. R.; Budavari, T.; Malik, T.; Szalay, A. S.; Fekete, G.; Nieto-Santisteban, M.; Haridas, V.; Gray, J.

2002-12-01

We have developed a prototype distributed query and cross-matching service for the VO community, called SkyQuery, which is implemented with hierarchical Web Services. SkyQuery enables astronomers to run combined queries on existing distributed heterogeneous astronomy archives. SkyQuery provides a simple, user-friendly interface to run distributed queries over the federation of registered astronomical archives in the VO. The SkyQuery client connects to the portal Web Service, which farms the query out to the individual archives, which are also Web Services called SkyNodes. The cross-matching algorithm is run recursively on each SkyNode. Each archive is a relational DBMS with an HTM index for fast spatial lookups. The results of the distributed query are returned as an XML DataSet that is automatically rendered by the client. SkyQuery also returns the image cutout corresponding to the query result. SkyQuery finds not only matches between the various catalogs, but also dropouts - objects that exist in some of the catalogs but not in others. This is often as important as finding matches. We demonstrate the utility of SkyQuery with a brown-dwarf search between SDSS and 2MASS, and a search for radio-quiet quasars in SDSS, 2MASS and FIRST. The importance of a service like SkyQuery for the worldwide astronomical community cannot be overstated: data on the same objects in various archives is mapped in different wavelength ranges and looks very different due to different errors, instrument sensitivities and other peculiarities of each archive. Our cross-matching algorithm performs a fuzzy spatial join across multiple catalogs. This type of cross-matching is currently often done by eye, one object at a time. A static cross-identification table for a set of archives would become obsolete by the time it was built - the exponential growth of astronomical data means that a dynamic cross-identification mechanism like SkyQuery is the only viable option. SkyQuery was funded by a
[Internet search for counseling offers for older adults suffering from visual impairment].

PubMed

Himmelsbach, I; Lipinski, J; Putzke, M

2016-11-01

Visual impairment is a relevant problem of aging. In many cases promising therapeutic options exist but patients are often left with visual deficits, which require a high degree of individualized counseling. This article analyzed which counseling offers can be found by patients and relatives using simple and routine searching via the internet. Analyses were performed using colloquial search terms in the search engine Google in order to find counseling options for elderly people with visual impairments available via the internet. With this strategy 189 offers for counseling were found, which showed very heterogeneous regional differences in distribution. The counseling options found on the internet commonly address topics such as therapeutic interventions or visual aids, corresponding to the professions offering rehabilitation that are most present on the internet, such as ophthalmologists and opticians. Regarding content addressing psychosocial support and help with daily tasks, self-help and support groups offer the most differentiated and broadest spectrum. Support offers for daily living tasks and psychosocial counseling from social providers were more difficult to find with these search terms despite a high presence on the internet. There are a large number of providers of counseling and consulting for older persons with visual impairment. In order to be found more easily by patients and to be recommended more often by ophthalmologists and general practitioners, the presence of providers on the internet must be improved, especially providers of daily living and psychosocial support offers.
GenoQuery: a new querying module for functional annotation in a genomic warehouse

PubMed Central

Lemoine, Frédéric; Labedan, Bernard; Froidevaux, Christine

2008-01-01

Motivation: We have to cope with both a deluge of new genome sequences and a huge amount of data produced by high-throughput approaches used to exploit these genomic features. Crossing and comparing such heterogeneous and disparate data will help improve functional annotation of genomes. This requires designing elaborate integration systems such as warehouses for storing and querying these data. Results: We have designed a relational genomic warehouse with an original multi-layer architecture made of a databases layer and an entities layer. We describe a new querying module, GenoQuery, which is based on this architecture. We use the entities layer to define mixed queries. These mixed queries allow searching for instances of biological entities and their properties in the different databases, without specifying in which database they should be found. Accordingly, we further introduce the central notion of alternative queries. Such queries have the same meaning as the original mixed queries, while exploiting complementarities yielded by the various integrated databases of the warehouse. We explain how GenoQuery computes all the alternative queries of a given mixed query. We illustrate how useful this querying module is by means of a thorough example. Availability: http://www.lri.fr/~lemoine/GenoQuery/ Contact: chris@lri.fr, lemoine@lri.fr PMID:18586731
The History of the Internet Search Engine: Navigational Media and the Traffic Commodity

NASA Astrophysics Data System (ADS)

van Couvering, E.

This chapter traces the economic development of the search engine industry over time, beginning with the earliest Web search engines and ending with the domination of the market by Google, Yahoo! and MSN. Specifically, it focuses on the ways in which search engines are similar to and different from traditional media institutions, and how the relations between traditional and Internet media have changed over time. In addition to its historical overview, a core contribution of this chapter is the analysis of the industry using a media value chain based on audiences rather than on content, and the development of traffic as the core unit of exchange. It shows that traditional media companies failed when they attempted to create vertically integrated portals in the late 1990s, based on the idea of controlling Internet content, while search engines succeeded in creating huge "virtually integrated" networks based on control of Internet traffic rather than Internet content.
Revisiting the Rise of Electronic Nicotine Delivery Systems Using Search Query Surveillance

PubMed Central

Ayers, John W.; Althouse, Benjamin M.; Allem, Jon-Patrick; Leas, Eric C.; Dredze, Mark; Williams, Rebecca

2016-01-01

Introduction: Public perceptions of electronic nicotine delivery systems (ENDS) remain poorly understood because surveys are too costly to regularly implement and when implemented there are large delays between data collection and dissemination. Search query surveillance has bridged some of these gaps. Herein, ENDS’ popularity in the U.S. is reassessed using Google searches. Methods: ENDS searches originating in the U.S. from January 2009 through January 2015 were disaggregated by terms focused on e-cigarette (e.g., e-cig) versus vaping (e.g., vapers), their geolocation (e.g., state), the aggregate tobacco control measures corresponding to their geolocation (e.g., clean indoor air laws), and by terms that indicated the searcher’s potential interest (e.g., buy e-cigs likely indicates shopping); all analyzed in 2015. Results: ENDS searches are increasing across the entire U.S., with 8,498,180 searches during 2014. At the same time, searches shifted from e-cigarette- to vaping-focused terms, especially in coastal states and states with more anti-smoking norms. For example, nationally, e-cigarette searches declined 9% (95% CI=1%, 16%) during 2014 compared with 2013, whereas vaping searches increased 136% (95% CI=97%, 186%), surpassing e-cigarette searches. More ENDS searches were related to shopping (e.g., vape shop) than health concerns (e.g., vaping risks) or cessation (e.g., quit smoking with e-cigs), with shopping searches nearly doubling during 2014. Conclusions: ENDS popularity is rapidly growing and evolving, and monitoring searches has provided these timely insights. These findings may inform survey questionnaire development for follow-up investigation and immediately guide policy debates about how the public perceives ENDS’ health risks or cessation benefits. PMID:26876772
Abyss or Shelter? On the Relevance of Web Search Engines' Search Results When People Google for Suicide.

PubMed

Haim, Mario; Arendt, Florian; Scherr, Sebastian

2017-02-01

Despite evidence that suicide rates can increase after suicides are widely reported in the media, appropriate depictions of suicide in the media can help people to overcome suicidal crises and can thus elicit preventive effects. We argue on the level of individual media users that a similar ambivalence can be postulated for search results on online suicide-related search queries. Importantly, the filter bubble hypothesis (Pariser, 2011) states that search results are biased by algorithms based on a person's previous search behavior. In this study, we investigated whether suicide-related search queries, including either potentially suicide-preventive or -facilitative terms, influence subsequent search results. This might thus protect or harm suicidal Internet users. We utilized a 3 (search history: suicide-related harmful, suicide-related helpful, and suicide-unrelated) × 2 (reactive: clicking the top-most result link and no clicking) experimental design applying agent-based testing. While findings show no influences either of search histories or of reactivity on search results in a subsequent situation, the presentation of a helpline offer raises concerns about possible detrimental algorithmic decision-making: Algorithms "decided" whether or not to present a helpline, and this automated decision, then, followed the agent throughout the rest of the observation period. Implications for policy-making and search providers are discussed.
Using Search Engine Data as a Tool to Predict Syphilis.

PubMed

Young, Sean D; Torrone, Elizabeth A; Urata, John; Aral, Sevgi O

2018-07-01

Researchers have suggested that social media and online search data might be used to monitor and predict syphilis and other sexually transmitted diseases. Because people at risk for syphilis might seek sexual health and risk-related information on the internet, we investigated associations between internet state-level search query data (e.g., Google Trends) and reported weekly syphilis cases. We obtained weekly counts of reported primary and secondary syphilis for 50 states from 2012 to 2014 from the US Centers for Disease Control and Prevention. We collected weekly internet search query data regarding 25 risk-related keywords from 2012 to 2014 for 50 states using Google Trends. We joined 155 weeks of Google Trends data with 1-week lag to weekly syphilis data for a total of 7750 data points. Using the least absolute shrinkage and selection operator, we trained three linear mixed models on the first 10 weeks of each year. We validated models for 2012 and 2014 for the following 52 weeks and the 2014 model for the following 42 weeks. The models, consisting of different sets of keyword predictors for each year, accurately predicted 144 weeks of primary and secondary syphilis counts for each state, with an overall average R of 0.9 and overall average root mean squared error of 4.9. We used Google Trends search data from the prior week to predict cases of syphilis in the following weeks for each state. Further research could explore how search data could be integrated into public health monitoring systems.
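The data-joining step in this abstract (search data from the prior week paired with case counts for the current week) can be sketched in Python as below. This is one plausible reading, not the authors' code: it assumes 156 weekly observations per state over the three years, so that a 1-week lag leaves 155 usable weeks, and 155 weeks across 50 states gives the 7750 data points mentioned. The function name, state labels, and zero-filled values are hypothetical placeholders.

```python
def join_with_lag(trends, cases, lag_weeks=1):
    """Pair each (state, week) case count with search volume from `lag_weeks` earlier.

    trends, cases: dicts keyed by (state, week_index).
    Returns rows of (state, week, prior_search_volume, case_count), skipping
    weeks for which no earlier search observation exists.
    """
    rows = []
    for (state, week), count in sorted(cases.items()):
        prior = trends.get((state, week - lag_weeks))
        if prior is not None:
            rows.append((state, week, prior, count))
    return rows


# Placeholder inputs: 50 hypothetical states, 156 weeks (assumed 3 x 52),
# with all values set to zero because only the shape matters here.
states = [f"S{i:02d}" for i in range(50)]
trends = {(s, w): 0.0 for s in states for w in range(156)}
cases = {(s, w): 0 for s in states for w in range(156)}

rows = join_with_lag(trends, cases)
print(len(rows))  # 7750
```

The first week of each state drops out because there is no earlier search observation to pair it with, which is where the 155 comes from under this assumption.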
Internet Searches About Therapies Do Not Impact Willingness to Accept Prescribed Therapy in Inflammatory Bowel Disease Patients.

PubMed

Feathers, Alexandra; Yen, Tommy; Yun, Laura; Strizich, Garrett; Swaminath, Arun

2016-04-01

A significant majority of patients with inflammatory bowel disease (IBD) search the Internet for information about their disease. While patients who search the Internet for disease or treatment information are believed to be more resistant to accepting medical therapy, no studies have tested this hypothesis. All IBD patients over a 3-month period across three gastroenterology practices were surveyed about their disease, treatments, websites visited, attitudes toward medications, and their willingness to accept prescribed therapies after disease-related Internet searches. Of 142 total patients, 91% of respondents searched the Internet for IBD information. The vast majority (82%) reported taking medication upon their doctor's recommendation and cited the desire to acquire additional information about their disease and prescribed therapies as their most important search motivator (77%). Internet usage did not affect the willingness of 52% of our cohort to accept prescribed medication. The majority of IBD patients who searched the Internet for disease and treatment-related information were not affected in their willingness to accept prescribed medical therapy.
Web-search trends shed light on the nature of lunacy: relationship between moon phases and epilepsy information-seeking behavior.

PubMed

Otte, Willem M; van Diessen, Eric; Bell, Gail S; Sander, Josemir W

2013-12-01

In old and modern times and across cultures, recurrent seizures have been attributed to the lunar phase. It is unclear whether this relationship should be classified as a myth or whether a true connection exists between moon phases and seizures. We analyzed the worldwide aggregated search queries related to epilepsy health-seeking behavior between 2005 and 2012. Epilepsy-related Internet searches increased in periods with a high moon illumination. The overall association was weak (r=0.11, 95% confidence interval: 0.07 to 0.14) but seems to be higher than most control search queries not related to epilepsy. Increased sleep deprivation during periods of full moon might explain this positive association and warrants further study into epilepsy-related health-seeking behavior on the Internet, the lunar phase, and its contribution to nocturnal luminance. © 2013.
Teen smoking cessation help via the Internet: a survey of search engines.

PubMed

Edwards, Christine C; Elliott, Sean P; Conway, Terry L; Woodruff, Susan I

2003-07-01

The objective of this study was to assess Web sites related to teen smoking cessation on the Internet. Seven Internet search engines were searched using the keywords teen quit smoking. The top 20 hits from each search engine were reviewed and categorized. The keywords teen quit smoking produced between 35 and 400,000 hits depending on the search engine. Of 140 potential hits, 62% were active, unique sites; 85% were listed by only one search engine; and 40% focused on cessation. Findings suggest that legitimate on-line smoking cessation help for teens is constrained by search engine choice and the amount of time teens spend looking through potential sites. Resource listings should be updated regularly. Smoking cessation Web sites need to be picked up on multiple search engine searches. Further evaluation of smoking cessation Web sites needs to be conducted to identify the most effective help for teens.
Answers to Health Questions: Internet Search Results Versus Online Health Community Responses.

PubMed

Kanthawala, Shaheen; Vermeesch, Amber; Given, Barbara; Huh, Jina

2016-04-28

About 6 million people search for health information on the Internet each day in the United States. Both patients and caregivers search for information about prescribed courses of treatments, unanswered questions after a visit to their providers, or diet and exercise regimens. Past literature has indicated potential challenges around quality in health information available on the Internet. However, diverse information exists on the Internet, ranging from government-initiated webpages to personal blog pages. Yet we do not fully understand the strengths and weaknesses of different types of information available on the Internet. The objective of this research was to investigate the strengths and challenges of various types of health information available online and to suggest what information sources best fit various question types. We collected questions posted to and the responses they received from an online diabetes community and classified them according to Rothwell's classification of question types (fact, policy, or value questions). We selected 60 questions (20 each of fact, policy, and value) and the replies the questions received from the community. We then searched for responses to the same questions using a search engine and recorded the results. Community responses answered more questions than did search results overall. Search results were most effective in answering value questions and least effective in answering policy questions. Community responses answered questions across question types at an equivalent rate, but most answered policy questions and the least answered fact questions. Value questions were most answered by community responses, but some of these answers provided by the community were incorrect. Fact question search results were the most clinically valid. The Internet is a prevalent source of health information for people. The information quality people encounter online can have a large impact on them. We present what kinds of questions people ask
Predicting the hand, foot, and mouth disease incidence using search engine query data and climate variables: an ecological study in Guangdong, China

PubMed Central

Du, Zhicheng; Xu, Lin; Zhang, Wangjian; Zhang, Dingmei; Yu, Shicheng; Hao, Yuantao

2017-01-01

Objectives Hand, foot, and mouth disease (HFMD) has caused a substantial burden in China, especially in Guangdong Province. Based on the enhanced surveillance system, we aimed to explore whether the addition of temperature and search engine query data improves the risk prediction of HFMD. Design Ecological study. Setting and participants Information on the confirmed cases of HFMD, climate parameters and search engine query logs was collected. A total of 1.36 million HFMD cases were identified from the surveillance system during 2011–2014. Analyses were conducted at aggregate level and no confidential information was involved. Outcome measures A seasonal autoregressive integrated moving average (ARIMA) model with external variables (ARIMAX) was used to predict the HFMD incidence from 2011 to 2014, taking into account temperature and search engine query data (Baidu Index, BDI). Statistics of goodness-of-fit and precision of prediction were used to compare models (1) based on surveillance data only, and with the addition of (2) temperature, (3) BDI, and (4) both temperature and BDI. Results A high correlation between HFMD incidence and BDI (r=0.794, p<0.001) or temperature (r=0.657, p<0.001) was observed using both time series plot and correlation matrix. A linear effect of BDI (without lag) and non-linear effect of temperature (1 week lag) on HFMD incidence were found in a distributed lag non-linear model. Compared with the model based on surveillance data only, the ARIMAX model including BDI reached the best goodness-of-fit with an Akaike information criterion (AIC) value of −345.332, whereas the model including both BDI and temperature had the most accurate prediction in terms of the mean absolute percentage error (MAPE) of 101.745%. Conclusions An ARIMAX model incorporating search engine query data significantly improved the prediction of HFMD. Further studies are warranted to examine whether including search engine query data also improves the prediction of
Revisiting the Rise of Electronic Nicotine Delivery Systems Using Search Query Surveillance.

PubMed

Ayers, John W; Althouse, Benjamin M; Allem, Jon-Patrick; Leas, Eric C; Dredze, Mark; Williams, Rebecca S

2016-06-01

Public perceptions of electronic nicotine delivery systems (ENDS) remain poorly understood because surveys are too costly to regularly implement and, when implemented, there are long delays between data collection and dissemination. Search query surveillance has bridged some of these gaps. Herein, ENDS' popularity in the U.S. is reassessed using Google searches. ENDS searches originating in the U.S. from January 2009 through January 2015 were disaggregated by terms focused on e-cigarette (e.g., e-cig) versus vaping (e.g., vapers); their geolocation (e.g., state); the aggregate tobacco control measures corresponding to their geolocation (e.g., clean indoor air laws); and by terms that indicated the searcher's potential interest (e.g., buy e-cigs likely indicates shopping), all analyzed in 2015. ENDS searches are rapidly increasing in the U.S., with 8,498,000 searches during 2014 alone. Increasingly, searches are shifting from e-cigarette- to vaping-focused terms, especially in coastal states and states where anti-smoking norms are stronger. For example, nationally, e-cigarette searches declined 9% (95% CI=1%, 16%) during 2014 compared with 2013, whereas vaping searches increased 136% (95% CI=97%, 186%), even surpassing e-cigarette searches. Additionally, the percentage of ENDS searches related to shopping (e.g., vape shop) nearly doubled in 2014, whereas searches related to health concerns (e.g., vaping risks) or cessation (e.g., quit smoking with e-cigs) were rare and declined in 2014. ENDS popularity is rapidly growing and evolving. These findings could inform survey questionnaire development for follow-up investigation and immediately guide policy debates about how the public perceives the health risks or cessation benefits of ENDS. Copyright © 2016 American Journal of Preventive Medicine. Published by Elsevier Inc. All rights reserved.
Quality of anaesthesia-related information accessed via Internet searches.

PubMed

Caron, S; Berton, J; Beydon, L

2007-08-01

We conducted a study to examine the quality and stability of information available from the Internet on four anaesthesia-related topics. In January 2006, we searched using four key words (porphyria, scleroderma, transfusion risk, and epidural analgesia risk) with five search engines (Google, HotBot, AltaVista, Excite, and Yahoo). We used a published scoring system (NetScoring) to evaluate the first 15 sites identified by each of these 20 searches. We also used a simple four-point scale to assess the first 100 sites in the Google search on one of our four topics ('epidural analgesia risk'). In November 2006, we conducted a second evaluation, using three search engines (Google, AltaVista, and Yahoo) with 14 synonyms for 'epidural analgesia risk'. The five search engines performed similarly. NetScoring scores were lower for transfusion risk (P < 0.001). One or more high-quality sites was identified consistently among the first 15 sites in each search. Quality scored using the simple scale correlated closely with medical content and design by NetScoring and with the number of references (P < 0.05). Synonyms of 'epidural analgesia risk' yielded similar results. The quality of accessed information improved somewhat over the 11 month period with Yahoo and AltaVista, but declined with Google. The Internet is a valuable tool for obtaining medical information, but the quality of websites varies between different topics. A simple rating scale may facilitate the quality scoring on individual websites. Differences in precise search terms used for a given topic did not appear to affect the quality of the information obtained.
Information Retrieval Using UMLS-based Structured Queries

PubMed Central

Fagan, Lawrence M.; Berrios, Daniel C.; Chan, Albert; Cucina, Russell; Datta, Anupam; Shah, Maulik; Surendran, Sujith

2001-01-01

During the last three years, we have developed and described components of ELBook, a semantically based information-retrieval system [1-4]. Using these components, domain experts can specify a query model, indexers can use the query model to index documents, and end-users can search these documents for instances of indexed queries.
Assessing the impact of the national smoking ban in indoor public places in china: evidence from quit smoking related online searches.

PubMed

Huang, Jidong; Zheng, Rong; Emery, Sherry

2013-01-01

Despite the tremendous economic and health costs imposed on China by tobacco use, China lacks a proactive and systematic tobacco control surveillance and evaluation system, hampering research progress on tobacco-focused surveillance and evaluation studies. This paper uses online search query analyses to investigate changes in online search behavior among Chinese Internet users in response to the adoption of the national indoor public place smoking ban. Baidu Index and Google Trends were used to examine the volume of search queries containing three key search terms "Smoking Ban(s)," "Quit Smoking," and "Electronic Cigarette(s)," along with the news coverage on the smoking ban, for the period 2009-2011. Our results show that the announcement and adoption of the indoor public place smoking ban in China generated significant increases in news coverage on smoking bans. There was a strong positive correlation between the media coverage of smoking bans and the volume of "Smoking Ban(s)" and "Quit Smoking" related search queries. The volume of search queries related to "Electronic Cigarette(s)" was also correlated with the smoking ban news coverage. To the extent it altered smoking-related online searches, our analyses suggest that the smoking ban had a significant effect, at least in the short run, on Chinese Internet users' smoking-related behaviors. This research introduces a novel analytic tool, which could serve as an alternative tobacco control evaluation and behavior surveillance tool in the absence of a timely or comprehensive population surveillance system. This research also highlights the importance of a comprehensive approach to tobacco control in China.

Internet use by patients with psychiatric disorders in search for general and medical informations.

PubMed

Khazaal, Yasser; Chatton, Anne; Cochand, Sophie; Hoch, Aliosca; Khankarli, Mona B; Khan, Riaz; Zullino, Daniele Fabio

2008-12-01

The Internet is commonly used by the general population, notably for health information-seeking. There has been little research into its use by patients treated for a psychiatric disorder. The aim was to evaluate the use of the Internet by patients with psychiatric disorders in searching for general and medical information. In 2007, 319 patients followed in a university hospital psychiatric out-patient clinic completed a 28-item self-administered questionnaire. Two hundred of the patients surveyed were Internet users. Most of them (68.5%) used the Internet to find health-related information. Only a small proportion of the patients knew and used criteria reflecting the quality of the content of the websites consulted. Knowledge of English and private Internet access were the factors significantly associated with searching for health information on the Internet. The Internet is currently used by patients treated for psychiatric disorders, especially for seeking medical information.
Multidimensional indexing structure for use with linear optimization queries

NASA Technical Reports Server (NTRS)

Bergman, Lawrence David (Inventor); Castelli, Vittorio (Inventor); Chang, Yuan-Chi (Inventor); Li, Chung-Sheng (Inventor); Smith, John Richard (Inventor)

2002-01-01

Linear optimization queries, which usually arise in various decision support and resource planning applications, are queries that retrieve top N data records (where N is an integer greater than zero) which satisfy a specific optimization criterion. The optimization criterion is to either maximize or minimize a linear equation. The coefficients of the linear equation are given at query time. Methods and apparatus are disclosed for constructing, maintaining and utilizing a multidimensional indexing structure of database records to improve the execution speed of linear optimization queries. Database records with numerical attributes are organized into a number of layers and each layer represents a geometric structure called convex hull. Such linear optimization queries are processed by searching from the outer-most layer of this multi-layer indexing structure inwards. At least one record per layer will satisfy the query criterion and the number of layers needed to be searched depends on the spatial distribution of records, the query-issued linear coefficients, and N, the number of records to be returned. When N is small compared to the total size of the database, answering the query typically requires searching only a small fraction of all relevant records, resulting in a tremendous speedup as compared to linearly scanning the entire dataset.
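The layered search described in this record can be illustrated with a small sketch. The Python below is only a 2-D illustration under stated assumptions, not the disclosed method: it assumes distinct two-attribute records, uses Andrew's monotone chain to peel convex hull layers (an assumed choice of hull algorithm), and relies on the property that the N best records for a linear criterion lie within the outermost N layers. The names `convex_hull`, `build_layers`, and `top_n` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def convex_hull(points: List[Point]) -> List[Point]:
    """Hull vertices via Andrew's monotone chain (collinear points excluded)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    upper: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def build_layers(points: List[Point]) -> List[List[Point]]:
    """Peel hulls repeatedly: layer 0 is the outermost hull."""
    remaining = set(points)  # assumes records are distinct points
    layers: List[List[Point]] = []
    while remaining:
        hull = convex_hull(list(remaining))
        layers.append(hull)
        remaining -= set(hull)
    return layers


def top_n(layers: List[List[Point]], coeffs: Tuple[float, float], n: int) -> List[Point]:
    """Maximize coeffs[0]*x + coeffs[1]*y, scanning only the outermost n layers."""
    a, b = coeffs
    candidates = [p for layer in layers[:n] for p in layer]
    candidates.sort(key=lambda p: a * p[0] + b * p[1], reverse=True)
    return candidates[:n]
```

For example, `top_n(build_layers(records), (1.0, 2.0), 2)` returns the two records with the largest value of `x + 2*y` while touching only the two outermost layers; a minimization query can be expressed by negating the coefficients.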
Forecasting influenza in Hong Kong with Google search queries and statistical model fusion

PubMed Central

Ramirez Ramirez, L. Leticia; Nezafati, Kusha; Zhang, Qingpeng; Tsui, Kwok-Leung

2017-01-01

Background The objective of this study is to investigate the predictive utility of online social media and web search queries, particularly Google search data, to forecast new cases of influenza-like-illness (ILI) in general outpatient clinics (GOPC) in Hong Kong. To mitigate the impact of sensitivity to self-excitement (i.e., fickle media interest) and other artifacts of online social media data, in our approach we fuse multiple offline and online data sources. Methods Four individual models: generalized linear model (GLM), least absolute shrinkage and selection operator (LASSO), autoregressive integrated moving average (ARIMA), and deep learning (DL) with Feedforward Neural Networks (FNN) are employed to forecast ILI-GOPC both one week and two weeks in advance. The covariates include Google search queries, meteorological data, and previously recorded offline ILI. To our knowledge, this is the first study that introduces deep learning methodology into surveillance of infectious diseases and investigates its predictive utility. Furthermore, to exploit the strengths of each individual forecasting model, we use statistical model fusion, using Bayesian model averaging (BMA), which allows a systematic integration of multiple forecast scenarios. For each model, an adaptive approach is used to capture the recent relationship between ILI and covariates. Results DL with FNN appears to deliver the most competitive predictive performance among the four considered individual models. Combining all four models in a comprehensive BMA framework further improves predictive evaluation metrics such as root mean squared error (RMSE) and mean absolute predictive error (MAPE). Nevertheless, DL with FNN remains the preferred method for predicting locations of influenza peaks. Conclusions The proposed approach can be viewed as a feasible alternative to forecast ILI in Hong Kong or other countries where ILI has no constant seasonal trend and influenza data resources are limited. The
Forecasting influenza in Hong Kong with Google search queries and statistical model fusion.

PubMed

Xu, Qinneng; Gel, Yulia R; Ramirez Ramirez, L Leticia; Nezafati, Kusha; Zhang, Qingpeng; Tsui, Kwok-Leung

2017-01-01

The objective of this study is to investigate the predictive utility of online social media and web search queries, particularly Google search data, to forecast new cases of influenza-like-illness (ILI) in general outpatient clinics (GOPC) in Hong Kong. To mitigate the impact of sensitivity to self-excitement (i.e., fickle media interest) and other artifacts of online social media data, in our approach we fuse multiple offline and online data sources. Four individual models: generalized linear model (GLM), least absolute shrinkage and selection operator (LASSO), autoregressive integrated moving average (ARIMA), and deep learning (DL) with Feedforward Neural Networks (FNN) are employed to forecast ILI-GOPC both one week and two weeks in advance. The covariates include Google search queries, meteorological data, and previously recorded offline ILI. To our knowledge, this is the first study that introduces deep learning methodology into surveillance of infectious diseases and investigates its predictive utility. Furthermore, to exploit the strengths of each individual forecasting model, we use statistical model fusion, using Bayesian model averaging (BMA), which allows a systematic integration of multiple forecast scenarios. For each model, an adaptive approach is used to capture the recent relationship between ILI and covariates. DL with FNN appears to deliver the most competitive predictive performance among the four considered individual models. Combining all four models in a comprehensive BMA framework further improves predictive evaluation metrics such as root mean squared error (RMSE) and mean absolute predictive error (MAPE). Nevertheless, DL with FNN remains the preferred method for predicting locations of influenza peaks. The proposed approach can be viewed as a feasible alternative to forecast ILI in Hong Kong or other countries where ILI has no constant seasonal trend and influenza data resources are limited. The proposed methodology is easily tractable
Evolution of Query Optimization Methods

NASA Astrophysics Data System (ADS)

Hameurlain, Abdelkader; Morvan, Franck

Query optimization is the most critical phase in query processing. In this paper, we try to describe synthetically the evolution of query optimization methods from uniprocessor relational database systems to data Grid systems through parallel, distributed and data integration systems. We point out a set of parameters to characterize and compare query optimization methods, mainly: (i) size of the search space, (ii) type of method (static or dynamic), (iii) modification types of execution plans (re-optimization or re-scheduling), (iv) level of modification (intra-operator and/or inter-operator), (v) type of event (estimation errors, delay, user preferences), and (vi) nature of decision-making (centralized or decentralized control).
Quantum Private Queries

NASA Astrophysics Data System (ADS)

Giovannetti, Vittorio; Lloyd, Seth; Maccone, Lorenzo

2008-06-01

We propose a cheat-sensitive quantum protocol to perform a private search on a classical database which is efficient in terms of communication complexity. It allows a user to retrieve an item from the database provider without revealing which item he or she retrieved: if the provider tries to obtain information on the query, the person querying the database can find it out. The protocol also ensures perfect data privacy of the database: the information that the user can retrieve in a single query is bounded and does not depend on the size of the database. With respect to the known (quantum and classical) strategies for private information retrieval, our protocol displays an exponential reduction in communication complexity and in running-time computational complexity.
Nowcasting Intraseasonal Recreational Fishing Harvest with Internet Search Volume

PubMed Central

Carter, David W.; Crosson, Scott; Liese, Christopher

2015-01-01

Estimates of recreational fishing harvest are often unavailable until after a fishing season has ended. This lag in information complicates efforts to stay within the quota. The simplest way to monitor quota within the season is to use harvest information from the previous year. This works well when fishery conditions are stable, but is inaccurate when fishery conditions are changing. We develop regression-based models to “nowcast” intraseasonal recreational fishing harvest in the presence of changing fishery conditions. Our basic model accounts for seasonality, changes in the fishing season, and important events in the fishery. Our extended model uses Google Trends data on the internet search volume relevant to the fishery of interest. We demonstrate the model with the Gulf of Mexico red snapper fishery where the recreational sector has exceeded the quota nearly every year since 2007. Our results confirm that data for the previous year works well to predict intraseasonal harvest for a year (2012) where fishery conditions are consistent with historic patterns. However, for a year (2013) of unprecedented harvest and management activity our regression model using search volume for the term “red snapper season” generates intraseasonal nowcasts that are 27% more accurate than the basic model without the internet search information and 29% more accurate than the prediction based on the previous year. Reliable nowcasts of intraseasonal harvest could make in-season (or in-year) management feasible and increase the likelihood of staying within quota. Our nowcasting approach using internet search volume might have the potential to improve quota management in other fisheries where conditions change year-to-year. PMID:26348645
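The regression-based nowcast in this record can be sketched roughly as follows. This is a minimal illustration only, assuming an ordinary least-squares fit with two hypothetical predictors (the previous year's harvest for the same week and a search-volume index); the paper's actual model specification, variables, and data are not reproduced here, and the function names `fit_ols` and `predict` are invented for the example.

```python
from typing import List, Sequence


def fit_ols(rows: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Least-squares coefficients (intercept first) via the normal equations.

    Assumes the predictors are not perfectly collinear; otherwise a pivot
    is zero and the division below fails.
    """
    X = [[1.0, *r] for r in rows]  # prepend the intercept column
    k = len(X[0])
    # Augmented system [X'X | X'y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * yi for x, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [v - f * w for v, w in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(coefs: Sequence[float], row: Sequence[float]) -> float:
    """Nowcast for one period from its predictor values."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


# Made-up illustrative inputs: (previous-year harvest, search-volume index).
history = [(10, 20), (12, 35), (15, 30), (11, 60), (14, 80), (13, 50)]
# Made-up harvest series generated from a known linear rule for the demo.
harvest = [2 + 0.5 * prev + 0.1 * vol for prev, vol in history]

coefs = fit_ols(history, harvest)
nowcast = predict(coefs, (12, 70))  # current period, hypothetical values
```

Dropping the search-volume column from `history` gives the analogue of a model without search information, so the two fits can be compared on held-out periods.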
Journal searching in non-MEDLINE resources on Internet Web sites.

PubMed

Lingle, V A

1997-01-01

Internet access to the medical journal literature is absorbing the attention of all relevant parties, i.e., publishers, journal vendors, librarians, commercial providers, government agencies, and end users. Journal content on the Web sites spans the range from advertising and ordering information for the print version, to table of contents and abstracts, to downloadable full text and graphics of articles. The searching parameters for systems other than MEDLINE also differ extensively with a wide variety of features and resulting retrieval. This discussion reviews a selection of providers of medical information (particularly the journal literature) on the Internet, making a comparison of what is available on Web sites and how it can be searched.
Towards Practical Privacy-Preserving Internet Services

ERIC Educational Resources Information Center

Wang, Shiyuan

2012-01-01

Today's Internet offers people a vast selection of data-centric services, such as online query services, the cloud, and location-based services. These internet services bring people a lot of convenience, but at the same time raise privacy concerns, e.g., sensitive information revealed by the queries, sensitive data being stored and…
Searching Internet Archive Sites with Archie: Why, What, Where, and How.

ERIC Educational Resources Information Center

Simmonds, Curtis

1993-01-01

Describes Archie, an online catalog of electronic holdings of anonymous FTP (File Transfer Protocol) archive sites on the Internet. Accessing Archie through e-mail and using it in a telnet session are discussed. The Internet Gopher and Whatis, which can be used with Archie, are also explained, and search examples are included. (four references)…
Whiplash Syndrome Reloaded: Digital Echoes of Whiplash Syndrome in the European Internet Search Engine Context

PubMed Central

2017-01-01

Background In many Western countries, after a motor vehicle collision, those involved seek health care for the assessment of injuries and for insurance documentation purposes. In contrast, in many less wealthy countries, there may be limited access to care and no insurance or compensation system. Objective The purpose of this infodemiology study was to investigate the global pattern of evolving Internet usage in countries with and without insurance and the corresponding compensation systems for whiplash injury. Methods We used the Internet search engine analytics via Google Trends to study the health information-seeking behavior concerning whiplash injury at national population levels in Europe. Results We found that the search for “whiplash” is strikingly and consistently often associated with the search for “compensation” in countries or cultures with a tort system. Frequent or traumatic painful injuries; diseases or disorders such as arthritis, headache, radius, and hip fracture; depressive disorders; and fibromyalgia were not associated similarly with searches on “compensation.” Conclusions In this study, we present evidence from the evolving viewpoint of naturalistic Internet search engine analytics that the expectations for receiving compensation may influence Internet search behavior in relation to whiplash injury. PMID:28347974
Query Transformations for Result Merging

DTIC Science & Technology

2014-11-01

tors, term dependence, query expansion 1. INTRODUCTION Federated search deals with the problem of aggregating results from multiple search engines. The...individual search engines are (i) typically focused on a particular domain or a particular corpus, (ii) employ diverse retrieval models, and (iii...determine which search engines are appropriate for addressing the information need (resource selection), and (ii) merging the results returned by
Spatial information semantic query based on SPARQL

NASA Astrophysics Data System (ADS)

Xiao, Zhifeng; Huang, Lei; Zhai, Xiaofang

2009-10-01

How can the efficiency of spatial information inquiries be enhanced in today's fast-growing information age? We are rich in geospatial data but poor in up-to-date geospatial information and knowledge that are ready to be accessed by public users. This paper adopts an approach for querying spatial semantics by building an ontology in Web Ontology Language (OWL) format and introducing the SPARQL Protocol and RDF Query Language (SPARQL) to search spatial semantic relations. It is important to establish spatial semantics that support effective spatial reasoning for performing semantic queries. In contrast to earlier keyword-based and information retrieval techniques that rely on syntax, we use semantic approaches in our spatial query system. Semantic approaches need to be built on an ontology, so we use OWL to describe spatial information extracted from the large-scale map of Wuhan. Spatial information expressed by an ontology with formal semantics is available to machines for processing and to people for understanding. The approach is illustrated by a case study using SPARQL to query geo-spatial ontology instances of Wuhan. The paper shows that using SPARQL to search OWL ontology instances can ensure the accuracy and applicability of the results. The results also indicate that constructing a geo-spatial semantic query system has positive effects on forming spatial queries and on retrieval.
Hybrid Filtering in Semantic Query Processing

ERIC Educational Resources Information Center

Jeong, Hanjo

2011-01-01

This dissertation presents a hybrid filtering method and a case-based reasoning framework for enhancing the effectiveness of Web search. Web search may not reflect user needs, intent, context, and preferences, because today's keyword-based search is lacking semantic information to capture the user's context and intent in posing the search query.…
Predicting the hand, foot, and mouth disease incidence using search engine query data and climate variables: an ecological study in Guangdong, China.

PubMed

Du, Zhicheng; Xu, Lin; Zhang, Wangjian; Zhang, Dingmei; Yu, Shicheng; Hao, Yuantao

2017-10-06

Hand, foot, and mouth disease (HFMD) has caused a substantial burden in China, especially in Guangdong Province. Based on the enhanced surveillance system, we aimed to explore whether the addition of temperature and search engine query data improves the risk prediction of HFMD. Ecological study. Information on the confirmed cases of HFMD, climate parameters and search engine query logs was collected. A total of 1.36 million HFMD cases were identified from the surveillance system during 2011-2014. Analyses were conducted at aggregate level and no confidential information was involved. A seasonal autoregressive integrated moving average (ARIMA) model with external variables (ARIMAX) was used to predict the HFMD incidence from 2011 to 2014, taking into account temperature and search engine query data (Baidu Index, BDI). Statistics of goodness-of-fit and precision of prediction were used to compare models (1) based on surveillance data only, and with the addition of (2) temperature, (3) BDI, and (4) both temperature and BDI. A high correlation between HFMD incidence and BDI (r=0.794, p<0.001) or temperature (r=0.657, p<0.001) was observed using both time series plot and correlation matrix. A linear effect of BDI (without lag) and non-linear effect of temperature (1 week lag) on HFMD incidence were found in a distributed lag non-linear model. Compared with the model based on surveillance data only, the ARIMAX model including BDI reached the best goodness-of-fit with an Akaike information criterion (AIC) value of -345.332, whereas the model including both BDI and temperature had the most accurate prediction in terms of the mean absolute percentage error (MAPE) of 101.745%. An ARIMAX model incorporating search engine query data significantly improved the prediction of HFMD. Further studies are warranted to examine whether including search engine query data also improves the prediction of other infectious diseases in other settings.
Assessing the Impact of the National Smoking Ban in Indoor Public Places in China: Evidence from Quit Smoking Related Online Searches

PubMed Central

Huang, Jidong; Zheng, Rong; Emery, Sherry

2013-01-01

Background Despite the tremendous economic and health costs imposed on China by tobacco use, China lacks a proactive and systematic tobacco control surveillance and evaluation system, hampering research progress on tobacco-focused surveillance and evaluation studies. Methods This paper uses online search query analyses to investigate changes in online search behavior among Chinese Internet users in response to the adoption of the national indoor public place smoking ban. Baidu Index and Google Trends were used to examine the volume of search queries containing three key search terms “Smoking Ban(s),” “Quit Smoking,” and “Electronic Cigarette(s),” along with the news coverage on the smoking ban, for the period 2009–2011. Findings Our results show that the announcement and adoption of the indoor public place smoking ban in China generated significant increases in news coverage on smoking bans. There was a strong positive correlation between the media coverage of smoking bans and the volume of “Smoking Ban(s)” and “Quit Smoking” related search queries. The volume of search queries related to “Electronic Cigarette(s)” was also correlated with the smoking ban news coverage. Interpretation To the extent it altered smoking-related online searches, our analyses suggest that the smoking ban had a significant effect, at least in the short run, on Chinese Internet users’ smoking-related behaviors. This research introduces a novel analytic tool, which could serve as an alternative tobacco control evaluation and behavior surveillance tool in the absence of a timely or comprehensive population surveillance system. This research also highlights the importance of a comprehensive approach to tobacco control in China. PMID:23776504
Incremental Query Rewriting with Resolution

NASA Astrophysics Data System (ADS)

Riazanov, Alexandre; Aragão, Marcelo A. T.

We address the problem of semantic querying of relational databases (RDB) modulo knowledge bases using very expressive knowledge representation formalisms, such as full first-order logic or its various fragments. We propose to use a resolution-based first-order logic (FOL) reasoner for computing schematic answers to deductive queries, with the subsequent translation of these schematic answers to SQL queries which are evaluated using a conventional relational DBMS. We call our method incremental query rewriting, because an original semantic query is rewritten into a (potentially infinite) series of SQL queries. In this chapter, we outline the main idea of our technique - using abstractions of databases and constrained clauses for deriving schematic answers, and provide completeness and soundness proofs to justify the applicability of this technique to the case of resolution for FOL without equality. The proposed method can be directly used with regular RDBs, including legacy databases. Moreover, we propose it as a potential basis for an efficient Web-scale semantic search technology.
Improving biomedical information retrieval by linear combinations of different query expansion techniques.

PubMed

Abdulla, Ahmed AbdoAziz Ahmed; Lin, Hongfei; Xu, Bo; Banbhrani, Santosh Kumar

2016-07-25

Biomedical literature retrieval is becoming increasingly complex, and there is a fundamental need for advanced information retrieval systems. Information Retrieval (IR) programs scour unstructured materials such as text documents in large reserves of data that are usually stored on computers. IR is related to the representation, storage, and organization of information items, as well as to access. One of the main problems in IR is to determine which documents are relevant to the user's needs and which are not. At present, users cannot construct queries precisely enough to retrieve particular pieces of data from large reserves of data, and basic information retrieval systems produce low-quality search results. In this paper we present a new technique to refine Information Retrieval searches so that they better represent the user's information need. To enhance retrieval performance, we use different query expansion techniques and apply linear combinations between them, combining two expansion results at a time. Query expansions expand the search query, for example, by finding synonyms and reweighting original terms. They provide significantly more focused, particularized search results than do basic search queries. Retrieval performance is measured by some variants of MAP (Mean Average Precision). According to our experimental results, the combination of the best query expansion results improves the retrieved documents and outperforms our baseline by 21.06%; it also outperforms a previous study by 7.12%. We propose several query expansion techniques and their linear combinations to make user queries more cognizable to search engines and to produce higher-quality search results.
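The pairwise linear combination described in this abstract can be illustrated with a minimal sketch. It assumes each query expansion technique yields a document-to-score mapping, that scores are min-max normalized before mixing, and that a single weight `alpha` balances the two runs; the function names, the normalization step, and the toy scores are illustrative assumptions, not the authors' implementation. Average precision for one query is included because MAP is the mean of this value over all queries.

```python
from typing import Dict, List, Set


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so that two runs are comparable (assumed step)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def combine_runs(run_a: Dict[str, float], run_b: Dict[str, float],
                 alpha: float = 0.5) -> List[str]:
    """Linearly combine two expansion runs: alpha * a + (1 - alpha) * b."""
    a = min_max_normalize(run_a)
    b = min_max_normalize(run_b)
    fused = {
        doc: alpha * a.get(doc, 0.0) + (1 - alpha) * b.get(doc, 0.0)
        for doc in set(a) | set(b)
    }
    # Highest fused score first; document id breaks ties deterministically.
    return sorted(fused, key=lambda doc: (-fused[doc], doc))


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Average precision for one query; MAP is its mean over queries."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


# Hypothetical scores from two expansion techniques for one query.
run_a = {"d1": 3.0, "d2": 1.0, "d3": 2.0}
run_b = {"d2": 10.0, "d3": 5.0, "d4": 0.0}
ranking = combine_runs(run_a, run_b, alpha=0.7)
ap = average_precision(ranking, {"d2", "d3"})
```

In practice the weight would be tuned on held-out queries; the value 0.7 here is arbitrary.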
Internet Search Patterns of Human Immunodeficiency Virus and the Digital Divide in the Russian Federation: Infoveillance Study

PubMed Central

Quinn, Casey; Hercz, Daniel; Gillespie, James A

2013-01-01

Background Human immunodeficiency virus (HIV) is a serious health problem in the Russian Federation. However, the true scale of HIV in Russia has long been the subject of considerable debate. Using digital surveillance to monitor diseases has become increasingly popular in high income countries. But Internet users may not be representative of overall populations, and the characteristics of the Internet-using population cannot be directly ascertained from search pattern data. This exploratory infoveillance study examined if Internet search patterns can be used for disease surveillance in a large middle-income country with a dispersed population. Objective This study had two main objectives: (1) to validate Internet search patterns against national HIV prevalence data, and (2) to investigate the relationship between search patterns and the determinants of Internet access. Methods We first assessed whether online surveillance is a valid and reliable method for monitoring HIV in the Russian Federation. Yandex and Google both provided tools to study search patterns in the Russian Federation. We evaluated the relationship between both Yandex and Google aggregated search patterns and HIV prevalence in 2011 at national and regional tiers. Second, we analyzed the determinants of Internet access to determine the extent to which they explained regional variations in searches for the Russian terms for “HIV” and “AIDS”. We sought to extend understanding of the characteristics of Internet searching populations by data matching the determinants of Internet access (age, education, income, broadband access price, and urbanization ratios) and searches for the term “HIV” using principal component analysis (PCA). Results We found generally strong correlations between HIV prevalence and searches for the terms “HIV” and “AIDS”. National correlations for Yandex searches for “HIV” were very strongly correlated with HIV prevalence (Spearman rank-order coefficient
Answers to Health Questions: Internet Search Results Versus Online Health Community Responses

PubMed Central

Vermeesch, Amber; Given, Barbara; Huh, Jina

2016-01-01

Background About 6 million people search for health information on the Internet each day in the United States. Both patients and caregivers search for information about prescribed courses of treatments, unanswered questions after a visit to their providers, or diet and exercise regimens. Past literature has indicated potential challenges around quality in health information available on the Internet. However, diverse information exists on the Internet—ranging from government-initiated webpages to personal blog pages. Yet we do not fully understand the strengths and weaknesses of different types of information available on the Internet. Objective The objective of this research was to investigate the strengths and challenges of various types of health information available online and to suggest what information sources best fit various question types. Methods We collected questions posted to and the responses they received from an online diabetes community and classified them according to Rothwell’s classification of question types (fact, policy, or value questions). We selected 60 questions (20 each of fact, policy, and value) and the replies the questions received from the community. We then searched for responses to the same questions using a search engine and recorded the Results Community responses answered more questions than did search results overall. Search results were most effective in answering value questions and least effective in answering policy questions. Community responses answered questions across question types at an equivalent rate, but most answered policy questions and the least answered fact questions. Value questions were most answered by community responses, but some of these answers provided by the community were incorrect. Fact question search results were the most clinically valid. Conclusions The Internet is a prevalent source of health information for people. The information quality people encounter online can have a large impact

Applicability of internet search index for asthma admission forecast using machine learning.

PubMed

Luo, Li; Liao, Chengcheng; Zhang, Fengyi; Zhang, Wei; Li, Chunyang; Qiu, Zhixin; Huang, Debin

2018-04-15

This study aimed to determine whether a search index could provide insight into trends in asthma admission in China. An Internet search index is a powerful tool to monitor and predict epidemic outbreaks. However, whether using an internet search index can significantly improve asthma admissions forecasts remains unknown. The long-term goal is to develop a surveillance system to help early detection and interventions for asthma and to avoid asthma health care resource shortages in advance. In this study, we used a search index combined with air pollution data, weather data, and historical admissions data to forecast asthma admissions using machine learning. Results demonstrated that the best area under the curve in the test set that can be achieved is 0.832, using all predictors mentioned earlier. A search index is a powerful predictor in asthma admissions forecast, and a recent search index can reflect current asthma admissions with a lag-effect to a certain extent. The addition of a real-time, easily accessible search index improves forecasting capabilities and demonstrates the predictive potential of search index. Copyright © 2018 John Wiley & Sons, Ltd.
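The area under the curve reported in this abstract can be read as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative one. The sketch below computes it that way, assuming the forecast is framed as a binary outcome; the function name and the toy labels and scores are hypothetical and not taken from the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical test-set labels (1 = high-admission period) and model scores.
auc = roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1])
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration; a rank-based formulation would be preferable for large test sets.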
Adolescents Searching for Health Information on the Internet: An Observational Study

PubMed Central

Derry, Holly A; Resnick, Paul J; Richardson, Caroline R

2003-01-01

Background Adolescents' access to health information on the Internet is partly a function of their ability to search for and find answers to their health-related questions. Adolescents may have unique health and computer literacy needs. Although many surveys, interviews, and focus groups have been utilized to understand the information-seeking and information-retrieval behavior of adolescents looking for health information online, we were unable to locate observations of individual adolescents that have been conducted in this context. Objective This study was designed to understand how adolescents search for health information using the Internet and what implications this may have on access to health information. Methods A convenience sample of 12 students (age 12-17 years) from 1 middle school and 2 high schools in southeast Michigan were provided with 6 health-related questions and asked to look for answers using the Internet. Researchers recorded 68 specific searches using software that captured screen images as well as synchronized audio recordings. Recordings were reviewed later and specific search techniques and strategies were coded. A qualitative review of the verbal communication was also performed. Results Out of 68 observed searches, 47 (69%) were successful in that the adolescent found a correct and useful answer to the health question. The majority of sites that students attempted to access were retrieved directly from search engine results (77%) or a search engine's recommended links (10%); only a small percentage were directly accessed (5%) or linked from another site (7%). The majority (83%) of followed links from search engine results came from the first 9 results. Incorrect spelling (30 of 132 search terms), number of pages visited within a site (ranging from 1-15), and overall search strategy (eg, using a search engine versus directly accessing a site), were each important determinants of success. Qualitative analysis revealed that participants
Searching and Researching on the Internet and the World Wide Web.

ERIC Educational Resources Information Center

Ackerman, Ernest; Hartman, Karen

This book focuses on formulating Internet search strategies, understanding how to form search expressions, critically evaluating information, and citing resources. It is written for users who are acquainted with the fundamental operations of a personal computer, as well as those with more online experience. The book is arranged so that the…
Whiplash Syndrome Reloaded: Digital Echoes of Whiplash Syndrome in the European Internet Search Engine Context.

PubMed

Noll-Hussong, Michael

2017-03-27

In many Western countries, after a motor vehicle collision, those involved seek health care for the assessment of injuries and for insurance documentation purposes. In contrast, in many less wealthy countries, there may be limited access to care and no insurance or compensation system. The purpose of this infodemiology study was to investigate the global pattern of evolving Internet usage in countries with and without insurance and the corresponding compensation systems for whiplash injury. We used the Internet search engine analytics via Google Trends to study the health information-seeking behavior concerning whiplash injury at national population levels in Europe. We found that the search for "whiplash" is strikingly and consistently often associated with the search for "compensation" in countries or cultures with a tort system. Frequent or traumatic painful injuries; diseases or disorders such as arthritis, headache, radius, and hip fracture; depressive disorders; and fibromyalgia were not associated similarly with searches on "compensation." In this study, we present evidence from the evolving viewpoint of naturalistic Internet search engine analytics that the expectations for receiving compensation may influence Internet search behavior in relation to whiplash injury. ©Michael Noll-Hussong. Originally published in JMIR Public Health and Surveillance (http://publichealth.jmir.org), 27.03.2017.
Neurology and the Internet: a review.

PubMed

Moccia, Marcello; Brigo, Francesco; Tedeschi, Gioacchino; Bonavita, Simona; Lavorgna, Luigi

2018-06-01

Nowadays, the Internet is the major source of information about diseases and their treatments. The Internet is gaining relevance in the neurological setting, considering the possibility of timely social interaction, contributing to general public awareness of otherwise less-well-known neurological conditions, promoting health equity and improving health-related coping. Neurological patients can easily find several online opportunities for peer interactions and learning. On the other hand, neurologists can analyze user-generated data to better understand patient needs and to run epidemiological studies. Indeed, analyses of queries from Internet search engines on certain neurological diseases have shown a strict temporal and spatial correlation with the "real world." In this narrative review, we will discuss how the Internet is radically affecting the healthcare of people with neurological disorders and, most importantly, is shifting the paradigm of care from the hands of those who deliver care into the hands of those who receive it. We will also review possible limitations, such as safety concerns, financial issues, and the need for easy-to-access platforms.
Internet Searches and Their Relationship to Cognitive Function in Older Adults: Cross-Sectional Analysis.

PubMed

Austin, Johanna; Hollingshead, Kristy; Kaye, Jeffrey

2017-09-06

Alzheimer disease (AD) is a very challenging experience for all those affected. Unfortunately, detection of Alzheimer disease in its early stages when clinical treatments may be most effective is challenging, as the clinical evaluations are time-consuming and costly. Recent studies have demonstrated a close relationship between cognitive function and everyday behavior, an avenue of research that holds great promise for the early detection of cognitive decline. One area of behavior that changes with cognitive decline is language use. Multiple groups have demonstrated a close relationship between cognitive function and vocabulary size, verbal fluency, and semantic ability, using conventional in-person cognitive testing. An alternative to this approach which is inherently ecologically valid may be to take advantage of automated computer monitoring software to continually capture and analyze language use while on the computer. The aim of this study was to understand the relationship between Internet searches as a measure of language and cognitive function in older adults. We hypothesize that individuals with poorer cognitive function will search using fewer unique terms, employ shorter words, and use less obscure words in their searches. Computer monitoring software (WorkTime, Nestersoft Inc) was used to continuously track the terms people entered while conducting searches in Google, Yahoo, Bing, and Ask.com. For all searches, punctuation, accents, and non-ASCII characters were removed, and the resulting search terms were spell-checked before any analysis. Cognitive function was evaluated as a z-normalized summary score capturing five unique cognitive domains. Linear regression was used to determine the relationship between cognitive function and Internet searches by controlling for variables such as age, sex, and education. Over a 6-month monitoring period, 42 participants (mean age 81 years [SD 10.5], 83% [35/42] female) conducted 2915 searches using these top search
How does searching for health information on the Internet affect individuals' demand for health care services?

PubMed

Suziedelyte, Agne

2012-11-01

The emergence of the Internet made health information, which previously was almost exclusively available to health professionals, accessible to the general public. Access to health information on the Internet is likely to affect individuals' health care related decisions. The aim of this analysis is to determine how health information that people obtain from the Internet affects their demand for health care. I use a novel data set, the U.S. Health Information National Trends Survey (2003-07), to answer this question. The causal variable of interest is a binary variable that indicates whether or not an individual has recently searched for health information on the Internet. Health care utilization is measured by an individual's number of visits to a health professional in the past 12 months. An individual's decision to use the Internet to search for health information is likely to be correlated to other variables that can also affect his/her demand for health care. To separate the effect of Internet health information from other confounding variables, I control for a number of individual characteristics and use the instrumental variable estimation method. As an instrument for Internet health information, I use U.S. state telecommunication regulations that are shown to affect the supply of Internet services. I find that searching for health information on the Internet has a positive, relatively large, and statistically significant effect on an individual's demand for health care. This effect is larger for the individuals who search for health information online more frequently and people who have health care coverage. Among cancer patients, the effect of Internet health information seeking on health professional visits varies by how long ago they were diagnosed with cancer. Thus, the Internet is found to be a complement to formal health care rather than a substitute for health professional services. Copyright © 2012 Elsevier Ltd. All rights reserved.
The accuracy of Internet search engines to predict diagnoses from symptoms can be assessed with a validated scoring system.

PubMed

Shenker, Bennett S

2014-02-01

To validate a scoring system that evaluates the ability of Internet search engines to correctly predict diagnoses when symptoms are used as search terms. We developed a five point scoring system to evaluate the diagnostic accuracy of Internet search engines. We identified twenty diagnoses common to a primary care setting to validate the scoring system. One investigator entered the symptoms for each diagnosis into three Internet search engines (Google, Bing, and Ask) and saved the first five webpages from each search. Other investigators reviewed the webpages and assigned a diagnostic accuracy score. They rescored a random sample of webpages two weeks later. To validate the five point scoring system, we calculated convergent validity and test-retest reliability using Kendall's W and Spearman's rho, respectively. We used the Kruskal-Wallis test to look for differences in accuracy scores for the three Internet search engines. A total of 600 webpages were reviewed. Kendall's W for the raters was 0.71 (p<0.0001). Spearman's rho for test-retest reliability was 0.72 (p<0.0001). There was no difference in scores based on Internet search engine. We found a significant difference in scores based on the webpage's order on the Internet search engine webpage (p=0.007). Pairwise comparisons revealed higher scores in the first webpages vs. the fourth (corr p=0.009) and fifth (corr p=0.017). However, this significance was lost when creating composite scores. The five point scoring system to assess diagnostic accuracy of Internet search engines is a valid and reliable instrument. The scoring system may be used in future Internet research. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
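The two agreement statistics named in this abstract can be sketched as follows. This is a simplified illustration that assumes each rater's scores have already been converted to untied ranks 1..n; real five-point scores would contain many ties and need tie-corrected formulas. The function names and sample rankings are hypothetical.

```python
from typing import List, Sequence


def kendalls_w(rankings: List[Sequence[int]]) -> float:
    """Kendall's coefficient of concordance for m raters ranking n items.

    W = 12 * S / (m^2 * (n^3 - n)), where S is the sum of squared
    deviations of the per-item rank sums from their mean. No tie correction.
    """
    m = len(rankings)
    n = len(rankings[0])
    rank_sums = [sum(r[i] for r in rankings) for i in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((x - mean_sum) ** 2 for x in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


def spearman_rho(first: Sequence[int], second: Sequence[int]) -> float:
    """Spearman's rho for two untied rankings of the same n items."""
    n = len(first)
    d_squared = sum((a - b) ** 2 for a, b in zip(first, second))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Three hypothetical raters in full agreement over four webpages.
w_agree = kendalls_w([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]])
# One rater's ranking at first scoring and again two weeks later.
rho_retest = spearman_rho([1, 2, 3, 4], [1, 2, 4, 3])
```

W ranges from 0 (no agreement among raters) to 1 (complete agreement), while rho ranges from -1 to 1.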
Scoping review on search queries and social media for disease surveillance: a chronology of innovation.

PubMed

Bernardo, Theresa Marie; Rajic, Andrijana; Young, Ian; Robiadek, Katie; Pham, Mai T; Funk, Julie A

2013-07-18

The threat of a global pandemic posed by outbreaks of influenza H5N1 (1997) and Severe Acute Respiratory Syndrome (SARS, 2002), both diseases of zoonotic origin, provoked interest in improving early warning systems and reinforced the need for combining data from different sources. It led to the use of search query data from search engines such as Google and Yahoo! as an indicator of when and where influenza was occurring. This methodology has subsequently been extended to other diseases and has led to experimentation with new types of social media for disease surveillance. The objective of this scoping review was to formally assess the current state of knowledge regarding the use of search queries and social media for disease surveillance in order to inform future work on early detection and more effective mitigation of the effects of foodborne illness. Structured scoping review methods were used to identify, characterize, and evaluate all published primary research, expert review, and commentary articles regarding the use of social media in surveillance of infectious diseases from 2002-2011. Thirty-two primary research articles and 19 reviews and case studies were identified as relevant. Most relevant citations were peer-reviewed journal articles (29/32, 91%) published in 2010-11 (28/32, 88%) and reported use of a Google program for surveillance of influenza. Only four primary research articles investigated social media in the context of foodborne disease or gastroenteritis. Most authors (21/32 articles, 66%) reported that social media-based surveillance had comparable performance when compared to an existing surveillance program. The most commonly reported strengths of social media surveillance programs included their effectiveness (21/32, 66%) and rapid detection of disease (21/32, 66%). The most commonly reported weaknesses were the potential for false positive (16/32, 50%) and false negative (11/32, 34%) results. Most authors (24/32, 75%) recommended that
Scoping Review on Search Queries and Social Media for Disease Surveillance: A Chronology of Innovation

PubMed Central

Rajic, Andrijana; Young, Ian; Robiadek, Katie; Pham, Mai T; Funk, Julie A

2013-01-01

Background The threat of a global pandemic posed by outbreaks of influenza H5N1 (1997) and Severe Acute Respiratory Syndrome (SARS, 2002), both diseases of zoonotic origin, provoked interest in improving early warning systems and reinforced the need for combining data from different sources. It led to the use of search query data from search engines such as Google and Yahoo! as an indicator of when and where influenza was occurring. This methodology has subsequently been extended to other diseases and has led to experimentation with new types of social media for disease surveillance. Objective The objective of this scoping review was to formally assess the current state of knowledge regarding the use of search queries and social media for disease surveillance in order to inform future work on early detection and more effective mitigation of the effects of foodborne illness. Methods Structured scoping review methods were used to identify, characterize, and evaluate all published primary research, expert review, and commentary articles regarding the use of social media in surveillance of infectious diseases from 2002-2011. Results Thirty-two primary research articles and 19 reviews and case studies were identified as relevant. Most relevant citations were peer-reviewed journal articles (29/32, 91%) published in 2010-11 (28/32, 88%) and reported use of a Google program for surveillance of influenza. Only four primary research articles investigated social media in the context of foodborne disease or gastroenteritis. Most authors (21/32 articles, 66%) reported that social media-based surveillance had comparable performance when compared to an existing surveillance program. The most commonly reported strengths of social media surveillance programs included their effectiveness (21/32, 66%) and rapid detection of disease (21/32, 66%). The most commonly reported weaknesses were the potential for false positive (16/32, 50%) and false negative (11/32, 34%) results. Most
Effect of Reading Ability and Internet Experience on Keyword-Based Image Search

ERIC Educational Resources Information Center

Lei, Pei-Lan; Lin, Sunny S. J.; Sun, Chuen-Tsai

2013-01-01

Image searches are now crucial for obtaining information, constructing knowledge, and building successful educational outcomes. We investigated how reading ability and Internet experience influence keyword-based image search behaviors and performance. We categorized 58 junior-high-school students into four groups of high/low reading ability and…
A study of Internet searches for medical information in dermatology patients: The patient-physician relationship.

PubMed

Orgaz-Molina, J; Cotugno, M; Girón-Prieto, M S; Arrabal-Polo, M A; Ruiz-Carrascosa, J C; Buendía-Eisman, A; Arias-Santiago, S

2015-01-01

The use of the Internet to search for medical information is considered by some physicians as an invasion of their medical domain and a reflection of a lack of trust in their advice and recommendations. The main objective of this study was to estimate the amount of medical information gathered from the Internet and to establish whether these online searches reflect a lower degree of patient satisfaction. A survey was conducted among 175 patients seen at the melanoma and psoriasis units of San Cecilio University Hospital in Granada, Spain between May 2010 and December 2011. Online searches for medical information were performed by 44.4% of patients who returned correctly completed questionnaires. The main reasons given for these searches were to complement appropriate information provided by the physician (67.3%) and to gather information before consultation with the physician (36.5%). Variables associated with the search for medical information on the Internet in the multivariate analysis were a higher educational level, a higher score on two items in the Need for Cognition Scale, and consultation of mass media other than the Internet. Studies with larger numbers of patients and other diseases, however, are required to confirm these results. The search for medical information is a widespread reality among patients with psoriasis and melanoma and it is not associated with a poor relationship with the physician. Dermatologists can play a beneficial role by recommending trustworthy Internet sites during the patient's visit and by promoting the development of pages by scientific societies to provide high-quality information. Copyright © 2014 Elsevier España, S.L.U. and AEDV. All rights reserved.
Automatic Query Formulations in Information Retrieval.

ERIC Educational Resources Information Center

Salton, G.; And Others

1983-01-01

Introduces methods designed to reduce role of search intermediaries by generating Boolean search formulations automatically using term frequency considerations from natural language statements provided by system patrons. Experimental results are supplied and methods are described for applying automatic query formulation process in practice.…
Mining the SDSS SkyServer SQL queries log

NASA Astrophysics Data System (ADS)

Hirota, Vitor M.; Santos, Rafael; Raddick, Jordan; Thakar, Ani

2016-05-01

SkyServer, the Internet portal for the Sloan Digital Sky Survey (SDSS) astronomic catalog, provides a set of tools that allows data access for astronomers and scientific education. One of SkyServer's data access interfaces allows users to enter ad-hoc SQL statements to query the catalog. SkyServer also presents some template queries that can be used as a basis for more complex queries. This interface has logged over 330 million queries submitted since 2001. It is expected that analysis of this data can be used to investigate usage patterns, identify potential new classes of queries, find similar queries, etc. and to shed some light on how users interact with the Sloan Digital Sky Survey data and how scientists have adopted the new paradigm of e-Science, which could in turn lead to enhancements of the user interfaces and experience in general. In this paper we review some approaches to SQL query mining, apply the traditional techniques used in the literature and present lessons learned, namely, that the general text mining approach for feature extraction and clustering does not seem to be adequate for this type of data, and, most importantly, we find that this type of analysis can result in very different queries being clustered together.
Assisting Consumer Health Information Retrieval with Query Recommendations

PubMed Central

Zeng, Qing T.; Crowell, Jonathan; Plovnick, Robert M.; Kim, Eunjung; Ngo, Long; Dibble, Emily

2006-01-01

Objective: Health information retrieval (HIR) on the Internet has become an important practice for millions of people, many of whom have problems forming effective queries. We have developed and evaluated a tool to assist people in health-related query formation. Design: We developed the Health Information Query Assistant (HIQuA) system. The system suggests alternative/additional query terms related to the user's initial query that can be used as building blocks to construct a better, more specific query. The recommended terms are selected according to their semantic distance from the original query, which is calculated on the basis of concept co-occurrences in medical literature and log data as well as semantic relations in medical vocabularies. Measurements: An evaluation of the HIQuA system was conducted and a total of 213 subjects participated in the study. The subjects were randomized into 2 groups. One group was given query recommendations and the other was not. Each subject performed HIR for both a predefined and a self-defined task. Results: The study showed that providing HIQuA recommendations resulted in statistically significantly higher rates of successful queries (odds ratio = 1.66, 95% confidence interval = 1.16–2.38), although no statistically significant impact on user satisfaction or the users' ability to accomplish the predefined retrieval task was found. Conclusion: Providing semantic-distance-based query recommendations can help consumers with query formation during HIR. PMID:16221944
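An odds ratio with a 95% confidence interval, as reported in this abstract, can be computed from a 2x2 table of successful and unsuccessful queries in each group. The sketch below uses the standard Wald interval on the log scale; the counts are hypothetical and the study may well have used a different estimation method, so this only illustrates how such a figure is read.

```python
import math
from typing import Tuple


def odds_ratio_ci(a: int, b: int, c: int, d: int,
                  z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a, b: successes and failures in the group given recommendations
    c, d: successes and failures in the control group
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the study.
or_value, ci_low, ci_high = odds_ratio_ci(20, 10, 10, 20)
```

An interval that excludes 1, as in the abstract's 1.16 to 2.38, indicates a statistically significant difference at the 5% level.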
Considerations in the Choice of an Internet Search Tool.

ERIC Educational Resources Information Center

Vaughan, Jason

1999-01-01

Describes a survey conducted among library school graduate students and librarians at the University of North Carolina at Chapel Hill that investigated factors that play a role in information professionals' choice of Internet search tools. Utility functions and ease of use are discussed and the original online survey is appended. (Author/LRW)
Internet search term affects the quality and accuracy of online information about developmental hip dysplasia.

PubMed

Fabricant, Peter D; Dy, Christopher J; Patel, Ronak M; Blanco, John S; Doyle, Shevaun M

2013-06-01

The recent emphasis on shared decision-making has increased the role of the Internet as a readily accessible medical reference source for patients and families. However, the lack of professional review creates concern over the quality, accuracy, and readability of medical information available to patients on the Internet. Three Internet search engines (Google, Yahoo, and Bing) were evaluated prospectively using 3 different search terms of varying sophistication ("congenital hip dislocation," "developmental dysplasia of the hip," and "hip dysplasia in children"). Sixty-three unique Web sites were evaluated by each of 3 surgeons (2 fellowship-trained pediatric orthopaedic attendings and 1 orthopaedic chief resident) for quality and accuracy using a set of scoring criteria based on the AAOS/POSNA patient education Web site. The readability (literacy grade level) of each Web site was assessed using the Flesch-Kincaid score. There were significant differences noted in quality, accuracy, and readability of information depending on the search term used. The search term "developmental dysplasia of the hip" provided higher quality and accuracy compared with the search term "congenital hip dislocation." Of the 63 total Web sites, 1 (1.6%) was below the sixth grade reading level recommended by the NIH for health education materials and 8 (12.7%) Web sites were below the average American reading level (eighth grade). The quality and accuracy of information available on the Internet regarding developmental hip dysplasia significantly varied with the search term used. Patients seeking information about DDH on the Internet may not understand the materials found because nearly all of the Web sites are written at a level above that recommended for publicly distributed health information. Physicians should advise their patients to search for information using the term "developmental dysplasia of the hip" or, better yet, should refer patients to Web sites that they have
[Biomedical information on the internet using search engines. A one-year trial].

PubMed

Corrao, Salvatore; Leone, Francesco; Arnone, Sabrina

2004-01-01

The internet is a communication medium and content distributor that provides information in a general sense, but it could also be of great utility for the search and retrieval of biomedical information. Search engines are a valuable means of rapidly finding information on the net. However, we do not know whether general search engines and meta-search engines are reliable for finding useful and validated biomedical information. The aim of our study was to verify the reproducibility of a keyword search (pediatric or evidence) using 9 international search engines and 1 meta-search engine at baseline and after a one-year period. We analysed the first 20 citations returned by each search. We evaluated the formal quality of Web sites and their domain extensions. Moreover, we compared the output of each search at the start of this study and after one year, taking the number of Web sites cited again as a criterion of reliability. We found some interesting results that are reported throughout the text. Our findings point to an extreme dynamicity of information on the Web and, for this reason, we advise great caution when using search and meta-search engines as tools for searching and retrieving reliable biomedical information. On the other hand, some search and meta-search engines could be very useful as a first step in better defining a search and, moreover, for finding institutional Web sites. This paper supports a more conscious approach to the internet biomedical information universe.
Reconsidering the Rhizome: A Textual Analysis of Web Search Engines as Gatekeepers of the Internet

NASA Astrophysics Data System (ADS)

Hess, A.

Critical theorists have often drawn from Deleuze and Guattari's notion of the rhizome when discussing the potential of the Internet. While the Internet may structurally appear as a rhizome, its day-to-day usage by millions via search engines precludes experiencing the random interconnectedness and potential democratizing function. Through a textual analysis of four search engines, I argue that Web searching has grown hierarchies, or "trees," that organize data in tracts of knowledge and place users in marketing niches rather than assist in the development of new knowledge.
Seasons, Searches, and Intentions: What The Internet Can Tell Us About The Bed Bug (Hemiptera: Cimicidae) Epidemic.

PubMed

Sentana-Lledo, Daniel; Barbu, Corentin M; Ngo, Michelle N; Wu, Yage; Sethuraman, Karthik; Levy, Michael Z

2016-01-01

The common bed bug (Cimex lectularius L.) is once again prevalent in the United States. We investigated temporal patterns in Google search queries for bed bugs and co-occurring terms, and conducted in-person surveys to explore the intentions behind searches that included those terms. Searches for "bed bugs" rose steadily through 2011 and then plateaued, suggesting that the epidemic has reached an equilibrium in the United States. However, queries including terms that survey respondents associated strongly with having bed bugs (e.g., "exterminator," "remedies") continued to climb, while terms more closely associated with informational searches (e.g., "hotels," "about") fell. Respondents' rankings of terms and nonseasonal trends in Google search volume as assessed by a cosinor model were significantly correlated (Kendall's Tau-b P = 0.015). We find no evidence from Google Trends that the bed bug epidemic in the United States has reached equilibrium. © The Authors 2015. Published by Oxford University Press on behalf of Entomological Society of America. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

Chemical-text hybrid search engines.

PubMed

Zhou, Yingyao; Zhou, Bin; Jiang, Shumei; King, Frederick J

2010-01-01

As the amount of chemical literature increases, it is critical that researchers be enabled to accurately locate documents related to a particular aspect of a given compound. Existing solutions, based on text and chemical search engines alone, suffer from the inclusion of "false negative" and "false positive" results, and cannot accommodate the diverse repertoire of formats currently available for chemical documents. To address these concerns, we developed an approach called Entity-Canonical Keyword Indexing (ECKI), which converts a chemical entity embedded in a data source into its canonical keyword representation prior to being indexed by text search engines. We implemented ECKI using Microsoft Office SharePoint Server Search, and the resultant hybrid search engine not only supported complex mixed chemical and keyword queries but also was applied to both intranet and Internet environments. We envision that the adoption of ECKI will empower researchers to pose more complex search questions that were not readily attainable previously and to obtain answers at much improved speed and accuracy.
Internet search trends analysis tools can provide real-time data on kidney stone disease in the United States.

PubMed

Willard, Scott D; Nguyen, Mike M

2013-01-01

To evaluate the utility of using Internet search trends data to estimate kidney stone occurrence and understand the priorities of patients with kidney stones. Internet search trends data represent a unique resource for monitoring population self-reported illness and health information-seeking behavior. The Google Insights for Search analysis tool was used to study searches related to kidney stones, with each search term returning a search volume index (SVI) according to the search frequency relative to the total search volume. SVIs for the term, "kidney stones," were compiled by location and time parameters and compared with the published weather and stone prevalence data. Linear regression analysis was performed to determine the association of the search interest score with known epidemiologic variations in kidney stone disease, including latitude, temperature, season, and state. The frequency of the related search terms was categorized by theme and qualitatively analyzed. The SVI correlated significantly with established kidney stone epidemiologic predictors. The SVI correlated with the state latitude (R-squared=0.25; P<.001), the state mean annual temperature (R-squared=0.24; P<.001), and state combined sex prevalence (R-squared=0.25; P<.001). Female prevalence correlated more strongly than did male prevalence (R-squared=0.37; P<.001, and R-squared=0.17; P=.003, respectively). The national SVI correlated strongly with the average U.S. temperature by month (R-squared=0.54; P=.007). The search term ranking suggested that Internet users are most interested in the diagnosis, followed by etiology, infections, and treatment. Geographic and temporal variability in kidney stone disease appear to be accurately reflected in Internet search trends data. Internet search trends data might have broader applications for epidemiologic and urologic research. Copyright © 2013 Elsevier Inc. All rights reserved.
Identifying Complementary and Alternative Medicine Usage Information from Internet Resources. A Systematic Review.

PubMed

Sharma, Vivekanand; Holmes, John H; Sarkar, Indra N

2016-08-05

Identify and highlight research issues and methods used in studying Complementary and Alternative Medicine (CAM) information needs, access, and exchange over the Internet. A literature search was conducted using Preferred Reporting Items for Systematic Reviews and Meta-Analysis guidelines from PubMed to identify articles that have studied Internet use in the CAM context. Additional searches were conducted at Nature.com and Google Scholar. The Internet provides a major medium for attaining CAM information and can also serve as an avenue for conducting CAM related surveys. Based on the literature analyzed in this review, there seems to be significant interest in developing methodologies for identifying CAM treatments, including the analysis of search query data and social media platform discussions. Several studies have also underscored the challenges in developing approaches for identifying the reliability of CAM-related information on the Internet, which may not be supported with reliable sources. The overall findings of this review suggest that there are opportunities for developing approaches for making available accurate information and developing ways to restrict the spread and sale of potentially harmful CAM products and information. Advances in Internet research are yet to be used in context of understanding CAM prevalence and perspectives. Such approaches may provide valuable insights into the current trends and needs in context of CAM use and spread.
IDENTIFYING COMPLEMENTARY AND ALTERNATIVE MEDICINE USAGE INFORMATION FROM INTERNET RESOURCES: A SYSTEMATIC REVIEW

PubMed Central

Sharma, V.; Holmes, J.H.; Sarkar, I.N.

2016-01-01

SUMMARY Objective Identify and highlight research issues and methods used in studying Complementary and Alternative Medicine (CAM) information needs, access, and exchange over the Internet. Methods A literature search was conducted using Preferred Reporting Items for Systematic Reviews and Meta-Analysis guidelines from PubMed to identify articles that have studied Internet use in the CAM context. Additional searches were conducted at Nature.com and Google Scholar. Results The Internet provides a major medium for attaining CAM information and can also serve as an avenue for conducting CAM related surveys. Based on the literature analyzed in this review, there seems to be significant interest in developing methodologies for identifying CAM treatments, including the analysis of search query data and social media platform discussions. Several studies have also underscored the challenges in developing approaches for identifying the reliability of CAM-related information on the Internet, which may not be supported with reliable sources. The overall findings of this review suggest that there are opportunities for developing approaches for making available accurate information and developing ways to restrict the spread and sale of potentially harmful CAM products and information. Conclusions Advances in Internet research are yet to be used in context of understanding CAM prevalence and perspectives. Such approaches may provide valuable insights into the current trends and needs in context of CAM use and spread. PMID:27352304
Analysis of PubMed User Sessions Using a Full-Day PubMed Query Log: A Comparison of Experienced and Nonexperienced PubMed Users

PubMed Central

2015-01-01

Background PubMed is the largest biomedical bibliographic information source on the Internet. PubMed has been considered one of the most important and reliable sources of up-to-date health care evidence. Previous studies examined the effects of domain expertise/knowledge on search performance using PubMed. However, very little is known about PubMed users’ knowledge of information retrieval (IR) functions and their usage in query formulation. Objective The purpose of this study was to shed light on how experienced/nonexperienced PubMed users perform their search queries by analyzing a full-day query log. Our hypotheses were that (1) experienced PubMed users who use system functions quickly retrieve relevant documents and (2) nonexperienced PubMed users who do not use them have longer search sessions than experienced users. Methods To test these hypotheses, we analyzed PubMed query log data containing nearly 3 million queries. User sessions were divided into two categories: experienced and nonexperienced. We compared experienced and nonexperienced users per number of sessions, and experienced and nonexperienced user sessions per session length, with a focus on how fast they completed their sessions. Results To test our hypotheses, we measured how successful information retrieval was (at retrieving relevant documents), represented as the decrease rates of experienced and nonexperienced users from a session length of 1 to 2, 3, 4, and 5. The decrease rate (from a session length of 1 to 2) of the experienced users was significantly larger than that of the nonexperienced groups. Conclusions Experienced PubMed users retrieve relevant documents more quickly than nonexperienced PubMed users in terms of session length. PMID:26139516
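The abstract above does not define its "decrease rate" precisely. The sketch below assumes one plausible reading: the relative drop in the number of sessions when moving from session length 1 to each longer length. The function name `decrease_rates` and all counts are hypothetical.

```python
def decrease_rates(session_counts):
    """Relative drop in session count from length 1 to each longer length.

    session_counts[k] is the number of sessions containing k + 1 queries.
    """
    base = session_counts[0]
    return [(base - n) / base for n in session_counts[1:]]


# Hypothetical counts of sessions with 1, 2, 3, 4 and 5 queries.
experienced = [1000, 400, 200, 100, 50]
nonexperienced = [1000, 600, 400, 300, 200]

print(decrease_rates(experienced))
print(decrease_rates(nonexperienced))
```

Under this reading, a larger drop from length 1 to length 2 means more sessions ended after a single query.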
Internet Use for Searching Information on Medicines and Disease: A Community Pharmacy-Based Survey Among Adult Pharmacy Customers.

PubMed

Lombardo, Simona; Cosentino, Marco

2016-07-13

The Internet is increasingly used as a source of health-related information, and a vast majority of Internet users are performing health-related searches in the United States and Europe, with wide differences among countries. Health information searching behavior on the Internet is affected by multiple factors, including demographics, socioeconomic factors, education, employment, attitudes toward the Internet, and health conditions, and their knowledge may help to promote a safer use of the Internet. Limited information however exists so far about Internet use to search for medical information in Italy. The objective of this study was to investigate the use of the Internet for searching for information on medicines and disease in adult subjects in Northern Italy. Survey in randomly selected community pharmacies, using a self-administered questionnaire, with open and multiple choices questions, was conducted. A total of 1008 participants were enrolled (59.5% women; median age: 43 years; range: 14-88 years). Previous use of the Internet to search for information about medicines or dietary supplements was reported by 26.0% of respondents, more commonly by women (30.00% vs 20.10% men, P<.001), unmarried subjects (32.9% vs 17.4% widowed subjects, P=.022), and employed people (29.1% vs 10.4% retired people, P=.002). Use was highest in the age range of 26 to 35 (40.0% users vs 19.6% and 12.3% in the age range ≤25 and ≥56, respectively, P<.001) and increased with years of education (from 5.3% with 5 years, up to 41.0% with a university degree, P<.001). Previous use of the Internet to search for information about disease was reported by 59.1% of respondents, more commonly by women (64.5% vs 51.0% males, P<.001), unmarried subjects (64.2% vs 58.5% married or divorced subjects and 30.4% widowed subjects, P=.012), unemployed people (66.7% vs 64.0% workers and 29.9% retired people, P<.001). Use was highest in the age range of 26 to 35 (70.1% vs 64.4% in both 36-45 and 46
GeneBee-net: Internet-based server for analyzing biopolymers

DOE Office of Scientific and Technical Information (OSTI.GOV)

Brodsky, L.I.; Ivanov, V.V.; Nikolaev, V.K.

This work describes a network server for searching databanks of biopolymer structures and performing other biocomputing procedures; it is available via direct Internet connection. Basic server procedures are dedicated to homology (similarity) search of sequence and 3D structure of proteins. The homologies found could be used to build multiple alignments, predict protein and RNA secondary structure, and construct phylogenetic trees. In addition to traditional methods of sequence similarity search, the authors propose "non-matrix" (correlational) search. An analogous approach is used to identify regions of similar tertiary structure of proteins. Algorithm concepts and usage examples are presented for new methods. Service logic is based upon interaction of a client program and server procedures. The client program allows the compilation of queries and the processing of results of an analysis.
Improving accuracy for identifying related PubMed queries by an integrated approach.

PubMed

Lu, Zhiyong; Wilbur, W John

2009-10-01

PubMed is the most widely used tool for searching biomedical literature online. As with many other online search tools, a user often types a series of multiple related queries before retrieving satisfactory results to fulfill a single information need. Meanwhile, it is also a common phenomenon to see a user type queries on unrelated topics in a single session. In order to study PubMed users' search strategies, it is necessary to be able to automatically separate unrelated queries and group together related queries. Here, we report a novel approach combining both lexical and contextual analyses for segmenting PubMed query sessions and identifying related queries and compare its performance with the previous approach based solely on concept mapping. We experimented with our integrated approach on sample data consisting of 1539 pairs of consecutive user queries in 351 user sessions. The prediction results of 1396 pairs agreed with the gold-standard annotations, achieving an overall accuracy of 90.7%. This demonstrates that our approach is significantly better than the previously published method. By applying this approach to a one day query log of PubMed, we found that a significant proportion of information needs involved more than one PubMed query, and that most of the consecutive queries for the same information need are lexically related. Finally, the proposed PubMed distance is shown to be an accurate and meaningful measure for determining the contextual similarity between biological terms. The integrated approach can play a critical role in handling real-world PubMed query log data as is demonstrated in our experiments.
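To make the idea of grouping lexically related consecutive queries concrete, here is a deliberately simplified sketch. It is not the integrated method of the abstract above: it uses only word overlap (Jaccard similarity), and the threshold, function names, and sample queries are all assumptions. The last line reproduces the reported accuracy from the counts given in the abstract.

```python
def jaccard(query_a, query_b):
    """Word-level Jaccard similarity between two query strings."""
    a = set(query_a.lower().split())
    b = set(query_b.lower().split())
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def segment_session(queries, threshold=0.2):
    """Split a session wherever consecutive queries share too few words."""
    if not queries:
        return []
    segments = [[queries[0]]]
    for previous, current in zip(queries, queries[1:]):
        if jaccard(previous, current) >= threshold:
            segments[-1].append(current)  # same information need
        else:
            segments.append([current])    # a new, unrelated topic
    return segments


# Hypothetical session with two information needs.
session = ["breast cancer", "breast cancer treatment", "zebrafish development"]
print(segment_session(session))

# Accuracy from the abstract: 1396 of 1539 query pairs agreed with annotations.
print(round(1396 / 1539 * 100, 1))
```

Pure word overlap misses related queries that share no terms, which is the kind of case contextual analysis is meant to cover.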
A high performance, ad-hoc, fuzzy query processing system for relational databases

NASA Technical Reports Server (NTRS)

Mansfield, William H., Jr.; Fleischman, Robert M.

1992-01-01

Database queries involving imprecise or fuzzy predicates are currently an evolving area of academic and industrial research. Such queries place severe stress on the indexing and I/O subsystems of conventional database environments since they involve the search of large numbers of records. The Datacycle architecture and research prototype is a database environment that uses filtering technology to perform an efficient, exhaustive search of an entire database. It has recently been modified to include fuzzy predicates in its query processing. The approach obviates the need for complex index structures, provides unlimited query throughput, permits the use of ad-hoc fuzzy membership functions, and provides a deterministic response time largely independent of query complexity and load. This paper describes the Datacycle prototype implementation of fuzzy queries and some recent performance results.
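The general idea of evaluating a fuzzy predicate by exhaustive filtering, rather than through an index, can be sketched as follows. This is an illustration only and not the Datacycle implementation: the triangular membership function, the threshold, and the sample table are all assumptions.

```python
def triangular(low, peak, high):
    """Return a triangular fuzzy membership function on [low, high]."""
    def membership(value):
        if value <= low or value >= high:
            return 0.0
        if value <= peak:
            return (value - low) / (peak - low)
        return (high - value) / (high - peak)
    return membership


def fuzzy_scan(records, field, membership, threshold):
    """Scan every record; keep those whose membership degree meets the threshold."""
    matches = []
    for record in records:
        degree = membership(record[field])
        if degree >= threshold:
            matches.append((record, degree))
    return matches


# Hypothetical table and an ad-hoc predicate "salary is about 50,000".
employees = [
    {"name": "A", "salary": 42000},
    {"name": "B", "salary": 50000},
    {"name": "C", "salary": 58000},
    {"name": "D", "salary": 75000},
]
about_50k = triangular(40000, 50000, 60000)

for record, degree in fuzzy_scan(employees, "salary", about_50k, 0.5):
    print(record["name"], round(degree, 2))
```

Because every record is visited once regardless of the predicate, the cost of the scan does not depend on how complicated the membership function is, which mirrors the abstract's point about response time being largely independent of query complexity.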
Utilization of a radiology-centric search engine.

PubMed

Sharpe, Richard E; Sharpe, Megan; Siegel, Eliot; Siddiqui, Khan

2010-04-01

Internet-based search engines have become a significant component of medical practice. Physicians increasingly rely on information available from search engines as a means to improve patient care, provide better education, and enhance research. Specialized search engines have emerged to more efficiently meet the needs of physicians. Details about the ways in which radiologists utilize search engines have not been documented. The authors categorized every 25th search query in a radiology-centric vertical search engine by radiologic subspecialty, imaging modality, geographic location of access, time of day, use of abbreviations, misspellings, and search language. Musculoskeletal and neurologic imagings were the most frequently searched subspecialties. The least frequently searched were breast imaging, pediatric imaging, and nuclear medicine. Magnetic resonance imaging and computed tomography were the most frequently searched modalities. A majority of searches were initiated in North America, but all continents were represented. Searches occurred 24 h/day in converted local times, with a majority occurring during the normal business day. Misspellings and abbreviations were common. Almost all searches were performed in English. Search engine utilization trends are likely to mirror trends in diagnostic imaging in the region from which searches originate. Internet searching appears to function as a real-time clinical decision-making tool, a research tool, and an educational resource. A more thorough understanding of search utilization patterns can be obtained by analyzing phrases as actually entered as well as the geographic location and time of origination. This knowledge may contribute to the development of more efficient and personalized search engines.
Searching for pain information, education, and support on the Internet.

PubMed

Colón, Yvette

2013-03-01

Questions from patients about pain conditions and analgesic pharmacotherapy and responses from authors are presented to help educate patients and make them more effective self-advocates. The topics addressed in this issue are searching for pain information, education, support, and providers on the Internet and evaluating online information.
Which factors predict the time spent answering queries to a drug information centre?

PubMed Central

Reppe, Linda A.; Spigset, Olav

2010-01-01

Objective To develop a model based upon factors able to predict the time spent answering drug-related queries to Norwegian drug information centres (DICs). Setting and method Drug-related queries received at 5 DICs in Norway from March to May 2007 were randomly assigned to 20 employees until each of them had answered a minimum of five queries. The employees reported the number of drugs involved, the type of literature search performed, and whether the queries were considered judgmental or not, using a specifically developed scoring system. Main outcome measures The scores of these three factors were added together to define a workload score for each query. Workload and its individual factors were subsequently related to the measured time spent answering the queries by simple or multiple linear regression analyses. Results Ninety-six query/answer pairs were analyzed. Workload significantly predicted the time spent answering the queries (adjusted R2 = 0.22, P < 0.001). Literature search was the individual factor best predicting the time spent answering the queries (adjusted R2 = 0.17, P < 0.001), and this variable also contributed the most in the multiple regression analyses. Conclusion The most important workload factor predicting the time spent handling the queries in this study was the type of literature search that had to be performed. The categorisation of queries as judgmental or not, also affected the time spent answering the queries. The number of drugs involved did not significantly influence the time spent answering drug information queries. PMID:20922480
Internet Searches and Their Relationship to Cognitive Function in Older Adults: Cross-Sectional Analysis

PubMed Central

Hollingshead, Kristy; Kaye, Jeffrey

2017-01-01

Background Alzheimer disease (AD) is a very challenging experience for all those affected. Unfortunately, detection of Alzheimer disease in its early stages when clinical treatments may be most effective is challenging, as the clinical evaluations are time-consuming and costly. Recent studies have demonstrated a close relationship between cognitive function and everyday behavior, an avenue of research that holds great promise for the early detection of cognitive decline. One area of behavior that changes with cognitive decline is language use. Multiple groups have demonstrated a close relationship between cognitive function and vocabulary size, verbal fluency, and semantic ability, using conventional in-person cognitive testing. An alternative to this approach which is inherently ecologically valid may be to take advantage of automated computer monitoring software to continually capture and analyze language use while on the computer. Objective The aim of this study was to understand the relationship between Internet searches as a measure of language and cognitive function in older adults. We hypothesize that individuals with poorer cognitive function will search using fewer unique terms, employ shorter words, and use less obscure words in their searches. Methods Computer monitoring software (WorkTime, Nestersoft Inc) was used to continuously track the terms people entered while conducting searches in Google, Yahoo, Bing, and Ask.com. For all searches, punctuation, accents, and non-ASCII characters were removed, and the resulting search terms were spell-checked before any analysis. Cognitive function was evaluated as a z-normalized summary score capturing five unique cognitive domains. Linear regression was used to determine the relationship between cognitive function and Internet searches by controlling for variables such as age, sex, and education. Results Over a 6-month monitoring period, 42 participants (mean age 81 years [SD 10.5], 83% [35/42] female) conducted
Searching for Suicide Methods: Accessibility of Information About Helium as a Method of Suicide on the Internet.

PubMed

Gunnell, David; Derges, Jane; Chang, Shu-Sen; Biddle, Lucy

2015-01-01

Helium gas suicides have increased in England and Wales; easy-to-access descriptions of this method on the Internet may have contributed to this rise. To investigate the availability of information on using helium as a method of suicide and trends in searching about this method on the Internet. We analyzed trends in (a) Google searching (2004-2014) and (b) hits on a Wikipedia article describing helium as a method of suicide (2013-2014). We also investigated the extent to which helium was described as a method of suicide on web pages and discussion forums identified via Google. We found no evidence of rises in Internet searching about suicide using helium. News stories about helium suicides were associated with increased search activity. The Wikipedia article may have been temporarily altered to increase awareness of suicide using helium around the time of a celebrity suicide. Approximately one third of the links retrieved using Google searches for suicide methods mentioned helium. Information about helium as a suicide method is readily available on the Internet; the Wikipedia article describing its use was highly accessed following celebrity suicides. Availability of online information about this method may contribute to rises in helium suicides.
Generating PubMed Chemical Queries for Consumer Health Literature

PubMed Central

Loo, Jeffery; Chang, Hua Florence; Hochstein, Colette; Sun, Ying

2005-01-01

Two popular NLM resources that provide information for consumers about chemicals and their safety are the Household Products Database and Haz-Map. Search queries to PubMed via web links were generated from these databases. The query retrieves consumer health-oriented literature about adverse effects of chemicals. The retrieval was limited to a manageable set of 20 to 60 citations, achieved by successively applying increasing limits to the search until the desired number of references was reached. PMID:16779322
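The strategy of successively applying limits until the result set is manageable can be sketched as a simple loop. The abstract does not say which limits were used or in what order, so the limit strings, the base query, and the stand-in hit counter below are all hypothetical; a real version would ask PubMed for the hit count instead.

```python
def narrow_search(base_query, limits, count_hits, max_hits=60):
    """Add limits one at a time until the hit count is at most max_hits."""
    query = base_query
    applied = []
    hits = count_hits(query)
    for limit in limits:
        if hits <= max_hits:
            break
        query = f"({query}) AND {limit}"
        applied.append(limit)
        hits = count_hits(query)
    return query, applied, hits


# Stand-in for a real search service: hit counts shrink with each added limit.
FAKE_COUNTS = [480, 150, 45, 12]


def fake_count_hits(query):
    return FAKE_COUNTS[query.count(" AND ")]


# Illustrative limits only; not taken from the source.
limits = ["humans[MeSH Terms]", "english[Language]", "review[Publication Type]"]
query, applied, hits = narrow_search(
    "formaldehyde adverse effects", limits, fake_count_hits
)
print(query)
print(applied, hits)
```

The loop stops as soon as the count falls within the target, so later limits are not applied unnecessarily. A lower bound (for example, backing off when fewer than 20 citations remain) is left out of this sketch.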
Parallel Index and Query for Large Scale Data Analysis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chou, Jerry; Wu, Kesheng; Ruebel, Oliver

2011-07-18

Modern scientific datasets present numerous data management and analysis challenges. State-of-the-art index and query technologies are critical for facilitating interactive exploration of large datasets, but numerous challenges remain in terms of designing a system for processing general scientific datasets. The system needs to be able to run on distributed multi-core platforms, efficiently utilize underlying I/O infrastructure, and scale to massive datasets. We present FastQuery, a novel software framework that addresses these challenges. FastQuery utilizes a state-of-the-art index and query technology (FastBit) and is designed to process massive datasets on modern supercomputing platforms. We apply FastQuery to processing of a massive 50TB dataset generated by a large scale accelerator modeling code. We demonstrate the scalability of the tool to 11,520 cores. Motivated by the scientific need to search for interesting particles in this dataset, we use our framework to reduce search time from hours to tens of seconds.
Tracking search engine queries for suicide in the United Kingdom, 2004-2013.

PubMed

Arora, V S; Stuckler, D; McKee, M

2016-08-01

First, to determine if a cyclical trend is observed for search activity of suicide and three common suicide risk factors in the United Kingdom: depression, unemployment, and marital strain. Second, to test the validity of suicide search data as a potential marker of suicide risk by evaluating whether web searches for suicide associate with suicide rates among those of different ages and genders in the United Kingdom. Cross-sectional. Search engine data was obtained from Google Trends, a publicly available repository of information of trends and patterns of user searches on Google. The following phrases were entered into Google Trends to analyse relative search volume for suicide, depression, job loss, and divorce, respectively: 'suicide'; 'depression + depressed + hopeless'; 'unemployed + lost job'; 'divorce'. Spearman's rank correlation coefficient was employed to test bivariate associations between suicide search activity and official suicide rates from the Office of National Statistics (ONS). Cyclical trends were observed in search activity for suicide and depression-related search activity, with peaks in autumn and winter months, and a trough in summer months. A positive, non-significant association was found between suicide-related search activity and suicide rates in the general working-age population (15-64 years) (ρ = 0.164; P = 0.652). This association is stronger in younger age groups, particularly for those 25-34 years of age (ρ = 0.848; P = 0.002). We give credence to a link between search activity for suicide and suicide rates in the United Kingdom from 2004 to 2013 for high risk sub-populations (i.e. male youth and young professionals). There remains a need for further research on how Google Trends can be used in other areas of disease surveillance and for work to provide greater geographical precision, as well as research on ways of mitigating the risk of internet use leading to suicide ideation in youth. Copyright © 2015 The Royal
Advanced SPARQL querying in small molecule databases.

PubMed

Galgonek, Jakub; Hurt, Tomáš; Michlíková, Vendula; Onderka, Petr; Schwarz, Jan; Vondrášek, Jiří

2016-01-01

In recent years, the Resource Description Framework (RDF) and the SPARQL query language have become more widely used in the area of cheminformatics and bioinformatics databases. These technologies allow better interoperability of various data sources and powerful searching facilities. However, we identified several deficiencies that make usage of such RDF databases restrictive or challenging for common users. We extended a SPARQL engine to be able to use special procedures inside SPARQL queries. This allows the user to work with data that cannot be simply precomputed and thus cannot be directly stored in the database. We designed an algorithm that checks a query against data ontology to identify possible user errors. This greatly improves query debugging. We also introduced an approach to visualize retrieved data in a user-friendly way, based on templates describing visualizations of resource classes. To integrate all of our approaches, we developed a simple web application. Our system was implemented successfully, and we demonstrated its usability on the ChEBI database transformed into RDF form. To demonstrate procedure call functions, we employed compound similarity searching based on OrChem. The application is publicly available at https://bioinfo.uochb.cas.cz/projects/chemRDF.
Evidence From Web-Based Dietary Search Patterns to the Role of B12 Deficiency in Non-Specific Chronic Pain: A Large-Scale Observational Study

PubMed Central

Giat, Eitan

2018-01-01

Background Profound vitamin B12 deficiency is a known cause of disease, but the role of low or intermediate levels of B12 in the development of neuropathy and other neuropsychiatric symptoms, as well as the relationship between eating meat and B12 levels, is unclear. Objective The objective of our study was to investigate the role of low or intermediate levels of B12 in the development of neuropathy and other neuropsychiatric symptoms. Methods We used food-related Internet search patterns from a sample of 8.5 million people based in the US as a proxy for B12 intake and correlated these searches with Internet searches related to possible effects of B12 deficiency. Results Food-related search patterns were highly correlated with known consumption and food-related searches (ρ=.69). Awareness of B12 deficiency was associated with a higher consumption of B12-rich foods and with queries for B12 supplements. Searches for terms related to neurological disorders were correlated with searches for B12-poor foods, in contrast with control terms. Popular medicines, those having fewer indications, and those which are predominantly used to treat pain, were more strongly correlated with the ability to predict neuropathic pain queries using the B12 contents of food. Conclusions Our findings show that Internet search patterns are a useful way of investigating health questions in large populations, and suggest that low B12 intake may be associated with a broader spectrum of neurological disorders than previously thought. PMID:29305340
Improving accuracy for identifying related PubMed queries by an integrated approach

PubMed Central

Lu, Zhiyong; Wilbur, W. John

2009-01-01

PubMed is the most widely used tool for searching biomedical literature online. As with many other online search tools, a user often types a series of multiple related queries before retrieving satisfactory results to fulfill a single information need. Meanwhile, it is also a common phenomenon to see a user type queries on unrelated topics in a single session. In order to study PubMed users’ search strategies, it is necessary to be able to automatically separate unrelated queries and group together related queries. Here, we report a novel approach combining both lexical and contextual analyses for segmenting PubMed query sessions and identifying related queries and compare its performance with the previous approach based solely on concept mapping. We experimented with our integrated approach on sample data consisting of 1,539 pairs of consecutive user queries in 351 user sessions. The prediction results of 1,396 pairs agreed with the gold-standard annotations, achieving an overall accuracy of 90.7%. This demonstrates that our approach is significantly better than the previously published method. By applying this approach to a one day query log of PubMed, we found that a significant proportion of information needs involved more than one PubMed query, and that most of the consecutive queries for the same information need are lexically related. Finally, the proposed PubMed distance is shown to be an accurate and meaningful measure for determining the contextual similarity between biological terms. The integrated approach can play a critical role in handling real-world PubMed query log data as is demonstrated in our experiments. PMID:19162232
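The lexical side of such session segmentation can be illustrated with a small sketch. It is not the authors' integrated method: it assumes a simple Jaccard token overlap between consecutive queries and a hypothetical threshold of 0.2, and it leaves out the contextual analysis entirely. The last lines reproduce the reported accuracy from the counts given in the abstract.

```python
def tokens(query):
    """Lower-cased whitespace tokens of a query, as a set."""
    return set(query.lower().split())


def jaccard(a, b):
    """Token-overlap similarity between two queries, in [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def segment_session(queries, threshold=0.2):
    """Group consecutive queries whose overlap reaches the threshold."""
    segments = []
    for query in queries:
        if segments and jaccard(segments[-1][-1], query) >= threshold:
            segments[-1].append(query)   # lexically related: same need
        else:
            segments.append([query])     # unrelated: start a new segment
    return segments


# Hypothetical session with two information needs.
session = ["breast cancer", "breast cancer brca1", "zebrafish development"]
segments = segment_session(session)

# Accuracy from the abstract: 1,396 of 1,539 pairs agreed with the gold standard.
accuracy_percent = round(100 * 1396 / 1539, 1)
```

A purely lexical rule like this misses related queries that share no words, which is the gap the contextual component described in the abstract is meant to cover.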

How do Consumers Search for and Appraise Information on Medicines on the Internet? A Qualitative Study Using Focus Groups

PubMed Central

Aslani, Parisa; Williams, Kylie A

2003-01-01

Background Many consumers use the Internet to find information about their medicines. It is widely acknowledged that health information on the Internet is of variable quality and therefore the search and appraisal skills of consumers are important for selecting and assessing this information. The way consumers choose and evaluate information on medicines on the Internet is important because it has been shown that written information on medicines can influence consumer attitudes to and use of medicines. Objective To explore consumer experiences in searching for and appraising Internet-based information on medicines. Methods Six focus groups (N = 46 participants) were conducted in metropolitan Sydney, Australia from March to May 2003 with consumers who had used the Internet for information on medicines. Verbatim transcripts of the group discussions were analyzed using a grounded theory approach. Results All participants reported using a search engine to find information on medicines. Choice of search engine was determined by factors such as the workplace or educational environments, or suggestions by family or friends. Some participants found information solely by typing the medicine name (drug or brand name) into the search engine, while others searched using broader terms. Search skills ranged widely from more-advanced (using quotation marks and phrases) to less-than-optimal (such as typing in questions and full sentences). Many participants selected information from the first page of search results by looking for keywords and descriptions in the search results, and by looking for the source of the information as apparent in the URL. Opinions on credible sources of information on medicines varied with some participants regarding information by pharmaceutical companies as the "official" information on a medicine, and others preferring what they considered to be impartial sources such as governments, organizations, and educational institutions. It was clear that
Detecting Disease Outbreaks in Mass Gatherings Using Internet Data

PubMed Central

Yom-Tov, Elad; Cox, Ingemar J; McKendry, Rachel A

2014-01-01

Background Mass gatherings, such as music festivals and religious events, pose a health care challenge because of the risk of transmission of communicable diseases. This is exacerbated by the fact that participants disperse soon after the gathering, potentially spreading disease within their communities. The dispersion of participants also poses a challenge for traditional surveillance methods. The ubiquitous use of the Internet may enable the detection of disease outbreaks through analysis of data generated by users during events and shortly thereafter. Objective The intent of the study was to develop algorithms that can alert to possible outbreaks of communicable diseases from Internet data, specifically Twitter and search engine queries. Methods We extracted all Twitter postings and queries made to the Bing search engine by users who repeatedly mentioned one of nine major music festivals held in the United Kingdom and one religious event (the Hajj in Mecca) during 2012, for a period of 30 days and after each festival. We analyzed these data using three methods, two of which compared words associated with disease symptoms before and after the time of the festival, and one that compared the frequency of these words with those of other users in the United Kingdom in the days following the festivals. Results The data comprised, on average, 7.5 million tweets made by 12,163 users, and 32,143 queries made by 1756 users from each festival. Our methods indicated the statistically significant appearance of a disease symptom in two of the nine festivals. For example, cough was detected at higher than expected levels following the Wakestock festival. Statistically significant agreement (chi-square test, P<.01) between methods and across data sources was found where a statistically significant symptom was detected. Anecdotal evidence suggests that symptoms detected are indeed indicative of a disease that some users attributed to being at the festival. Conclusions Our work
Detecting disease outbreaks in mass gatherings using Internet data.

PubMed

Yom-Tov, Elad; Borsa, Diana; Cox, Ingemar J; McKendry, Rachel A

2014-06-18

Mass gatherings, such as music festivals and religious events, pose a health care challenge because of the risk of transmission of communicable diseases. This is exacerbated by the fact that participants disperse soon after the gathering, potentially spreading disease within their communities. The dispersion of participants also poses a challenge for traditional surveillance methods. The ubiquitous use of the Internet may enable the detection of disease outbreaks through analysis of data generated by users during events and shortly thereafter. The intent of the study was to develop algorithms that can alert to possible outbreaks of communicable diseases from Internet data, specifically Twitter and search engine queries. We extracted all Twitter postings and queries made to the Bing search engine by users who repeatedly mentioned one of nine major music festivals held in the United Kingdom and one religious event (the Hajj in Mecca) during 2012, for a period of 30 days and after each festival. We analyzed these data using three methods, two of which compared words associated with disease symptoms before and after the time of the festival, and one that compared the frequency of these words with those of other users in the United Kingdom in the days following the festivals. The data comprised, on average, 7.5 million tweets made by 12,163 users, and 32,143 queries made by 1756 users from each festival. Our methods indicated the statistically significant appearance of a disease symptom in two of the nine festivals. For example, cough was detected at higher than expected levels following the Wakestock festival. Statistically significant agreement (chi-square test, P<.01) between methods and across data sources was found where a statistically significant symptom was detected. Anecdotal evidence suggests that symptoms detected are indeed indicative of a disease that some users attributed to being at the festival. Our work shows the feasibility of creating a public health
Query Auto-Completion Based on Word2vec Semantic Similarity

NASA Astrophysics Data System (ADS)

Shao, Taihua; Chen, Honghui; Chen, Wanyu

2018-04-01

Query auto-completion (QAC) is the first step of information retrieval, which helps users formulate the entire query after inputting only a few prefixes. Regarding the models of QAC, the traditional method ignores the contribution from the semantic relevance between queries. However, similar queries always express extremely similar search intention. In this paper, we propose a hybrid model FS-QAC based on query semantic similarity as well as the query frequency. We choose word2vec method to measure the semantic similarity between intended queries and pre-submitted queries. By combining both features, our experiments show that FS-QAC model improves the performance when predicting the user’s query intention and helping formulate the right query. Our experimental results show that the optimal hybrid model contributes to a 7.54% improvement in terms of MRR against a state-of-the-art baseline using the public AOL query logs.
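One way to picture a hybrid of query frequency and semantic similarity is the sketch below. It is an assumption-laden illustration, not the FS-QAC model: the linear mixing weight, the toy two-dimensional vectors standing in for word2vec embeddings, and all names are hypothetical. It also shows how mean reciprocal rank (MRR) is computed.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_completions(prefix, frequencies, vectors, context_vector, weight=0.5):
    """Rank completions of a prefix by weight*frequency + (1-weight)*similarity."""
    matching = {q: f for q, f in frequencies.items() if q.startswith(prefix)}
    if not matching:
        return []
    max_frequency = max(matching.values())

    def score(query):
        popularity = matching[query] / max_frequency
        similarity = cosine(vectors[query], context_vector)
        return weight * popularity + (1 - weight) * similarity

    return sorted(matching, key=score, reverse=True)


def mean_reciprocal_rank(ranked_lists, targets):
    """Average of 1/rank of the intended query (0 when it is not listed)."""
    total = 0.0
    for ranked, target in zip(ranked_lists, targets):
        if target in ranked:
            total += 1 / (ranked.index(target) + 1)
    return total / len(targets)


# Hypothetical log frequencies and toy embeddings.
frequencies = {"java tutorial": 100, "javascript array": 90, "java coffee": 80}
vectors = {"java tutorial": (1.0, 0.0), "javascript array": (0.9, 0.1),
           "java coffee": (0.0, 1.0)}
previous_query_vector = (0.0, 1.0)  # the user previously searched about coffee

hybrid = rank_completions("java", frequencies, vectors, previous_query_vector)
frequency_only = rank_completions("java", frequencies, vectors,
                                  previous_query_vector, weight=1.0)
```

In this toy case the intended query "java coffee" moves from third place under frequency alone to first place under the hybrid score, which is the kind of gain MRR is designed to measure.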
Persistent Identifiers for Improved Accessibility for Linked Data Querying

NASA Astrophysics Data System (ADS)

Shepherd, A.; Chandler, C. L.; Arko, R. A.; Fils, D.; Jones, M. B.; Krisnadhi, A.; Mecum, B.

2016-12-01

The adoption of linked open data principles within the geosciences has increased the amount of accessible information available on the Web. However, this data is difficult to consume for those who are unfamiliar with Semantic Web technologies such as Web Ontology Language (OWL), Resource Description Framework (RDF) and SPARQL - the RDF query language. Consumers would need to understand the structure of the data and how to efficiently query it. Furthermore, understanding how to query doesn't solve problems of poor precision and recall in search results. For consumers unfamiliar with the data, full-text searches are most accessible, but not ideal as they arrest the advantages of data disambiguation and co-reference resolution efforts. Conversely, URI searches across linked data can deliver improved search results, but knowledge of these exact URIs may remain difficult to obtain. The increased adoption of Persistent Identifiers (PIDs) can lead to improved linked data querying by a wide variety of consumers. Because PIDs resolve to a single entity, they are an excellent data point for disambiguating content. At the same time, PIDs are more accessible and prominent than a single data provider's linked data URI. When present in linked open datasets, PIDs provide balance between the technical and social hurdles of linked data querying as evidenced by the NSF EarthCube GeoLink project. The GeoLink project, funded by NSF's EarthCube initiative, has brought together data repositories that include content from field expeditions, laboratory analyses, journal publications, conference presentations, theses/reports, and funding awards that span scientific studies from marine geology to marine ecosystems and biogeochemistry to paleoclimatology.
Development of Conceptual Models for Internet Search: A Case Study.

ERIC Educational Resources Information Center

Uden, Lorna; Tearne, Stephen; Alderson, Albert

This paper describes the creation and evaluation of a World Wide Web-based courseware module, using conceptual models based on constructivism, that teaches novices how to use the Internet for searching. Questionnaires and interviews were used to understand the difficulties of a group of novices. The conceptual model of the experts for the task was…
The Effect of Limited Health Literacy on How Internet Users Learn About Diabetes.

PubMed

Yom-Tov, Elad; Marino, Barbara; Pai, Jennifer; Harris, Dawn; Wolf, Michael

2016-10-01

The Internet continues to be an important supplemental health information resource for an increasing number of U.S. adults, especially for those with a new or existing chronic condition. Here we examine how people use the Internet to learn about Type 2 diabetes and how health literacy (HL) influences this information-seeking behavior. We analyzed the searches of approximately 2 million people who queried for diabetes-related information on Microsoft's Bing search engine. The HL of searchers was imputed through a community-based HL score. Topics searched were categorized and subsequent websites were assessed for readability. Overall, diabetes information-seeking strategies via the Internet are similar among adults with limited and adequate HL skills. However, people with limited HL take a longer time to read pages that are quickly read by people with adequate HL and vice versa. Information seeking among the former is terminated prematurely, as is evident from a Hidden Markov Model of the search process. Our findings indicate that the reading level required to understand the majority of diabetes-related information is high. Especially on government websites, more than 80% of information requires a reading level corresponding to 7th grade or higher. Our results indicate that individuals with lower HL may disproportionately struggle with Internet searches and fail to get an equivalent benefit from this information resource compared to users with greater HL. Future interventions should target the quality and ease of navigation of health care websites and find ways to leverage other relevant professionals to encourage and promote successful information access on the Web.
Internet-ordered viagra (sildenafil citrate) is rarely genuine.

PubMed

Campbell, Neil; Clark, John P; Stecher, Vera J; Goldstein, Irwin

2012-11-01

Counterfeit medication is a growing problem. This study assessed the requirement for prescription, cost, origin, and content of medications sold via the Internet and purporting to be the phosphodiesterase type 5 inhibitor Viagra (sildenafil citrate). Pfizer monitored top search results for the query "buy Viagra" on the two leading Internet search engines in March 2011. Orders were placed from 22 unique Web sites claiming to sell Viagra manufactured by Pfizer. Tablets received were assessed for chemical composition. No Web site examined required a prescription for purchase or a health screening survey; 90% offered illegal "generic Viagra." Cost per tablet ranged from $3.28-$33.00. Shipment origins of purchases were Hong Kong (N = 11), the United States (N = 6), and the United Kingdom (N = 2) as well as Canada, China, and India (N = 1 each). Notably, the four Internet pharmacies claiming to be Canadian did not ship medication from a Canadian address. Of 22 sample tablets examined, 17 (77%) were counterfeit, 4 (18%) were authentic, and 1 (5%) was an illegal generic. Counterfeit tablets were analyzed for sildenafil citrate, the active pharmaceutical ingredient (API) of Viagra, and contents varied between 30% and 50% of the label claim. Counterfeits lacked product information leaflets, including appropriate safety warnings, and genuine Viagra formulations. Internet sites claiming to sell authentic Viagra shipped counterfeit medication 77% of the time; counterfeits usually came from non-U.S. addresses and had 30% to 50% of the labeled API claim. Caution is warranted when purchasing Viagra via the Internet. © 2012 International Society for Sexual Medicine.
Multimedia Web Searching Trends.

ERIC Educational Resources Information Center

Ozmutlu, Seda; Spink, Amanda; Ozmutlu, H. Cenk

2002-01-01

Examines and compares multimedia Web searching by Excite and FAST search engine users in 2001. Highlights include audio and video queries; time spent on searches; terms per query; ranking of the most frequently used terms; and differences in Web search behaviors of U.S. and European Web users. (Author/LRW)
Web page sorting algorithm based on query keyword distance relation

NASA Astrophysics Data System (ADS)

Yang, Han; Cui, Hong Gang; Tang, Hao

2017-08-01

To improve page ranking, this paper proposes the idea of query keyword clustering, based on the relationships among the search keywords as they appear in a web page. The idea is quantified as the degree of aggregation of the search keywords in the page. Building on the PageRank algorithm, a clustering-degree factor for the query keywords is added so that it takes part in the quantitative calculation. The result is an improved PageRank algorithm based on the distance relation between search keywords. The experimental results show the feasibility and effectiveness of the method.
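The abstract does not give the formula, so the following is only a sketch of the general idea under stated assumptions: a standard PageRank power iteration, a hypothetical closeness factor equal to the number of query keywords divided by the width of the smallest text window containing all of them, and a hypothetical combination rule `rank * (1 + alpha * closeness)`.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Plain PageRank; every link target must also be a key of `links`."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1 - damping) / n for page in pages}
        for page, outgoing in links.items():
            # A page without out-links spreads its rank over all pages.
            targets = outgoing if outgoing else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


def keyword_closeness(text, keywords):
    """Keywords / width of the smallest window holding all of them (0 if any is missing)."""
    words = text.lower().split()
    wanted = {keyword.lower() for keyword in keywords}
    hits = [(i, word) for i, word in enumerate(words) if word in wanted]
    best = None
    for start in range(len(hits)):
        seen = set()
        for position, word in hits[start:]:
            seen.add(word)
            if seen == wanted:
                width = position - hits[start][0] + 1
                best = width if best is None else min(best, width)
                break
    return len(wanted) / best if best else 0.0


def rerank(links, texts, keywords, alpha=1.0):
    """Order pages by PageRank boosted by keyword closeness (assumed rule)."""
    rank = pagerank(links)
    return sorted(
        links,
        key=lambda p: rank[p] * (1 + alpha * keyword_closeness(texts[p], keywords)),
        reverse=True,
    )


# Hypothetical three-page web and page texts.
links = {"a": ["b"], "b": ["a"], "c": ["a"]}
texts = {"a": "cheap hotels and many other flights",
         "b": "cheap flights to paris",
         "c": "weather report"}
ordering = rerank(links, texts, ["cheap", "flights"])
```

Here page "a" has the highest plain PageRank, but page "b" is ranked first after the boost because the two query keywords sit next to each other in its text.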
Query-Based Outlier Detection in Heterogeneous Information Networks.

PubMed

Kuck, Jonathan; Zhuang, Honglei; Yan, Xifeng; Cam, Hasan; Han, Jiawei

2015-03-01

Outlier or anomaly detection in large data sets is a fundamental task in data science, with broad applications. However, in real data sets with high-dimensional space, most outliers are hidden in certain dimensional combinations and are relative to a user's search space and interest. It is often more effective to give power to users and allow them to specify outlier queries flexibly, and the system will then process such mining queries efficiently. In this study, we introduce the concept of query-based outlier in heterogeneous information networks, design a query language to facilitate users to specify such queries flexibly, define a good outlier measure in heterogeneous networks, and study how to process outlier queries efficiently in large data sets. Our experiments on real data sets show that following such a methodology, interesting outliers can be defined and uncovered flexibly and effectively in large heterogeneous networks.
Query-Based Outlier Detection in Heterogeneous Information Networks

PubMed Central

Kuck, Jonathan; Zhuang, Honglei; Yan, Xifeng; Cam, Hasan; Han, Jiawei

2015-01-01

Outlier or anomaly detection in large data sets is a fundamental task in data science, with broad applications. However, in real data sets with high-dimensional space, most outliers are hidden in certain dimensional combinations and are relative to a user’s search space and interest. It is often more effective to give power to users and allow them to specify outlier queries flexibly, and the system will then process such mining queries efficiently. In this study, we introduce the concept of query-based outlier in heterogeneous information networks, design a query language to facilitate users to specify such queries flexibly, define a good outlier measure in heterogeneous networks, and study how to process outlier queries efficiently in large data sets. Our experiments on real data sets show that following such a methodology, interesting outliers can be defined and uncovered flexibly and effectively in large heterogeneous networks. PMID:27064397
Searching Choices: Quantifying Decision-Making Processes Using Search Engine Data.

PubMed

Moat, Helen Susannah; Olivola, Christopher Y; Chater, Nick; Preis, Tobias

2016-07-01

When making a decision, humans consider two types of information: information they have acquired through their prior experience of the world, and further information they gather to support the decision in question. Here, we present evidence that data from search engines such as Google can help us model both sources of information. We show that statistics from search engines on the frequency of content on the Internet can help us estimate the statistical structure of prior experience; and, specifically, we outline how such statistics can inform psychological theories concerning the valuation of human lives, or choices involving delayed outcomes. Turning to information gathering, we show that search query data might help measure human information gathering, and it may predict subsequent decisions. Such data enable us to compare information gathered across nations, where analyses suggest, for example, a greater focus on the future in countries with a higher per capita GDP. We conclude that search engine data constitute a valuable new resource for cognitive scientists, offering a fascinating new tool for understanding the human decision-making process. Copyright © 2016 The Authors. Topics in Cognitive Science published by Wiley Periodicals, Inc. on behalf of Cognitive Science Society.
Boolean logic tree of graphene-based chemical system for molecular computation and intelligent molecular search query.

PubMed

Huang, Wei Tao; Luo, Hong Qun; Li, Nian Bing

2014-05-06

The most serious, and yet unsolved, problem of constructing molecular computing devices consists in connecting all of these molecular events into a usable device. This report demonstrates the use of a Boolean logic tree for analyzing the chemical event network based on graphene, organic dye, thrombin aptamer, and Fenton reaction, organizing and connecting these basic chemical events. This chemical event network can be utilized to implement fluorescent combinatorial logic (including basic logic gates and complex integrated logic circuits) and fuzzy logic computing. On the basis of the Boolean logic tree analysis and logic computing, these basic chemical events can be considered as programmable "words" and chemical interactions as "syntax" logic rules to construct a molecular search engine for performing intelligent molecular search queries. Our approach is helpful in developing advanced logic programs based on molecules for application in biosensing, nanotechnology, and drug delivery.
Using Bitmap Indexing Technology for Combined Numerical and Text Queries

DOE Office of Scientific and Technical Information (OSTI.GOV)

Stockinger, Kurt; Cieslewicz, John; Wu, Kesheng

2006-10-16

In this paper, we describe a strategy of using compressed bitmap indices to speed up queries on both numerical data and text documents. By using an efficient compression algorithm, these compressed bitmap indices are compact even for indices with millions of distinct terms. Moreover, bitmap indices can be used very efficiently to answer Boolean queries over text documents involving multiple query terms. Existing inverted indices for text searches are usually inefficient for corpora with a very large number of terms as well as for queries involving a large number of hits. We demonstrate that our compressed bitmap index technology overcomes both of those shortcomings. In a performance comparison against a commonly used database system, our indices answer queries 30 times faster on average. To provide full SQL support, we integrated our indexing software, called FastBit, with MonetDB. The integrated system MonetDB/FastBit provides not only efficient searches on a single table as FastBit does, but also answers join queries efficiently. Furthermore, MonetDB/FastBit also provides a very efficient retrieval mechanism of result records.
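How a bitmap index answers a combined numerical and text query can be sketched as below. This is a toy, uncompressed illustration that uses Python integers as bit vectors; it does not reproduce FastBit's compression or its API, and the function names, bin width, and sample rows are hypothetical.

```python
from collections import defaultdict


def build_term_index(documents):
    """Map each term to a bitmap whose bit i is set when row i contains it."""
    index = defaultdict(int)
    for row, text in enumerate(documents):
        for term in set(text.lower().split()):
            index[term] |= 1 << row
    return index


def build_bin_index(values, bin_width):
    """Map each numeric bin to a bitmap of the rows whose value falls in it."""
    index = defaultdict(int)
    for row, value in enumerate(values):
        index[int(value // bin_width)] |= 1 << row
    return index


def matching_rows(bitmap):
    """Decode a bitmap into the list of row numbers whose bit is set."""
    rows, row = [], 0
    while bitmap:
        if bitmap & 1:
            rows.append(row)
        bitmap >>= 1
        row += 1
    return rows


# Hypothetical table: one text column and one numeric column.
texts = ["higgs candidate event", "background muon event",
         "higgs background overlap", "muon candidate"]
energies = [12.5, 3.0, 20.1, 15.0]

terms = build_term_index(texts)
bins = build_bin_index(energies, bin_width=10)
all_rows = (1 << len(texts)) - 1

# energy >= 10 is the OR of every bin at or above bin 1 (exact on bin edges).
high_energy = 0
for bin_number, bitmap in bins.items():
    if bin_number >= 1:
        high_energy |= bitmap

# energy >= 10 AND text contains "higgs" AND NOT text contains "background"
result = high_energy & terms["higgs"] & ~terms["background"] & all_rows
rows = matching_rows(result)
```

Each Boolean operator in the query becomes a single bitwise operation over whole bitmaps, which is why multi-term queries with many hits stay cheap; range conditions that do not fall on bin edges would additionally need a check of the candidate rows.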
New Internet search volume-based weighting method for integrating various environmental impacts

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ji, Changyoon, E-mail: changyoon@yonsei.ac.kr; Hong, Taehoon, E-mail: hong7@yonsei.ac.kr

Weighting is one of the steps in life cycle impact assessment that integrates various characterized environmental impacts as a single index. Weighting factors should be based on the society's preferences. However, most previous studies consider only the opinion of some people. Thus, this research proposes a new weighting method that determines the weighting factors of environmental impact categories by considering public opinion on environmental impacts using the Internet search volumes for relevant terms. To validate the new weighting method, the weighting factors for six environmental impacts calculated by the new weighting method were compared with the existing weighting factors. The resulting Pearson's correlation coefficient between the new and existing weighting factors was from 0.8743 to 0.9889. It turned out that the new weighting method presents reasonable weighting factors. It also requires less time and lower cost compared to existing methods and likewise meets the main requirements of weighting methods such as simplicity, transparency, and reproducibility. The new weighting method is expected to be a good alternative for determining the weighting factor. Highlights: • A new weighting method using Internet search volume is proposed in this research. • The new weighting method reflects the public opinion using Internet search volume. • The correlation coefficient between new and existing weighting factors is over 0.87. • The new weighting method can present the reasonable weighting factors. • The proposed method can be a good alternative for determining the weighting factors.
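A minimal sketch of the idea follows, assuming the simplest possible rule: each category's weighting factor is its share of the total search volume. The abstract does not state the actual formula, and the category names, volumes, and "existing" factors below are hypothetical placeholders. The comparison step uses Pearson's correlation coefficient, as in the abstract.

```python
import math


def weights_from_search_volume(volumes):
    """Normalise search volumes so that the weighting factors sum to 1."""
    total = sum(volumes.values())
    return {category: volume / total for category, volume in volumes.items()}


def pearson(x, y):
    """Pearson's correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Hypothetical search volumes for four impact categories.
volumes = {"category A": 500, "category B": 250,
           "category C": 150, "category D": 100}
new_weights = weights_from_search_volume(volumes)

# Hypothetical weighting factors from an existing (e.g. panel-based) method.
existing_weights = {"category A": 0.45, "category B": 0.30,
                    "category C": 0.15, "category D": 0.10}

categories = list(volumes)
r = pearson([new_weights[c] for c in categories],
            [existing_weights[c] for c in categories])
```

A high coefficient only shows that the two sets of factors order and scale the categories similarly; it does not by itself show which set better reflects society's preferences.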
The Internet and education in surgery.

PubMed

Veldenz, H C; Dennis, J W

1998-09-01

The purpose of this review is to explain the developing role of the Internet and the World Wide Web (WWW) in promoting education in surgery. Internet sites relevant to surgery are appearing rapidly. Remote literature searches can query for surgery trials and results. Societies are using the WWW for transmission and review of publication materials. News groups interactively discuss current developments and trends. Surgeons are using personal and institutional sites to advertise services. Conventional slide shows migrate to the WWW for convenient downloading for surgeons and patients. Multimedia capabilities of the WWW expand the depth of information transmission, enabling education emanating from remote sites with narration and video depiction of procedures. These sophisticated tools can be demonstrated today with real online applications. One site facilitates surgical education using the WWW for program information, symposium coordination, links to regional subspecialty societies, residency cataloging, patient question and answer forums, and multimedia procedure descriptions. The principles of WWW communication used in this website can adapt to meet any educational need. The specialty of surgery is well suited to incorporation of online multimedia education over the Internet to follow new developments in our field.
Fragger: a protein fragment picker for structural queries.

PubMed

Berenger, Francois; Simoncini, David; Voet, Arnout; Shrestha, Rojan; Zhang, Kam Y J

2017-01-01

Protein modeling and design activities often require querying the Protein Data Bank (PDB) with a structural fragment, possibly containing gaps. For some applications, it is preferable to work on a specific subset of the PDB or with unpublished structures. These requirements, along with specific user needs, motivated the creation of new software to manage and query 3D protein fragments. Fragger is a protein fragment picker that allows protein fragment databases to be created and queried. All fragment lengths are supported and any set of PDB files can be used to create a database. Fragger can efficiently search a fragment database with a query fragment and a distance threshold. Matching fragments are ranked by distance to the query. The query fragment can have structural gaps and the allowed amino acid sequences matching a query can be constrained via a regular expression of one-letter amino acid codes. Fragger also incorporates a tool to compute the backbone RMSD of one versus many fragments in high throughput. Fragger should be useful for protein design, loop grafting and related structural bioinformatics tasks.
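The query pattern described here (a distance threshold, ranking by distance, and a regular expression over one-letter amino acid codes) can be sketched as follows. This is not Fragger's implementation or interface: the distance is a plain coordinate RMSD without superposition, used as a stand-in, and the function names and toy fragments are hypothetical.

```python
import math
import re


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired 3D points (no superposition)."""
    if len(coords_a) != len(coords_b):
        raise ValueError("fragments must have the same number of atoms")
    squared = sum((p - q) ** 2
                  for atom_a, atom_b in zip(coords_a, coords_b)
                  for p, q in zip(atom_a, atom_b))
    return math.sqrt(squared / len(coords_a))


def search_fragments(query_coords, database, threshold, sequence_pattern=None):
    """Return (distance, name) pairs within the threshold, closest first."""
    regex = re.compile(sequence_pattern) if sequence_pattern else None
    hits = []
    for name, sequence, coords in database:
        if regex and not regex.fullmatch(sequence):
            continue  # sequence not allowed by the regular expression
        if len(coords) != len(query_coords):
            continue  # only compare fragments of equal length
        distance = rmsd(query_coords, coords)
        if distance <= threshold:
            hits.append((distance, name))
    return sorted(hits)


# Hypothetical three-atom fragments: (name, one-letter sequence, coordinates).
query = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
database = [
    ("f1", "GAV", [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]),
    ("f2", "GPV", [(0.0, 0.0, 0.3), (1.0, 0.0, 0.3), (2.0, 0.0, 0.3)]),
    ("f3", "GAV", [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0), (2.0, 0.0, 3.0)]),
]

unconstrained = search_fragments(query, database, threshold=1.0)
constrained = search_fragments(query, database, threshold=1.0,
                               sequence_pattern="G[AS]V")
```

A backbone RMSD is usually computed after optimally superposing the two fragments, so a realistic version would add that alignment step before measuring the distance.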
Spatial Query for Planetary Data

NASA Technical Reports Server (NTRS)

Shams, Khawaja S.; Crockett, Thomas M.; Powell, Mark W.; Joswig, Joseph C.; Fox, Jason M.

2011-01-01

Science investigators need to quickly and effectively assess past observations of specific locations on a planetary surface. This innovation involves a location-based search technology that was adapted and applied to planetary science data to support a spatial query capability for mission operations software. High-performance location-based searching requires the use of spatial data structures for database organization. Spatial data structures are designed to organize datasets based on their coordinates in a way that is optimized for location-based retrieval. The particular spatial data structure that was adapted for planetary data search is the R+ tree.
Classification of Automated Search Traffic

NASA Astrophysics Data System (ADS)

Buehrer, Greg; Stokes, Jack W.; Chellapilla, Kumar; Platt, John C.

As web search providers seek to improve both relevance and response times, they are challenged by the ever-increasing tax of automated search query traffic. Third party systems interact with search engines for a variety of reasons, such as monitoring a web site’s rank, augmenting online games, or possibly to maliciously alter click-through rates. In this paper, we investigate automated traffic (sometimes referred to as bot traffic) in the query stream of a large search engine provider. We define automated traffic as any search query not generated by a human in real time. We first provide examples of different categories of query logs generated by automated means. We then develop many different features that distinguish between queries generated by people searching for information, and those generated by automated processes. We categorize these features into two classes, either an interpretation of the physical model of human interactions, or as behavioral patterns of automated interactions. Using these detection features, we next classify the query stream using multiple binary classifiers. In addition, a multiclass classifier is then developed to identify subclasses of both normal and automated traffic. An active learning algorithm is used to suggest which user sessions to label to improve the accuracy of the multiclass classifier, while also seeking to discover new classes of automated traffic. Performance analyses are then provided. Finally, the multiclass classifier is used to predict the subclass distribution for the search query stream.

Virtual Solar Observatory Distributed Query Construction

NASA Technical Reports Server (NTRS)

Gurman, J. B.; Dimitoglou, G.; Bogart, R.; Davey, A.; Hill, F.; Martens, P.

2003-01-01

Through a prototype implementation (Tian et al., this meeting) the VSO has already demonstrated the capability of unifying geographically distributed data sources following the Web Services paradigm and utilizing mechanisms such as the Simple Object Access Protocol (SOAP). So far, four participating sites (Stanford, Montana State University, National Solar Observatory and the Solar Data Analysis Center) permit Web-accessible, time-based searches that allow browse access to a number of diverse data sets. Our latest work includes the extension of the simple, time-based queries to include numerous other searchable observation parameters. For VSO users, this extended functionality enables more refined searches. For the VSO, it is a proof of concept that more complex, distributed queries can be effectively constructed and that results from heterogeneous, remote sources can be synthesized and presented to users as a single, virtual data product.
Detecting internet search activity for mouth cancer in Ireland.

PubMed

Murray, G; O'Rourke, C; Hogan, J; Fenton, J E

2016-02-01

Mouth Cancer Awareness Day in Ireland was launched in September 2010 by survivors of the disease to promote public awareness of suspicious signs of oral cancer and to provide free dental examinations. To find out whether its introduction had increased public interest in the disease, we used Google Trends to find out how often users in Ireland had searched for "oral cancer" and "mouth cancer" across all Google domains between January 2005 and December 2013. The number of internet searches for these cancers has increased significantly (p <0.001) and has peaked each September since the awareness day was launched in 2010. More people searched for "mouth cancer" than for "oral cancer". These findings may have valuable clinical implications, as an increase in public awareness of mouth cancer could result in earlier presentation and better prognosis.
The Search for Extension: 7 Steps to Help People Find Research-Based Information on the Internet

ERIC Educational Resources Information Center

Hill, Paul; Rader, Heidi B.; Hino, Jeff

2012-01-01

For Extension's unbiased, research-based content to be found by people searching the Internet, it needs to be organized in a way conducive to the ranking criteria of a search engine. With proper web design and search engine optimization techniques, Extension's content can be found, recognized, and properly indexed by search engines and…
FTree query construction for virtual screening: a statistical analysis.

PubMed

Gerlach, Christof; Broughton, Howard; Zaliani, Andrea

2008-02-01

FTrees (FT) is a known chemoinformatic tool able to condense molecular descriptions into a graph object and to search for actives in large databases using graph similarity. The query graph is classically derived from a known active molecule, or a set of actives, for which a similar compound has to be found. Recently, FT similarity has been extended to fragment space, widening its capabilities. If a user were able to build a knowledge-based FT query from information other than a known active structure, the similarity search could be combined with other, normally separate, fields like de-novo design or pharmacophore searches. With this aim in mind, we performed a comprehensive analysis of several databases in terms of FT description and provide a basic statistical analysis of the FT spaces so far at hand. Vendors' catalogue collections and MDDR as a source of potential or known "actives", respectively, have been used. With the results reported herein, a set of ranges, mean values and standard deviations for several query parameters are presented in order to set a reference guide for the users. Applications on how to use this information in FT query building are also provided, using a newly built 3D-pharmacophore from 57 5HT-1F agonists and a published one which was used for virtual screening for tRNA-guanine transglycosylase (TGT) inhibitors.
Retrieving high-resolution images over the Internet from an anatomical image database

NASA Astrophysics Data System (ADS)

Strupp-Adams, Annette; Henderson, Earl

1999-12-01

The Visible Human Data set is an important contribution to the national collection of anatomical images. To enhance the availability of these images, the National Library of Medicine has supported the design and development of a prototype object-oriented image database which imports, stores, and distributes high resolution anatomical images in both pixel and voxel formats. One of the key database modules is its client-server Internet interface. This Web interface provides a query engine with retrieval access to high-resolution anatomical images that range in size from 100KB for browser viewable rendered images, to 1GB for anatomical structures in voxel file formats. The Web query and retrieval client-server system is composed of applet GUIs, servlets, and RMI application modules which communicate with each other to allow users to query for specific anatomical structures, and retrieve image data as well as associated anatomical images from the database. Selected images can be downloaded individually as single files via HTTP or downloaded in batch-mode over the Internet to the user's machine through an applet that uses Netscape's Object Signing mechanism. The image database uses ObjectDesign's object-oriented DBMS, ObjectStore, which has a Java interface. The query and retrieval system has been tested with a Java-CDE window system, and on the x86 architecture using Windows NT 4.0. This paper describes the Java applet client search engine that queries the database; the Java client module that enables users to view anatomical images online; the Java application server interface to the database which organizes data returned to the user, and its distribution engine that allows users to download image files individually and/or in batch-mode.
A New Archive and Internet Search Engine May Change the Nature of On-Line Research.

ERIC Educational Resources Information Center

Selingo, Jeffrey

1998-01-01

In the process of trying to preserve Internet history by archiving it, a company has developed a powerful Internet search engine that provides information on Web site usage patterns, which can act as a relatively objective source of information about information sources and can link sources that a researcher might otherwise miss. However, issues…
Using Internet Search Engines to Obtain Medical Information: A Comparative Study

PubMed Central

Wang, Liupu; Wang, Juexin; Wang, Michael; Li, Yong; Liang, Yanchun

2012-01-01

Background: The Internet has become one of the most important means to obtain health and medical information. It is often the first step in checking for basic information about a disease and its treatment. The search results are often useful to general users. Various search engines such as Google, Yahoo!, Bing, and Ask.com can play an important role in obtaining medical information for both medical professionals and lay people. However, the usability and effectiveness of various search engines for medical information have not been comprehensively compared and evaluated. Objective: To compare major Internet search engines in their usability of obtaining medical and health information. Methods: We applied usability testing as a software engineering technique and a standard industry practice to compare the four major search engines (Google, Yahoo!, Bing, and Ask.com) in obtaining health and medical information. For this purpose, we searched the keyword breast cancer in Google, Yahoo!, Bing, and Ask.com and saved the results of the top 200 links from each search engine. We combined nonredundant links from the four search engines and gave them to volunteer users in an alphabetical order. The volunteer users evaluated the websites and scored each website from 0 to 10 (lowest to highest) based on the usefulness of the content relevant to breast cancer. A medical expert identified six well-known websites related to breast cancer in advance as standards. We also used five keywords associated with breast cancer defined in the latest release of Systematized Nomenclature of Medicine-Clinical Terms (SNOMED CT) and analyzed their occurrence in the websites. Results: Each search engine provided rich information related to breast cancer in the search results. All six standard websites were among the top 30 in search results of all four search engines. Google had the best search validity (in terms of whether a website could be opened), followed by Bing, Ask.com, and Yahoo!. The search
Using Internet search engines to obtain medical information: a comparative study.

PubMed

Wang, Liupu; Wang, Juexin; Wang, Michael; Li, Yong; Liang, Yanchun; Xu, Dong

2012-05-16

The Internet has become one of the most important means to obtain health and medical information. It is often the first step in checking for basic information about a disease and its treatment. The search results are often useful to general users. Various search engines such as Google, Yahoo!, Bing, and Ask.com can play an important role in obtaining medical information for both medical professionals and lay people. However, the usability and effectiveness of various search engines for medical information have not been comprehensively compared and evaluated. To compare major Internet search engines in their usability of obtaining medical and health information. We applied usability testing as a software engineering technique and a standard industry practice to compare the four major search engines (Google, Yahoo!, Bing, and Ask.com) in obtaining health and medical information. For this purpose, we searched the keyword breast cancer in Google, Yahoo!, Bing, and Ask.com and saved the results of the top 200 links from each search engine. We combined nonredundant links from the four search engines and gave them to volunteer users in an alphabetical order. The volunteer users evaluated the websites and scored each website from 0 to 10 (lowest to highest) based on the usefulness of the content relevant to breast cancer. A medical expert identified six well-known websites related to breast cancer in advance as standards. We also used five keywords associated with breast cancer defined in the latest release of Systematized Nomenclature of Medicine-Clinical Terms (SNOMED CT) and analyzed their occurrence in the websites. Each search engine provided rich information related to breast cancer in the search results. All six standard websites were among the top 30 in search results of all four search engines. Google had the best search validity (in terms of whether a website could be opened), followed by Bing, Ask.com, and Yahoo!. The search results highly overlapped between the
Enabling Incremental Query Re-Optimization.

PubMed

Liu, Mengmeng; Ives, Zachary G; Loo, Boon Thau

2016-01-01

As declarative query processing techniques expand to the Web, data streams, network routers, and cloud platforms, there is an increasing need to re-plan execution in the presence of unanticipated performance changes. New runtime information may affect which query plan we prefer to run. Adaptive techniques require innovation both in terms of the algorithms used to estimate costs, and in terms of the search algorithm that finds the best plan. We investigate how to build a cost-based optimizer that recomputes the optimal plan incrementally given new cost information, much as a stream engine constantly updates its outputs given new data. Our implementation especially shows benefits for stream processing workloads. It lays the foundations upon which a variety of novel adaptive optimization algorithms can be built. We start by leveraging the recently proposed approach of formulating query plan enumeration as a set of recursive datalog queries; we develop a variety of novel optimization approaches to ensure effective pruning in both static and incremental cases. We further show that the lessons learned in the declarative implementation can be equally applied to more traditional optimizer implementations.
Enabling Incremental Query Re-Optimization

PubMed Central

Liu, Mengmeng; Ives, Zachary G.; Loo, Boon Thau

2017-01-01

As declarative query processing techniques expand to the Web, data streams, network routers, and cloud platforms, there is an increasing need to re-plan execution in the presence of unanticipated performance changes. New runtime information may affect which query plan we prefer to run. Adaptive techniques require innovation both in terms of the algorithms used to estimate costs, and in terms of the search algorithm that finds the best plan. We investigate how to build a cost-based optimizer that recomputes the optimal plan incrementally given new cost information, much as a stream engine constantly updates its outputs given new data. Our implementation especially shows benefits for stream processing workloads. It lays the foundations upon which a variety of novel adaptive optimization algorithms can be built. We start by leveraging the recently proposed approach of formulating query plan enumeration as a set of recursive datalog queries; we develop a variety of novel optimization approaches to ensure effective pruning in both static and incremental cases. We further show that the lessons learned in the declarative implementation can be equally applied to more traditional optimizer implementations. PMID:28659658
'Doctor Google' ending the diagnostic odyssey in lysosomal storage disorders: parents using internet search engines as an efficient diagnostic strategy in rare diseases.

PubMed

Bouwman, Machtelt G; Teunissen, Quirine G A; Wijburg, Frits A; Linthorst, Gabor E

2010-08-01

The expansion of the internet has resulted in widespread availability of medical information for both patients and physicians. People increasingly spend time on the internet searching for an explanation, diagnosis or treatment for their symptoms. Regarding rare diseases, the use of the internet may be an important tool in the diagnostic process. The authors present two cases in which concerned parents made a correct diagnosis of a lysosomal storage disorder in their child by searching the internet after a long doctor's delay. These cases illustrate the utility of publicly available internet search engines in diagnosing rare disorders and in addition illustrate the lengthy diagnostic odyssey which is common in these disorders.
BioCarian: search engine for exploratory searches in heterogeneous biological databases.

PubMed

Zaki, Nazar; Tennakoon, Chandana

2017-10-02

There are a large number of biological databases publicly available for scientists on the web. Also, there are many private databases generated in the course of research projects. These databases are in a wide variety of formats. Web standards have evolved in recent times and semantic web technologies are now available to interconnect diverse and heterogeneous sources of data. Therefore, integration and querying of biological databases can be facilitated by techniques used in the semantic web. Heterogeneous databases can be converted into Resource Description Framework (RDF) and queried using the SPARQL language. Searching for exact queries in these databases is trivial. However, exploratory searches need customized solutions, especially when multiple databases are involved. This process is cumbersome and time consuming for those without a sufficient background in computer science. In this context, a search engine facilitating exploratory searches of databases would be of great help to the scientific community. We present BioCarian, an efficient and user-friendly search engine for performing exploratory searches on biological databases. The search engine is an interface for SPARQL queries over RDF databases. We note that many of the databases can be converted to tabular form. We first convert the tabular databases to RDF. The search engine provides a graphical interface based on facets to explore the converted databases. The facet interface is more advanced than conventional facets. It allows complex queries to be constructed, and has additional features such as ranking of facet values based on several criteria, visually indicating the relevance of a facet value and presenting the most important facet values when a large number of choices are available. For advanced users, SPARQL queries can be run directly on the databases. Using this feature, users will be able to incorporate federated searches of SPARQL endpoints. We used the search engine to do an exploratory search
What do people search online concerning the "elusive" fibromyalgia? Insights from a qualitative and quantitative analysis of Google Trends.

PubMed

Bragazzi, Nicola Luigi; Amital, Howard; Adawi, Mohammad; Brigo, Francesco; Watad, Samaa; Aljadeff, Gali; Amital, Daniela; Watad, Abdulla

2017-08-01

Fibromyalgia is a chronic disease, characterized by pain, fatigue, and poor sleep quality. Patients, mainly those with chronic diseases, tend to search for health-related material online. Google Trends (GT), an online tracking system of Internet hit-search volumes that recently merged with its sister project Google Insights for Search (Google Inc.), was used to explore Internet activity related to fibromyalgia. Digital interest in fibromyalgia and related topics searched worldwide has been reported in the last 13 years. A slight decline in this interest has been observed through the years, remaining stable in the last 5 years. Fibromyalgia web behavior exhibited a regular, cyclic pattern, even though no seasonality could be detected. Similar findings have been reported among rheumatoid arthritis and depression. However, differently from rheumatoid arthritis and depression, the focus of the fibromyalgia-related queries was more concentrated on drug side effects and the "elusive" nature of fibromyalgia: is it a real or imaginary condition? Does it really exist or is it all in your head? A tremendous amount of information on fibromyalgia and related topics exists online. Still, many queries have been raised and repeated constantly by fibromyalgia patients in the last 13 years. Therefore, physicians should be aware of the common concerns of people or patients regarding fibromyalgia in order to give proper answers and education.
TokSearch: A search engine for fusion experimental data

DOE PAGES

Sammuli, Brian S.; Barr, Jayson L.; Eidietis, Nicholas W.; ...

2018-04-01

At a typical fusion research site, experimental data is stored using archive technologies that deal with each discharge as an independent set of data. These technologies (e.g. MDSplus or HDF5) are typically supplemented with a database that aggregates metadata for multiple shots to allow for efficient querying of certain predefined quantities. Often, however, a researcher will need to extract information from the archives, possibly for many shots, that is not available in the metadata store or otherwise indexed for quick retrieval. To address this need, a new search tool called TokSearch has been added to the General Atomics TokSys control design and analysis suite [1]. This tool provides the ability to rapidly perform arbitrary, parallelized queries of archived tokamak shot data (both raw and analyzed) over large numbers of shots. The TokSearch query API borrows concepts from SQL, and users can choose to implement queries in either MATLAB or Python.
TokSearch: A search engine for fusion experimental data

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sammuli, Brian S.; Barr, Jayson L.; Eidietis, Nicholas W.

At a typical fusion research site, experimental data is stored using archive technologies that deal with each discharge as an independent set of data. These technologies (e.g. MDSplus or HDF5) are typically supplemented with a database that aggregates metadata for multiple shots to allow for efficient querying of certain predefined quantities. Often, however, a researcher will need to extract information from the archives, possibly for many shots, that is not available in the metadata store or otherwise indexed for quick retrieval. To address this need, a new search tool called TokSearch has been added to the General Atomics TokSys control design and analysis suite [1]. This tool provides the ability to rapidly perform arbitrary, parallelized queries of archived tokamak shot data (both raw and analyzed) over large numbers of shots. The TokSearch query API borrows concepts from SQL, and users can choose to implement queries in either MATLAB or Python.
On Relevance Weight Estimation and Query Expansion.

ERIC Educational Resources Information Center

Robertson, S. E.

1986-01-01

A Bayesian argument is used to suggest modifications to the Robertson and Jones relevance weighting formula to accommodate the addition to the query of terms taken from the relevant documents identified during the search. (Author)
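For context, the Robertson and Sparck Jones relevance weight that this abstract builds on is commonly written in the following form. This is the standard textbook version with 0.5 smoothing, given here as background only; the modified formula proposed in the paper is not stated in the abstract and is not reproduced. The symbols are assumptions of this note: N is the number of documents in the collection, n_t the number containing term t, R the number of known relevant documents, and r_t the number of relevant documents containing t.

```latex
w_t = \log \frac{(r_t + 0.5)\,(N - n_t - R + r_t + 0.5)}{(n_t - r_t + 0.5)\,(R - r_t + 0.5)}
```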
Seasonal variation in internet keyword searches: a proxy assessment of sex mating behaviors.

PubMed

Markey, Patrick M; Markey, Charlotte N

2013-05-01

The current study investigated seasonal variation in internet searches regarding sex and mating behaviors. Harmonic analyses were used to examine the seasonal trends of Google keyword searches during the past 5 years for topics related to pornography, prostitution, and mate-seeking. Results indicated a consistent 6-month harmonic cycle with the peaks of keyword searches related to sex and mating behaviors occurring most frequently during winter and early summer. Such results complement past research that has found similar seasonal trends of births, sexually transmitted infections, condom sales, and abortions.
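As a rough illustration of the kind of harmonic analysis the abstract mentions (a minimal sketch, not the authors' procedure), the strength of a cycle with a given period can be estimated by projecting a mean-centered series onto cosine and sine terms at that period. The function name `harmonic_amplitude` and the synthetic monthly series below are hypothetical and chosen only for illustration.

```python
import math


def harmonic_amplitude(series, period):
    """Estimate amplitude and phase of a cycle of `period` samples.

    The series is mean-centered and projected onto cosine and sine
    terms at the requested period (a single-frequency Fourier fit).
    """
    n = len(series)
    mean = sum(series) / n
    omega = 2 * math.pi / period
    a = 2 / n * sum((y - mean) * math.cos(omega * t) for t, y in enumerate(series))
    b = 2 / n * sum((y - mean) * math.sin(omega * t) for t, y in enumerate(series))
    return math.hypot(a, b), math.atan2(b, a)


# Hypothetical data: 5 years of monthly search volume with a 6-month cycle.
monthly = [50 + 10 * math.cos(2 * math.pi * t / 6) for t in range(60)]

amp6, phase6 = harmonic_amplitude(monthly, 6)     # 6-month harmonic
amp12, phase12 = harmonic_amplitude(monthly, 12)  # annual harmonic
```

On this synthetic series the 6-month amplitude recovers the injected cycle while the 12-month amplitude is close to zero; real search data would also need detrending and a significance test before drawing conclusions.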
The effect of search condition and advertising type on visual attention to Internet advertising.

PubMed

Kim, Gho; Lee, Jang-Han

2011-05-01

This research was conducted to examine the level of consumers' visual attention to Internet advertising. It was predicted that consumers' search type would influence visual attention to advertising. Specifically, it was predicted that more attention to advertising would be attracted in the exploratory search condition than in the goal-directed search condition. It was also predicted that there would be a difference in visual attention depending on the advertisement type (advertising type: text vs. pictorial advertising). An eye tracker was used for measurement. Results revealed that search condition and advertising type influenced advertising effectiveness.
Using Internet Search Data to Produce State-Level Measures: The Case of Tea Party Mobilization

ERIC Educational Resources Information Center

DiGrazia, Joseph

2017-01-01

This study proposes using Internet search data from search engines like Google to produce state-level metrics that are useful in social science research. Generally, state-level research relies on demographic statistics, official statistics produced by government agencies, or aggregated survey data. However, each of these data sources has serious…

Query Language for Location-Based Services: A Model Checking Approach

NASA Astrophysics Data System (ADS)

Hoareau, Christian; Satoh, Ichiro

We present a model checking approach to the rationale, implementation, and applications of a query language for location-based services. Such query mechanisms are necessary so that users, objects, and/or services can effectively benefit from the location-awareness of their surrounding environment. The underlying data model is founded on a symbolic model of space organized in a tree structure. Once extended to a semantic model for modal logic, we regard location query processing as a model checking problem, and thus define location queries as hybrid logic-based formulas. Our approach differs from existing research because it explores the connection between location models and query processing in ubiquitous computing systems, relies on a sound theoretical basis, and provides modal logic-based query mechanisms for expressive searches over a decentralized data structure. A prototype implementation is also presented and will be discussed.
Visual graph query formulation and exploration: a new perspective on information retrieval at the edge

NASA Astrophysics Data System (ADS)

Kase, Sue E.; Vanni, Michelle; Knight, Joanne A.; Su, Yu; Yan, Xifeng

2016-05-01

Within operational environments decisions must be made quickly based on the information available. Identifying an appropriate knowledge base and accurately formulating a search query are critical tasks for decision-making effectiveness in dynamic situations. The spreading of graph data management tools to access large graph databases is a rapidly emerging research area of potential benefit to the intelligence community. A graph representation provides a natural way of modeling data in a wide variety of domains. Graph structures use nodes, edges, and properties to represent and store data. This research investigates the advantages of information search by graph query initiated by the analyst and interactively refined within the contextual dimensions of the answer space toward a solution. The paper introduces SLQ, a user-friendly graph querying system enabling the visual formulation of schemaless and structureless graph queries. SLQ is demonstrated with an intelligence analyst information search scenario focused on identifying individuals responsible for manufacturing a mosquito-hosted deadly virus. The scenario highlights the interactive construction of graph queries without prior training in complex query languages or graph databases, intuitive navigation through the problem space, and visualization of results in graphical format.
Impact of Internet Search Engines on OPAC Users: A Study of Punjabi University, Patiala (India)

ERIC Educational Resources Information Center

Kumar, Shiv

2012-01-01

Purpose: The aim of this paper is to study the impact of internet search engine usage with special reference to OPAC searches in the Punjabi University Library, Patiala, Punjab (India). Design/methodology/approach: The primary data were collected from 352 users comprising faculty, research scholars and postgraduate students of the university. A…
Introducing Internet Services.

ERIC Educational Resources Information Center

Diaz, Karen R.; And Others

1994-01-01

Four articles describe the usefulness of the Internet to reference librarians and discuss Internet search tools: "Getting Started on the Net" (Karen R. Diaz); "Gopher Searching Using VERONICA" (Lousie McGillis); "How to Use VERONICA To Find Information on the Internet" (Jackie Mardikian); and "The Internet…
Public hospital quality report awareness: evidence from National and Californian Internet searches and social media mentions, 2012

PubMed Central

Huesch, Marco D; Currid-Halkett, Elizabeth; Doctor, Jason N

2014-01-01

Objectives: Publicly available hospital quality reports seek to inform consumers of important healthcare quality and affordability attributes, and may inform consumer decision-making. To understand how much consumers search for such information online on one Internet search engine, whether they mention such information in social media and how positively they view this information. Setting and design: A leading Internet search engine (Google) was the main focus of the study. Google Trends and Google Adwords keyword analyses were performed for national and Californian searches between 1 August 2012 and 31 July 2013 for keywords related to ‘top hospital’, ‘best hospital’, and ‘hospital quality’, as well as for six specific hospital quality reports. Separately, a proprietary social media monitoring tool was used to investigate blog, forum, social media and traditional media mentions of, and sentiment towards, major public reports of hospital quality in California in 2012. Primary outcome measures: (1) Counts of searches for keywords performed on Google; (2) counts of and (3) sentiment of mentions of public reports on social media. Results: National Google search volume for 75 hospital quality-related terms averaged 610 700 searches per month with strong variation by keyword and by state. A commercial report (Healthgrades) was more commonly searched for nationally on Google than the federal government's Hospital Compare, which otherwise dominated quality-related search terms. Social media references in California to quality reports were generally few, and commercially produced hospital quality reports were more widely mentioned than state (Office of Statewide Healthcare Planning and Development (OSHPD)), or non-profit (CalHospitalCompare) reports. Conclusions: Consumers are somewhat aware of hospital quality based on Internet search activity and social media disclosures. Public stakeholders may be able to broaden their quality dissemination initiatives by
Public hospital quality report awareness: evidence from National and Californian Internet searches and social media mentions, 2012.

PubMed

Huesch, Marco D; Currid-Halkett, Elizabeth; Doctor, Jason N

2014-03-11

Publicly available hospital quality reports seek to inform consumers of important healthcare quality and affordability attributes, and may inform consumer decision-making. To understand how much consumers search for such information online on one Internet search engine, whether they mention such information in social media and how positively they view this information. A leading Internet search engine (Google) was the main focus of the study. Google Trends and Google Adwords keyword analyses were performed for national and Californian searches between 1 August 2012 and 31 July 2013 for keywords related to 'top hospital', 'best hospital', and 'hospital quality', as well as for six specific hospital quality reports. Separately, a proprietary social media monitoring tool was used to investigate blog, forum, social media and traditional media mentions of, and sentiment towards, major public reports of hospital quality in California in 2012. (1) Counts of searches for keywords performed on Google; (2) counts of and (3) sentiment of mentions of public reports on social media. National Google search volume for 75 hospital quality-related terms averaged 610 700 searches per month with strong variation by keyword and by state. A commercial report (Healthgrades) was more commonly searched for nationally on Google than the federal government's Hospital Compare, which otherwise dominated quality-related search terms. Social media references in California to quality reports were generally few, and commercially produced hospital quality reports were more widely mentioned than state (Office of Statewide Healthcare Planning and Development (OSHPD)), or non-profit (CalHospitalCompare) reports. Consumers are somewhat aware of hospital quality based on Internet search activity and social media disclosures. Public stakeholders may be able to broaden their quality dissemination initiatives by advertising on Google or Twitter and using social media interactively with consumers looking
Searching for truth: internet search patterns as a method of investigating online responses to a Russian illicit drug policy debate.

PubMed

Zheluk, Andrey; Gillespie, James A; Quinn, Casey

2012-12-13

This is a methodological study investigating the online responses to a national debate over an important health and social problem in Russia. Russia is the largest Internet market in Europe, exceeding Germany in the absolute number of users. However, Russia is unusual in that the main search provider is not Google, but Yandex. This study had two main objectives. First, to validate Yandex search patterns against those provided by Google, and second, to test this method's adequacy for investigating online interest in a 2010 national debate over Russian illicit drug policy. We hoped to learn what search patterns and specific search terms could reveal about the relative importance and geographic distribution of interest in this debate. A national drug debate, centering on the anti-drug campaigner Egor Bychkov, was one of the main Russian domestic news events of 2010. Public interest in this episode was accompanied by increased Internet search. First, we measured the search patterns for 13 search terms related to the Bychkov episode and concurrent domestic events by extracting data from Google Insights for Search (GIFS) and Yandex WordStat (YaW). We conducted Spearman Rank Correlation of GIFS and YaW search data series. Second, we coded all 420 primary posts from Bychkov's personal blog between March 2010 and March 2012 to identify the main themes. Third, we compared GIFS and Yandex policies concerning the public release of search volume data. Finally, we established the relationship between salient drug issues and the Bychkov episode. We found a consistent pattern of strong to moderate positive correlations between Google and Yandex for the terms "Egor Bychkov" (r(s) = 0.88, P < .001), "Bychkov" (r(s) = .78, P < .001) and "Khimki"(r(s) = 0.92, P < .001). Peak search volumes for the Bychkov episode were comparable to other prominent domestic political events during 2010. Monthly search counts were 146,689 for "Bychkov" and 48,084 for "Egor Bychkov", compared to 53
Searching the Internet for information on prostate cancer screening: an assessment of quality.

PubMed

Ilic, Dragan; Risbridger, Gail; Green, Sally

2004-07-01

To identify how on-line information relating to prostate cancer screening (PCS) is best sourced, whether through general, medical, or meta-search engines, and to assess the quality of that information. Websites providing information about PCS were searched across 15 search engines representing three distinct types: general, medical, and meta-search engines. The quality of on-line information was assessed using the DISCERN quality assessment tool. Quality performance characteristics were analyzed by performing Mann-Whitney U tests. Search engine efficiency was measured by each search query as a percentage of the relevant websites included for analysis from the total returned and analyzed by performing Kruskal-Wallis analysis of variance. Of 6690 websites reviewed, 84 unique websites were identified as providing information relevant to PCS. General and meta-search engines were significantly more efficient at retrieving relevant information on PCS compared with medical search engines. The quality of information was variable, with most of a poor standard. Websites that provided referral links to other resources and a citation of evidence provided a significantly better quality of information. In contrast, websites offering a direct service were more likely to provide a significantly poorer quality of information. The current lack of a clear consensus on guidelines and recommendation in published data is also reflected by the variable quality of information found on-line. Specialized medical search engines were no more likely to retrieve relevant, high-quality information than general or meta-search engines.
Fuzzy queries above relational database

NASA Astrophysics Data System (ADS)

Smolka, Pavel; Bradac, Vladimir

2017-11-01

The aim of this work is to introduce the possibility of implementing fuzzy queries in relational databases. The issue is described using a model that identifies the part of the problem domain suited to a fuzzy approach. The model is demonstrated on a wine database, with a focus on searching it. The construction of the database complies with the law of the Czech Republic.
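As an illustration of what a fuzzy query over a relational table can look like, the following minimal sketch ranks rows by their degree of membership in a fuzzy set instead of applying a crisp yes/no filter. It is not taken from the paper: the `wines` table, its columns, the triangular membership function, and the 0.5 threshold are all assumptions chosen for the example.

```python
import sqlite3


def triangular(x, low, peak, high):
    """Degree (0..1) to which x belongs to a triangular fuzzy set."""
    if x <= low or x >= high:
        return 0.0
    if x <= peak:
        return (x - low) / (peak - low)
    return (high - x) / (high - peak)


def fuzzy_select(conn, low, peak, high, threshold=0.5):
    """Return (name, degree) pairs for wines whose price is 'about peak'."""
    # Crisp pre-filter in SQL: only rows with non-zero membership are fetched.
    rows = conn.execute(
        "SELECT name, price FROM wines WHERE price > ? AND price < ?",
        (low, high),
    ).fetchall()
    # Fuzzy step in Python: score each row, keep those above the threshold.
    scored = [(name, triangular(price, low, peak, high)) for name, price in rows]
    kept = [row for row in scored if row[1] >= threshold]
    return sorted(kept, key=lambda row: -row[1])


# Hypothetical data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wines (name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO wines VALUES (?, ?)",
    [("A", 100.0), ("B", 200.0), ("C", 240.0), ("D", 400.0)],
)
result = fuzzy_select(conn, low=100.0, peak=200.0, high=300.0)
print(result)
```

Keeping the crisp bounds in SQL and the membership scoring in application code is one possible design; a database with user-defined functions could instead evaluate the membership degree inside the query.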
Retrieval of diagnostic and treatment studies for clinical use through PubMed and PubMed's Clinical Queries filters.

PubMed

Lokker, Cynthia; Haynes, R Brian; Wilczynski, Nancy L; McKibbon, K Ann; Walter, Stephen D

2011-01-01

Clinical Queries filters were developed to improve the retrieval of high-quality studies in searches on clinical matters. The study objective was to determine the yield of relevant citations and physician satisfaction while searching for diagnostic and treatment studies using the Clinical Queries page of PubMed compared with searching PubMed without these filters. Forty practicing physicians, presented with standardized treatment and diagnosis questions and one question of their choosing, entered search terms which were processed in a random, blinded fashion through PubMed alone and PubMed Clinical Queries. Participants rated search retrievals for applicability to the question at hand and satisfaction. For treatment, the primary outcome of retrieval of relevant articles was not significantly different between the groups, but a higher proportion of articles from the Clinical Queries searches met methodologic criteria (p=0.049), and more articles were published in core internal medicine journals (p=0.056). For diagnosis, the filtered results returned more relevant articles (p=0.031) and fewer irrelevant articles (overall retrieval less, p=0.023); participants needed to screen fewer articles before arriving at the first relevant citation (p<0.05). Relevance was also influenced by content terms used by participants in searching. Participants varied greatly in their search performance. Clinical Queries filtered searches returned more high-quality studies, though the retrieval of relevant articles was only statistically different between the groups for diagnosis questions. Retrieving clinically important research studies from Medline is a challenging task for physicians. Methodological search filters can improve search retrieval.
Ad-Hoc Queries over Document Collections - A Case Study

NASA Astrophysics Data System (ADS)

Löser, Alexander; Lutter, Steffen; Düssel, Patrick; Markl, Volker

We discuss the novel problem of supporting analytical business intelligence queries over web-based textual content, e.g., BI-style reports based on hundreds of thousands of documents from an ad-hoc web search result. Neither conventional search engines nor conventional Business Intelligence and ETL tools address this problem, which lies at the intersection of their capabilities. "Google Squared" and our system GOOLAP.info are examples of these kinds of systems. They execute information extraction methods over one or several document collections at query time and integrate extracted records into a common view or tabular structure. Frequent extraction and object resolution failures cause incomplete records that cannot be joined into a record answering the query. Our focus is the identification of join-reordering heuristics maximizing the size of complete records answering a structured query. With respect to given costs for document extraction we propose two novel join-operations: The multi-way CJ-operator joins records from multiple relationships extracted from a single document. The two-way join-operator DJ ensures data density by removing incomplete records from results. In a preliminary case study we observe that our join-reordering heuristics improve result size and record density and lower execution costs.
Clinician search behaviors may be influenced by search engine design.

PubMed

Lau, Annie Y S; Coiera, Enrico; Zrimec, Tatjana; Compton, Paul

2010-06-30

Searching the Web for documents using information retrieval systems plays an important part in clinicians' practice of evidence-based medicine. While much research focuses on the design of methods to retrieve documents, there has been little examination of the way different search engine capabilities influence clinician search behaviors. Previous studies have shown that use of task-based search engines allows for faster searches with no loss of decision accuracy compared with resource-based engines. We hypothesized that changes in search behaviors may explain these differences. In all, 75 clinicians (44 doctors and 31 clinical nurse consultants) were randomized to use either a resource-based or a task-based version of a clinical information retrieval system to answer questions about 8 clinical scenarios in a controlled setting in a university computer laboratory. Clinicians using the resource-based system could select 1 of 6 resources, such as PubMed; clinicians using the task-based system could select 1 of 6 clinical tasks, such as diagnosis. Clinicians in both systems could reformulate search queries. System logs unobtrusively capturing clinicians' interactions with the systems were coded and analyzed for clinicians' search actions and query reformulation strategies. The most frequent search action of clinicians using the resource-based system was to explore a new resource with the same query, that is, these clinicians exhibited a "breadth-first" search behaviour. Of 1398 search actions, clinicians using the resource-based system conducted 401 (28.7%, 95% confidence interval [CI] 26.37-31.11) in this way. In contrast, the majority of clinicians using the task-based system exhibited a "depth-first" search behavior in which they reformulated query keywords while keeping to the same task profiles. Of 585 search actions conducted by clinicians using the task-based system, 379 (64.8%, 95% CI 60.83-68.55) were conducted in this way. This study provides evidence that
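The proportions and confidence intervals quoted in this abstract can be checked with a short calculation. The sketch below assumes the intervals are Wilson score intervals at the 95% level (z = 1.96); the abstract does not state which method was used, so this is an assumption, although the numbers it produces round to the reported 26.37-31.11 and 60.83-68.55.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Counts quoted in the abstract: (search actions of the given kind, total actions).
for label, k, n in [("resource-based", 401, 1398), ("task-based", 379, 585)]:
    lo, hi = wilson_interval(k, n)
    print(f"{label}: {100 * k / n:.1f}% (95% CI {100 * lo:.2f}-{100 * hi:.2f})")
```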
UMass at TREC WEB 2014: Entity Query Feature Expansion using Knowledge Base Links

DTIC Science & Technology

2014-11-01

bears 270 sun tzu 274 golf instruction 291 sangre de cristo mountains 263 evidence for evolution 300 how to find the mean 262 balding cure 280 view my...internet history 294 flowering plants (b) Worst Query Title 264 tribe formerly living in alabama 295 how to tie a windsor knot 283 hayrides in pa 252...work we leverage the rich semantic knowledge available through these links to understand relevance of documents for a query. We focus on the ad hoc
How College Students Search the Internet for Weight Control and Weight Management Information: An Observational Study

ERIC Educational Resources Information Center

Senkowski, Valerie; Branscum, Paul

2015-01-01

Background: Few studies have attempted to examine how young adults search for health information on the Internet, especially information related to weight control and weight management. Purpose: The purpose of this study was to determine search strategies that college students used for finding information related to weight control and weight…
Did online publishers "get it right"? Using a naturalistic search strategy to review cognitive health promotion content on internet webpages.

PubMed

Hunter, P V; Delbaere, M; O'Connell, M E; Cammer, A; Seaton, J X; Friedrich, T; Fick, F

2017-06-15

One of the most common uses of the Internet is to search for health-related information. Although scientific evidence pertaining to cognitive health promotion has expanded rapidly in recent years, it is unclear how much of this information has been made available to Internet users. Thus, the purpose of our study was to assess the reliability and quality of information about cognitive health promotion encountered by typical Internet users. To generate a list of relevant search terms employed by Internet users, we entered seed search terms in Google Trends and recorded any terms consistently used in the prior 2 years. To further approximate the behaviour of typical Internet users, we entered each term in Google and sampled the first two relevant results. This search, completed in October 2014, resulted in a sample of 86 webpages, 48 of which had content related to cognitive health promotion. An interdisciplinary team rated the information reliability and quality of these webpages using a standardized measure. We found that information reliability and quality were moderate, on average. Just one retrieved page mentioned best practice, national recommendations, or consensus guidelines by name. Commercial content (i.e., product promotion, advertising content, or non-commercial) was associated with differences in reliability and quality, with product promoter webpages having the lowest mean reliability and quality ratings. As efforts to communicate the association between lifestyle and cognitive health continue to expand, we offer these results as a baseline assessment of the reliability and quality of cognitive health promotion on the Internet.
An Improvement to a Multi-Client Searchable Encryption Scheme for Boolean Queries.

PubMed

Jiang, Han; Li, Xue; Xu, Qiuliang

2016-12-01

The migration of e-health systems to cloud computing brings huge benefits as well as some security risks. Searchable Encryption (SE) is a cryptographic scheme that protects the confidentiality of data while still allowing the encrypted data to be searched. The SE scheme proposed by Cash et al. at Crypto 2013 and its follow-up work at CCS 2013 are currently the most practical SE schemes that support Boolean queries. In their scheme, the data user has to generate search tokens one by one by counter number and interact with the server repeatedly, until he finds the correct one or goes through many tokens to establish that there is no search result. In this paper, we improve on their scheme. We allow the server to send back some information that helps the user generate the exact search token in the search phase. In our scheme, there are only two rounds of interaction between server and user, and the search token has [Formula: see text] elements, where n is the number of keywords in the query expression and [Formula: see text] is the minimum number of documents that contain one of the keywords in the query expression; the computation cost of the server is [Formula: see text] modular exponentiation operations.
A Query Integrator and Manager for the Query Web

PubMed Central

Brinkley, James F.; Detwiler, Landon T.

2012-01-01

We introduce two concepts: the Query Web as a layer of interconnected queries over the document web and the semantic web, and a Query Web Integrator and Manager (QI) that enables the Query Web to evolve. QI permits users to write, save and reuse queries over any web accessible source, including other queries saved in other installations of QI. The saved queries may be in any language (e.g. SPARQL, XQuery); the only condition for interconnection is that the queries return their results in some form of XML. This condition allows queries to chain off each other, and to be written in whatever language is appropriate for the task. We illustrate the potential use of QI for several biomedical use cases, including ontology view generation using a combination of graph-based and logical approaches, value set generation for clinical data management, image annotation using terminology obtained from an ontology web service, ontology-driven brain imaging data integration, small-scale clinical data integration, and wider-scale clinical data integration. Such use cases illustrate the current range of applications of QI and lead us to speculate about the potential evolution from smaller groups of interconnected queries into a larger query network that layers over the document and semantic web. The resulting Query Web could greatly aid researchers and others who now have to manually navigate through multiple information sources in order to answer specific questions. PMID:22531831
Searching for Truth: Internet Search Patterns as a Method of Investigating Online Responses to a Russian Illicit Drug Policy Debate

PubMed Central

Gillespie, James A; Quinn, Casey

2012-01-01

Background: This is a methodological study investigating the online responses to a national debate over an important health and social problem in Russia. Russia is the largest Internet market in Europe, exceeding Germany in the absolute number of users. However, Russia is unusual in that the main search provider is not Google, but Yandex. Objective: This study had two main objectives. First, to validate Yandex search patterns against those provided by Google, and second, to test this method's adequacy for investigating online interest in a 2010 national debate over Russian illicit drug policy. We hoped to learn what search patterns and specific search terms could reveal about the relative importance and geographic distribution of interest in this debate. Methods: A national drug debate, centering on the anti-drug campaigner Egor Bychkov, was one of the main Russian domestic news events of 2010. Public interest in this episode was accompanied by increased Internet search. First, we measured the search patterns for 13 search terms related to the Bychkov episode and concurrent domestic events by extracting data from Google Insights for Search (GIFS) and Yandex WordStat (YaW). We conducted Spearman Rank Correlation of GIFS and YaW search data series. Second, we coded all 420 primary posts from Bychkov's personal blog between March 2010 and March 2012 to identify the main themes. Third, we compared GIFS and Yandex policies concerning the public release of search volume data. Finally, we established the relationship between salient drug issues and the Bychkov episode. Results: We found a consistent pattern of strong to moderate positive correlations between Google and Yandex for the terms "Egor Bychkov" (r(s) = 0.88, P < .001), "Bychkov" (r(s) = .78, P < .001) and "Khimki" (r(s) = 0.92, P < .001). Peak search volumes for the Bychkov episode were comparable to other prominent domestic political events during 2010. Monthly search counts were 146,689 for "Bychkov" and
Information search in health care decision-making: a study of word-of-mouth and internet information users.

PubMed

Snipes, Robin L; Ingram, Rhea; Jiang, Pingjun

2005-01-01

This paper investigates how individual consumers may differ in their information search behavior in health care decision-making. Results indicate that most consumers still use word-of-mouth as a primary information source for health care decisions. However, usage of the Internet is increasing. The results of this study indicate that consumers who are most likely to use the Internet for health care information are single, younger, and less educated, whereas consumers who are most likely to use word-of-mouth are middle-aged, married, with higher income and higher education. Surprisingly, no significant gender difference was found in information search behavior for health care decision-making. The results also suggest that consumers with the highest tendency to use word-of-mouth are also the lowest users of the Internet in health care decision-making. Implications of these findings are discussed.
Investigating Intrinsic and Extrinsic Variables During Simulated Internet Search

NASA Technical Reports Server (NTRS)

Liechty, Molly M.; Madhavan, Poornima

2011-01-01

Using an eye tracker we examined decision-making processes during an internet search task. Twenty experienced homebuyers and twenty-five undergraduates from Old Dominion University viewed homes on a simulated real estate website. Several of the homes included physical properties that had the potential to negatively impact individual perceptions. These negative externalities were either easy to change (Level 1) or impossible to change (Level 2). Eye movements were analyzed to examine the relationship between participants' "stated preferences" [verbalized preferences], "revealed preferences" [actual decisions], and experience. Dwell times, fixation durations/counts, and saccade counts/amplitudes were analyzed. Results revealed that experienced homebuyers demonstrated a more refined search pattern than novice searchers. Experienced homebuyers were also less impacted by negative externalities. Furthermore, stated preferences were discrepant from revealed preferences; although participants initially stated they liked/disliked a graphic, their eye movement patterns did not reflect this trend. These results have important implications for design of user-friendly web interfaces.

Tendency of cancer patients and their relatives to use internet for health-related searches: Turkish Oncology Group (TOG) Study.

PubMed

Nayir, Erdinc; Tanriverdi, Ozgur; Karakas, Yusuf; Kilickap, Saadettin; Serdar Turhal, Nazim; Avci, Nilufer; Okutur, Kerem; Koca, Dogan; Erdem, Dilek; Abali, Huseyin; Yamac, Deniz; Bilir, Cemil; Kacan, Turgut

2016-01-01

This study aimed to describe how cancer patients and their relatives in Turkey use the internet to access health-related information and services. An 18-item questionnaire survey was administered to cancer patients and their relatives. A total of 1106 patients (male, 37.3%, and female, 62.7%) and their relatives were included in the study. The responders had been using the internet to obtain health information about oncological diseases once a month (34.2%), 1-2 times a week (27.4%) or 2-3 times a month (21.9%). After a diagnosis of cancer was made, participants investigated health-related issues more frequently (64.4%), while 64.9% of them considered the internet an important search tool, and 16.7% of them had thought of giving up cancer therapy under the influence of internet information. Some (33.1%) participants had used herbal medicine, and 16.7% of them had learnt of these herbal products from the internet. Still, 12.7% of them had not questioned the accuracy of internet information, while 26.9% of them indicated that they had not shared the internet information about cancer with their physicians, and 13% of them searched for information on the internet without asking their physicians. Cancer patients and their relatives showed a higher tendency to use health-related internet information, which may mislead them and can result in treatment noncompliance. Health professionals should offer evidence-based information to patients and their relatives through the internet.
Advice from a Medical Expert through the Internet on Queries about AIDS and Hepatitis: Analysis of a Pilot Experiment

PubMed Central

Marco, Javier; Barba, Raquel; Losa, Juan E; de la Serna, Carlos Martínez; Sainz, María; Lantigua, Isabel Fernández; de la Serna, Jose Luis

2006-01-01

Background: Advice from a medical expert on concerns and queries expressed anonymously through the Internet by patients and later posted on the Web, offers a new type of patient–doctor relationship. The aim of the current study was to perform a descriptive analysis of questions about AIDS and hepatitis made to an infectious disease expert and sent through the Internet to a consumer-oriented Web site in the Spanish language. Methods and Findings: Questions were e-mailed and the questions and answers were posted anonymously in the “expert-advice” section of a Web site focused on AIDS and hepatitis. We performed a descriptive study and a temporal analysis of the questions received in the first 12 months after the launch of the site. A total of 899 questions were received from December 2003 to November 2004, with a marked linear growth pattern. Questions originated in Spain in 68% of cases and 32% came from Latin America (the Caribbean, Central America, and South America). Eighty percent of the senders were male. Most of the questions concerned HIV infection (79%) with many fewer on hepatitis (17%). The highest numbers of questions were submitted just after the weekend (37% of questions were made on Mondays and Tuesdays). Risk factors for contracting HIV infection were the most frequent concern (69%), followed by the window period for detection (12.6%), laboratory results (5.9%), symptoms (4.7%), diagnosis (2.7%), and treatment (2.2%). Conclusions: Our results confirm a great demand for this type of “ask-the-expert” Internet service, at least for AIDS and hepatitis. Factors such as anonymity, free access, and immediate answers have been key factors in its success. PMID:16796404
Seniors, health information, and the Internet: motivation, ability, and Internet knowledge.

PubMed

Sheng, Xiaojing; Simpson, Penny M

2013-10-01

Providing health information to older adults is crucial to empowering them to better control their health, and the information is readily available on the Internet. Yet, little is known about the factors that affect seniors' behavior in searching the Internet for health information. This work addresses this research deficit by examining the role of health information orientation (HIO), eHealth literacy, and Internet knowledge (IK) in affecting the likelihood of using the Internet as a source for health information. The analysis reveals that each variable in the study is significant in affecting Internet search likelihood. Results from the analysis also demonstrate the partial mediating role of eHealth literacy and the interaction between eHealth literacy and HIO. The findings suggest that improving seniors' IK and eHealth literacy would increase their likelihood of searching for and finding health information on the Internet that might encourage better health behaviors.
Context-Aware Online Commercial Intention Detection

NASA Astrophysics Data System (ADS)

Hu, Derek Hao; Shen, Dou; Sun, Jian-Tao; Yang, Qiang; Chen, Zheng

With more and more commercial activities moving onto the Internet, people tend to purchase what they need through the Internet or conduct some online research before the actual transactions happen. For many Web users, their online commercial activities start from submitting a search query to search engines. Just like common Web search queries, queries with commercial intention are usually very short. Distinguishing queries with commercial intention from common queries will help search engines provide proper search results and advertisements, help Web users obtain the information they desire, and help advertisers benefit from the potential transactions. However, the intentions behind a query vary a lot for users with different backgrounds and interests. The intentions can even be different for the same user when the query is issued in different contexts. In this paper, we present a new algorithm framework based on skip-chain conditional random field (SCCRF) for automatically classifying Web queries according to context-based online commercial intention. We analyze our algorithm performance both theoretically and empirically. Extensive experiments on several real search engine log datasets show that our algorithm improves the F1 score by more than 10% over previous algorithms on commercial intention detection.
Collusion-aware privacy-preserving range query in tiered wireless sensor networks.

PubMed

Zhang, Xiaoying; Dong, Lei; Peng, Hui; Chen, Hong; Zhao, Suyun; Li, Cuiping

2014-12-11

Wireless sensor networks (WSNs) are indispensable building blocks for the Internet of Things (IoT). With the development of WSNs, privacy issues have drawn more attention. Existing work on the privacy-preserving range query mainly focuses on privacy preservation and integrity verification in two-tiered WSNs in the case of compromised master nodes, but neglects the damage of node collusion. In this paper, we propose a series of collusion-aware privacy-preserving range query protocols in two-tiered WSNs. To the best of our knowledge, this paper is the first to consider collusion attacks for a range query in tiered WSNs while fulfilling the preservation of privacy and integrity. To preserve the privacy of data and queries, we propose a novel encoding scheme to conceal sensitive information. To preserve the integrity of the results, we present a verification scheme using the correlation among data. In addition, two schemes are further presented to improve result accuracy and reduce communication cost. Finally, theoretical analysis and experimental results confirm the efficiency, accuracy and privacy of our proposals.
Collusion-Aware Privacy-Preserving Range Query in Tiered Wireless Sensor Networks

PubMed Central

Zhang, Xiaoying; Dong, Lei; Peng, Hui; Chen, Hong; Zhao, Suyun; Li, Cuiping

2014-01-01

Wireless sensor networks (WSNs) are indispensable building blocks for the Internet of Things (IoT). With the development of WSNs, privacy issues have drawn more attention. Existing work on the privacy-preserving range query mainly focuses on privacy preservation and integrity verification in two-tiered WSNs in the case of compromised master nodes, but neglects the damage of node collusion. In this paper, we propose a series of collusion-aware privacy-preserving range query protocols in two-tiered WSNs. To the best of our knowledge, this paper is the first to consider collusion attacks for a range query in tiered WSNs while fulfilling the preservation of privacy and integrity. To preserve the privacy of data and queries, we propose a novel encoding scheme to conceal sensitive information. To preserve the integrity of the results, we present a verification scheme using the correlation among data. In addition, two schemes are further presented to improve result accuracy and reduce communication cost. Finally, theoretical analysis and experimental results confirm the efficiency, accuracy and privacy of our proposals. PMID:25615731
Improving Concept-Based Web Image Retrieval by Mixing Semantically Similar Greek Queries

ERIC Educational Resources Information Center

Lazarinis, Fotis

2008-01-01

Purpose: Image searching is a common activity for web users. Search engines offer image retrieval services based on textual queries. Previous studies have shown that web searching is more demanding when the search is not in English and does not use a Latin-based language. The aim of this paper is to explore the behaviour of the major search…
Economic Recession and Obesity-Related Internet Search Behavior in Taiwan: Analysis of Google Trends Data

PubMed Central

2018-01-01

Background: Obesity is highly correlated with the development of chronic diseases and has become a critical public health issue that must be countered by aggressive action. This study determined whether data from Google Trends could provide insight into trends in obesity-related search behaviors in Taiwan. Objective: Using Google Trends, we examined how changes in economic conditions, using business cycle indicators as a proxy, were associated with people’s internet search behaviors related to obesity awareness, health behaviors, and fast food restaurants. Methods: Monthly business cycle indicators were obtained from the Taiwan National Development Council. Weekly Taiwan Stock Exchange (TWSE) weighted index data were accessed and downloaded from Yahoo Finance. The weekly relative search volumes (RSV) of obesity-related terms were downloaded from Google Trends. RSVs of obesity-related terms and the TWSE from January 2007 to December 2011 (60 months) were analyzed using correlation analysis. Results: During an economic recession, the RSV of obesity awareness and health behaviors declined (r=.441, P<.001; r=.593, P<.001, respectively); however, the RSV for fast food restaurants increased (r=−.437, P<.001). Findings indicated that when the economy was faltering, people tended to be less likely to search for information related to health behaviors and obesity awareness; moreover, they were more likely to search for fast food restaurants. Conclusions: Macroeconomic conditions can have an impact on people’s health-related internet searches. PMID:29625958
Economic Recession and Obesity-Related Internet Search Behavior in Taiwan: Analysis of Google Trends Data.

PubMed

Wang, Ho-Wei; Chen, Duan-Rung

2018-04-06

Obesity is highly correlated with the development of chronic diseases and has become a critical public health issue that must be countered by aggressive action. This study determined whether data from Google Trends could provide insight into trends in obesity-related search behaviors in Taiwan. Using Google Trends, we examined how changes in economic conditions-using business cycle indicators as a proxy-were associated with people's internet search behaviors related to obesity awareness, health behaviors, and fast food restaurants. Monthly business cycle indicators were obtained from the Taiwan National Development Council. Weekly Taiwan Stock Exchange (TWSE) weighted index data were accessed and downloaded from Yahoo Finance. The weekly relative search volumes (RSV) of obesity-related terms were downloaded from Google Trends. RSVs of obesity-related terms and the TWSE from January 2007 to December 2011 (60 months) were analyzed using correlation analysis. During an economic recession, the RSV of obesity awareness and health behaviors declined (r=.441, P<.001; r=.593, P<.001, respectively); however, the RSV for fast food restaurants increased (r=-.437, P<.001). Findings indicated that when the economy was faltering, people tended to be less likely to search for information related to health behaviors and obesity awareness; moreover, they were more likely to search for fast food restaurants. Macroeconomic conditions can have an impact on people's health-related internet searches. ©Ho-Wei Wang, Duan-Rung Chen. Originally published in JMIR Public Health and Surveillance (http://publichealth.jmir.org), 06.04.2018.
Meta Search Engines.

ERIC Educational Resources Information Center

Garman, Nancy

1999-01-01

Describes common options and features to consider in evaluating which meta search engine will best meet a searcher's needs. Discusses number and names of engines searched; other sources and specialty engines; search queries; other search options; and results options. (AEF)
Query-Biased Preview over Outsourced and Encrypted Data

PubMed Central

Luo, Guangchun; Qin, Ke; Chen, Aiguo

2013-01-01

For both convenience and security, more and more users encrypt their sensitive data before outsourcing it to a third party such as a cloud storage service. However, searching for the desired documents becomes problematic since it is costly to download and decrypt each possibly needed document to check if it contains the desired content. An informative query-biased preview feature, as applied in modern search engines, could help users learn about the content without downloading the entire document. However, when the data are encrypted, securely extracting a keyword-in-context snippet from the data as a preview becomes a challenge. Based on a private information retrieval protocol and the core concept of searchable encryption, we propose a single-server, two-round solution to securely obtain a query-biased snippet over the encrypted data from the server. We achieve this novel result by making a document (plaintext) previewable under any cryptosystem and constructing a secure index to support dynamic computation of a best-matched snippet when queried by some keywords. For each document, the scheme has O(d) storage complexity and O(log(d/s) + s + d/s) communication complexity, where d is the document size and s is the snippet length. PMID:24078798
Query-biased preview over outsourced and encrypted data.

PubMed

Peng, Ningduo; Luo, Guangchun; Qin, Ke; Chen, Aiguo

2013-01-01

For both convenience and security, more and more users encrypt their sensitive data before outsourcing it to a third party such as a cloud storage service. However, searching for the desired documents becomes problematic since it is costly to download and decrypt each possibly needed document to check if it contains the desired content. An informative query-biased preview feature, as applied in modern search engines, could help users learn about the content without downloading the entire document. However, when the data are encrypted, securely extracting a keyword-in-context snippet from the data as a preview becomes a challenge. Based on a private information retrieval protocol and the core concept of searchable encryption, we propose a single-server, two-round solution to securely obtain a query-biased snippet over the encrypted data from the server. We achieve this novel result by making a document (plaintext) previewable under any cryptosystem and constructing a secure index to support dynamic computation of a best-matched snippet when queried by some keywords. For each document, the scheme has O(d) storage complexity and O(log(d/s) + s + d/s) communication complexity, where d is the document size and s is the snippet length.
STARS 2.0: 2nd-generation open-source archiving and query software

NASA Astrophysics Data System (ADS)

Winegar, Tom

2008-07-01

The Subaru Telescope is in process of developing an open-source alternative to the 1st-generation software and databases (STARS 1) used for archiving and query. For STARS 2, we have chosen PHP and Python for scripting and MySQL as the database software. We have collected feedback from staff and observers, and used this feedback to significantly improve the design and functionality of our future archiving and query software. Archiving - We identified two weaknesses in 1st-generation STARS archiving software: a complex and inflexible table structure and uncoordinated system administration for our business model: taking pictures from the summit and archiving them in both Hawaii and Japan. We adopted a simplified and normalized table structure with passive keyword collection, and we are designing an archive-to-archive file transfer system that automatically reports real-time status and error conditions and permits error recovery. Query - We identified several weaknesses in 1st-generation STARS query software: inflexible query tools, poor sharing of calibration data, and no automatic file transfer mechanisms to observers. We are developing improved query tools and sharing of calibration data, and multi-protocol unassisted file transfer mechanisms for observers. In the process, we have redefined a 'query': from an invisible search result that can only transfer once in-house right now, with little status and error reporting and no error recovery - to a stored search result that can be monitored, transferred to different locations with multiple protocols, reporting status and error conditions and permitting recovery from errors.
Visual perception-based criminal identification: a query-based approach

NASA Astrophysics Data System (ADS)

Singh, Avinash Kumar; Nandi, G. C.

2017-01-01

The visual perception of an eyewitness plays a vital role in criminal identification. It helps law enforcement authorities search for a particular criminal in their previous records. It has been reported that searching criminal records manually takes too much time to produce an accurate result. We propose a query-based approach that minimises the computational cost and reduces the search space. A symbolic database has been created to perform a stringent analysis on 150 public (Bollywood celebrities and Indian cricketers) and 90 local faces (our data-set). Expert knowledge has been captured to encode every criminal's anatomical and facial attributes in symbolic form. A fast query-based searching strategy has been implemented using a dynamic decision tree data structure that allows four levels of decomposition to fetch the respective criminal records. Two types of case studies, viewed and forensic sketches, have been considered to evaluate the strength of our proposed approach. We have derived 1200 views of the entire population using 80 participants as eyewitnesses. The system demonstrates an accuracy of 98.6% for test case I and 97.8% for test case II. The experimental results also indicate that the approach reduces the search space to at most the 30 most relevant records.
A New Publicly Available Chemical Query Language, CSRML ...

EPA Pesticide Factsheets

A new XML-based query language, CSRML, has been developed for representing chemical substructures, molecules, reaction rules, and reactions. CSRML queries are capable of integrating additional forms of information beyond the simple substructure (e.g., SMARTS) or reaction transformation (e.g., SMIRKS, reaction SMILES) queries currently in use. Chemotypes, a term used to represent advanced CSRML queries for repeated application, can be encoded not only with connectivity and topology, but also with properties of atoms, bonds, electronic systems, or molecules. The CSRML language has been developed in parallel with a public set of chemotypes, i.e., the ToxPrint chemotypes, which are designed to provide excellent coverage of environmental, regulatory and commercial use chemical space, as well as to represent features and frameworks believed to be especially relevant to toxicity concerns. A software application, ChemoTyper, has also been developed and made publicly available to enable chemotype searching and fingerprinting against a target structure set. The public ChemoTyper houses the ToxPrint chemotype CSRML dictionary, as well as a reference implementation so that the query specifications may be adopted by other chemical structure knowledge systems. The full specifications of the XML standard used in CSRML-based chemotypes are publicly available to facilitate and encourage the exchange of structural knowledge. Paper details specifications for a new XML-based query lan
Conceptual mapping of user's queries to medical subject headings.

PubMed Central

Zieman, Y. L.; Bleich, H. L.

1997-01-01

This paper describes a way to map users' queries to relevant Medical Subject Headings (MeSH terms) used by the National Library of Medicine to index the biomedical literature. The method, called SENSE (SEarch with New SEmantics), transforms words and phrases in the users' queries into primary conceptual components and compares these components with those of the MeSH vocabulary. Similar to the way in which most numbers can be split into numerical factors and expressed as their product--for example, 42 can be expressed as 2*21, 6*7, 3*14, 2*3*7,--so most medical concepts can be split into "semantic factors" and expressed as their juxtaposition. Note that if we split 42 into its primary factors, the breakdown is unique: 2*3*7. Similarly, when we split medical concepts into their "primary semantic factors" the breakdown is also unique. For example, the MeSH term 'renovascular hypertension' can be split morphologically into reno, vascular, hyper, and tension--morphemes that can then be translated into their primary semantic factors--kidney, blood vessel, high, and pressure. By "factoring" each MeSH term in this way, and by similarly factoring the user's query, we can match query to MeSH term by searching for combinations of common factors. Unlike UMLS and other methods that match at the level of words or phrases, SENSE matches at the level of concepts; in this way, a wide variety of words and phrases that have the same meaning produce the same match. Now used in PaperChase, the method is surprisingly powerful in matching users' queries to Medical Subject Headings. PMID:9357680
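The factoring idea in the abstract above can be illustrated with a minimal Python sketch. It is not the SENSE or PaperChase implementation: the morpheme table, the greedy longest-match splitter, the subset-based matching rule, and the names `MORPHEME_FACTORS`, `factor`, and `match` are all hypothetical. Only the "renovascular hypertension" breakdown (kidney, blood vessel, high, pressure) comes from the abstract.

```python
# Hypothetical morpheme table. Only the mappings for reno, vascular, hyper,
# and tension come from the abstract; the synonyms are assumed for illustration.
MORPHEME_FACTORS = {
    "reno": "kidney",
    "renal": "kidney",
    "kidney": "kidney",
    "vascular": "blood vessel",
    "hyper": "high",
    "high": "high",
    "tension": "pressure",
    "pressure": "pressure",
}


def factor(text):
    """Return the set of semantic factors found in text.

    Each word is scanned left to right; at every position the longest
    known morpheme is consumed, and unknown characters are skipped.
    """
    factors = set()
    for word in text.lower().split():
        i = 0
        while i < len(word):
            for j in range(len(word), i, -1):
                piece = word[i:j]
                if piece in MORPHEME_FACTORS:
                    factors.add(MORPHEME_FACTORS[piece])
                    i = j
                    break
            else:
                i += 1  # no morpheme starts here; move on one character
    return frozenset(factors)


# A one-entry stand-in for the MeSH vocabulary, factored ahead of time.
MESH_TERMS = ["renovascular hypertension"]
MESH_INDEX = {term: factor(term) for term in MESH_TERMS}


def match(query):
    """Return the terms whose factors are all present in the query's factors."""
    q = factor(query)
    return [term for term, f in MESH_INDEX.items() if f and f <= q]
```

Under these assumptions, `match("renal vascular high pressure")` returns the term "renovascular hypertension" even though the two strings share no whole word, which is the concept-level matching the abstract describes. A query such as "high pressure" matches nothing because it lacks the kidney and blood vessel factors.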
An index-based algorithm for fast on-line query processing of latent semantic analysis.

PubMed

Zhang, Mingxi; Li, Pohan; Wang, Wei

2017-01-01

Latent Semantic Analysis (LSA) is widely used for finding documents that are semantically similar to a keyword query. Although LSA yields promising similarity results, the existing LSA algorithms involve many unnecessary operations in similarity computation and candidate checking during on-line query processing, which is expensive in terms of time and cannot respond to query requests efficiently, especially when the dataset becomes large. In this paper, we study the efficiency problem of on-line query processing for LSA, with the goal of efficiently searching for documents similar to a given query. We rewrite the similarity equation of LSA in terms of an intermediate value called partial similarity, which is stored in a purpose-built index called the partial index. To reduce the search space, we give an approximate form of the similarity equation and then develop an efficient algorithm for building the partial index, which skips partial similarities lower than a given threshold θ. Based on the partial index, we develop an efficient algorithm called ILSA for supporting fast on-line query processing. The given query is transformed into a pseudo document vector, and the similarities between the query and candidate documents are computed by accumulating the partial similarities obtained from the index nodes that correspond to non-zero entries in the pseudo document vector. Compared to the LSA algorithm, ILSA reduces the time cost of on-line query processing by pruning candidate documents that are not promising and skipping operations that make little contribution to similarity scores. Extensive experiments comparing ILSA with LSA demonstrate the efficiency and effectiveness of our proposed algorithm.
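One plausible reading of the partial-index idea is sketched below in Python. It is not the authors' ILSA algorithm. It assumes that the similarity is an unnormalised inner product, that a partial similarity is the contribution of one term to one document, and that the latent vectors are given. The toy vectors and the names `build_partial_index` and `query` are invented for illustration; in real LSA the vectors would come from a truncated SVD of the term-document matrix.

```python
from collections import defaultdict

# Made-up latent vectors (k = 2), standing in for the output of a truncated SVD.
TERM_VECS = {
    "flu": [0.9, 0.1],
    "fever": [0.8, 0.2],
    "stock": [0.1, 0.9],
}
DOC_VECS = {
    "d1": [1.0, 0.0],
    "d2": [0.0, 1.0],
}


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def build_partial_index(term_vecs, doc_vecs, theta):
    """Precompute term-to-document partial similarities, skipping those below theta."""
    index = defaultdict(dict)
    for term, tv in term_vecs.items():
        for doc, dv in doc_vecs.items():
            partial = dot(tv, dv)
            if partial >= theta:
                index[term][doc] = partial
    return index


def query(index, term_weights):
    """Score documents by accumulating partial similarities of non-zero query terms."""
    scores = defaultdict(float)
    for term, weight in term_weights.items():
        if weight == 0:
            continue  # zero entries contribute nothing and are never looked up
        for doc, partial in index.get(term, {}).items():
            scores[doc] += weight * partial
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

With `theta = 0.15`, the pair ("flu", "d2") is dropped at build time because its partial similarity is 0.1, so a query on "flu" and "fever" only touches three index entries. The trade-off is that scores become approximate: every skipped entry is a small contribution that is no longer counted.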
Web-Searching to Learn: The Role of Internet Self-Efficacy in Pre-School Educators' Conceptions and Approaches

ERIC Educational Resources Information Center

Kao, Chia-Pin; Chien, Hui-Min

2017-01-01

This study was conducted to explore the relationships between pre-school educators' conceptions of and approaches to learning by web-searching through Internet Self-efficacy. Based on data from 242 pre-school educators who had prior experience of participating in web-searching in Taiwan for path analyses, it was found in this study that…
Opinions of Teachers on Using Internet Searching Strategies: An Elementary School Case in Turkey

ERIC Educational Resources Information Center

Kabakci, Isil; Firat, Mehmet; Izmirli, Serkan; Kuzu, Elif Bugra

2010-01-01

The purpose of the current study is to determine opinions of teachers on using internet searching strategies in an elementary school. The study, conducted with a qualitative method, was designed on the survey research model. Participants consisted of 21 teachers at an elementary school in Eskisehir in Turkey. Questionnaires consisting of…
Language Preferences on Websites and in Google Searches for Human Health and Food Information

PubMed Central

Singh, Punam Mony; Wight, Carly A; Sercinoglu, Olcan; Wilson, David C; Boytsov, Artem

2007-01-01

Background While it is known that the majority of pages on the World Wide Web are in English, little is known about the preferred language of users searching for health information online. Objectives (1) To help global and domestic publishers, for example health and food agencies, to determine the need for translation of online information from English into local languages. (2) To help these agencies determine which language(s) they should select when publishing information online in target nations and for target subpopulations within nations. Methods To estimate the percentage of Web publishers that translate their health and food websites, we measured the frequency at which domain names retrieved by Google overlap for language translations of the same health-related search term. To quantify language choice of searchers from different countries, Google provided estimates of the rate at which its search engine was queried in six languages relative to English for the terms “avian flu,” “tuberculosis,” “schizophrenia,” and “maize” (corn) from January 2004 to April 2006. The estimate was based on a 20% sample of all Google queries from 227 nations. Results We estimate that 80%-90% of health- and food-related institutions do not translate their websites into multiple languages, even when the information concerns pandemic disease such as avian influenza. Although Internet users are often well-educated, there was a strong preference for searching for health and food information in the local language, rather than English. For “avian flu,” we found that only 1% of searches in non-English-speaking nations were in English, whereas for “tuberculosis” or “schizophrenia,” about 4%-40% of searches in non-English countries employed English. A subset of searches for health information presumably originating from immigrants occurred in their native tongue, not the language of the adopted country. However, Spanish-language online searches for “avian flu

Using the Baidu Search Index to Predict the Incidence of HIV/AIDS in China.

PubMed

He, Guangye; Chen, Yunsong; Chen, Buwei; Wang, Hao; Shen, Li; Liu, Liu; Suolang, Deji; Zhang, Boyang; Ju, Guodong; Zhang, Liangliang; Du, Sijia; Jiang, Xiangxue; Pan, Yu; Min, Zuntao

2018-06-13

Based on a panel of 30 provinces and a timeframe from January 2009 to December 2013, we estimate the association between monthly human immunodeficiency virus/acquired immune deficiency syndrome (HIV/AIDS) incidence and the relevant Internet search query volumes in Baidu, the most widely used search engine among the Chinese. The pooled mean group (PMG) model shows that the Baidu search index (BSI) positively predicts the increase in HIV/AIDS incidence, with a 1% increase in BSI associated with a 2.1% increase in HIV/AIDS incidence on average. This study proposes a promising method to estimate and forecast the incidence of HIV/AIDS, a type of infectious disease that is culturally sensitive and highly unevenly distributed in China; the method can be taken as a complement to a traditional HIV/AIDS surveillance system.
Patterns of Information-Seeking for Cancer on the Internet: An Analysis of Real World Data

PubMed Central

Ofran, Yishai; Paltiel, Ora; Pelleg, Dan; Rowe, Jacob M.; Yom-Tov, Elad

2012-01-01

Although traditionally the primary information sources for cancer patients have been the treating medical team, patients and their relatives increasingly turn to the Internet, though this source may be misleading and confusing. We assess Internet searching patterns to understand the information needs of cancer patients and their acquaintances, as well as to discern their underlying psychological states. We screened 232,681 anonymous users who initiated cancer-specific queries on the Yahoo Web search engine over three months, and selected for study users with high levels of interest in this topic. Searches were partitioned by expected survival for the disease being searched. We compared the search patterns of anonymous users and their contacts. Users seeking information on aggressive malignancies exhibited shorter search periods, focusing on disease- and treatment-related information. Users seeking knowledge regarding more indolent tumors searched for longer periods, alternated between different subjects, and demonstrated a high interest in topics such as support groups. Acquaintances searched for longer periods than the proband user when seeking information on aggressive (compared to indolent) cancers. Information needs can be modeled as transitioning between five discrete states, each with a unique signature representing the type of information of interest to the user. Thus, early phases of information-seeking for cancer follow a specific dynamic pattern. Areas of interest are disease dependent and vary between probands and their contacts. These patterns can be used by physicians and medical Web site authors to tailor information to the needs of patients and family members. PMID:23029317
Annual variation in Internet keyword searches: Linking dieting interest to obesity and negative health outcomes.

PubMed

Markey, Patrick M; Markey, Charlotte N

2013-07-01

This study investigated the annual variation in Internet searches regarding dieting. Time-series analysis was first used to examine the annual trends of Google keyword searches during the past 7 years for topics related to dieting within the United States. The results indicated that keyword searches for dieting fit a consistent 12-month linear model, peaking in January (following New Year's Eve) and then linearly decreasing until surging again the following January. Additional state-level analyses revealed that the size of the December-January dieting-related keyword surge was predictive of both obesity and mortality rates due to diabetes, heart disease, and stroke.
System, method and apparatus for conducting a keyterm search

NASA Technical Reports Server (NTRS)

McGreevy, Michael W. (Inventor)

2004-01-01

A keyterm search is a method of searching a database for subsets of the database that are relevant to an input query. First, a number of relational models of subsets of a database are provided. A query is then input. The query can include one or more keyterms. Next, a gleaning model of the query is created. The gleaning model of the query is then compared to each one of the relational models of subsets of the database. The identifiers of the relevant subsets are then output.
Internet use among Turkish adolescents.

PubMed

Tahiroglu, Aysegul Yolga; Celik, Gonca G; Uzel, Mehtap; Ozcan, Neslihan; Avci, Ayse

2008-10-01

The aim of this study was to investigate Internet use habits and problematic Internet use (PIU) in Turkish adolescents. Participants were 3,975 undergraduate students, 7.6% of whom used the Internet for more than 12 hours weekly. The Online Cognition Scale (OCS) was used. The most common purpose for using the Internet was playing games, followed by general information search. Female users mostly preferred searching for general information; male users preferred playing games (p < 0.001, gamma = 995.205). The most preferred type of game was violent games. While preference for strategy and fantasy role-play (FRP) games increased with age, preference for other games decreased (p < 0.0001, gamma = 283.767). Participants who used the Internet mostly for general information searches and school-related searches had lower OCS scores (p < 0.0001). The highest OCS scores were related to violent games, followed by FRP, strategy, and sports and motor racing games. Computers and the Internet are useful, important inventions, but like other inventions, if used improperly, they may be harmful. Risk of harm raises concerns about who should use the Internet and computers, and where, when, and why the Internet and computers should be used.
Online Information Search Performance and Search Strategies in a Health Problem-Solving Scenario.

PubMed

Sharit, Joseph; Taha, Jessica; Berkowsky, Ronald W; Profita, Halley; Czaja, Sara J

2015-01-01

Although access to Internet health information can be beneficial, solving complex health-related problems online is challenging for many individuals. In this study, we investigated the performance of a sample of 60 adults ages 18 to 85 years in using the Internet to resolve a relatively complex health information problem. The impact of age, Internet experience, and cognitive abilities on measures of search time, amount of search, and search accuracy was examined, and a model of Internet information seeking was developed to guide the characterization of participants' search strategies. Internet experience was found to have no impact on performance measures. Older participants exhibited longer search times and lower amounts of search but similar search accuracy performance as their younger counterparts. Overall, greater search accuracy was related to an increased amount of search but not to increased search duration and was primarily attributable to higher cognitive abilities, such as processing speed, reasoning ability, and executive function. There was a tendency for those who were younger, had greater Internet experience, and had higher cognitive abilities to use a bottom-up (i.e., analytic) search strategy, although use of a top-down (i.e., browsing) strategy was not necessarily unsuccessful. Implications of the findings for future studies and design interventions are discussed.
Online Information Search Performance and Search Strategies in a Health Problem-Solving Scenario

PubMed Central

Sharit, Joseph; Taha, Jessica; Berkowsky, Ronald W.; Profita, Halley; Czaja, Sara J.

2017-01-01

Although access to Internet health information can be beneficial, solving complex health-related problems online is challenging for many individuals. In this study, we investigated the performance of a sample of 60 adults ages 18 to 85 years in using the Internet to resolve a relatively complex health information problem. The impact of age, Internet experience, and cognitive abilities on measures of search time, amount of search, and search accuracy was examined, and a model of Internet information seeking was developed to guide the characterization of participants’ search strategies. Internet experience was found to have no impact on performance measures. Older participants exhibited longer search times and lower amounts of search but similar search accuracy performance as their younger counterparts. Overall, greater search accuracy was related to an increased amount of search but not to increased search duration and was primarily attributable to higher cognitive abilities, such as processing speed, reasoning ability, and executive function. There was a tendency for those who were younger, had greater Internet experience, and had higher cognitive abilities to use a bottom-up (i.e., analytic) search strategy, although use of a top-down (i.e., browsing) strategy was not necessarily unsuccessful. Implications of the findings for future studies and design interventions are discussed. PMID:29056885
Drexel at TREC 2014 Federated Web Search Track

DTIC Science & Technology

2014-11-01

of its input RS results. 1. INTRODUCTION Federated Web Search is the task of searching multiple search engines simultaneously and combining their ... or distributed properly [5]. The goal of RS is then, for a given query, to select only the most promising search engines from all those available. Most ... result pages of 149 search engines. 4000 queries are used in building the sample set. As a part of the Vertical Selection task, search engines are
Web queries as a source for syndromic surveillance.

PubMed

Hulth, Anette; Rydevik, Gustaf; Linde, Annika

2009-01-01

In the field of syndromic surveillance, various sources are exploited for outbreak detection, monitoring and prediction. This paper describes a study on queries submitted to a medical web site, with influenza as a case study. The hypothesis of the work was that queries on influenza and influenza-like illness would provide a basis for the estimation of the timing of the peak and the intensity of the yearly influenza outbreaks that would be as good as the existing laboratory and sentinel surveillance. We calculated the occurrence of various queries related to influenza from search logs submitted to a Swedish medical web site for two influenza seasons. These figures were subsequently used to generate two models, one to estimate the number of laboratory verified influenza cases and one to estimate the proportion of patients with influenza-like illness reported by selected General Practitioners in Sweden. We applied an approach designed for highly correlated data, partial least squares regression. In our work, we found that certain web queries on influenza follow the same pattern as that obtained by the two other surveillance systems for influenza epidemics, and that they have equal power for the estimation of the influenza burden in society. Web queries give a unique access to ill individuals who are not (yet) seeking care. This paper shows the potential of web queries as an accurate, cheap and labour extensive source for syndromic surveillance.
Internet Searches for Affect-Related Terms: An Indicator of Subjective Well-Being and Predictor of Health Outcomes across US States and Metro Areas.

PubMed

Ford, Michael T; Jebb, Andrew T; Tay, Louis; Diener, Ed

2018-03-01

The present study explored the potential for internet search data to serve as indicators of subjective well-being (SWB) and predictors of health at the state and metro area levels. We propose that searches for positive and negative affect-related terms represent information-seeking behavior of individuals who are experiencing emotions and seeking information about them. Data on the frequency of Google searches for 15 affect terms were collected from Google's Trends website (trends.google.com). These were paired with data on health, self-reported emotions, psychological well-being, personality, and Twitter postings at the state and metro area levels. Several internet search scores correlated with indicators of cardiovascular health and depression. Some search term scores also correlated strongly with self-reported emotions, well-being metrics, neuroticism, per capita income, and Twitter postings at the state or metro area level. Multiple regression analyses suggest that affect searches predict depression rates at the metro area level beyond the effects of income and other well-being measures. The results highlight the promise and challenges of using internet search data at the aggregate level for physical and mental health assessment and surveillance. © 2018 The International Association of Applied Psychology.
Information-seeking behaviour for epilepsy: an infodemiological study of searches for Wikipedia articles.

PubMed

Brigo, Francesco; Otte, Willem M; Igwe, Stanley C; Ausserer, Harald; Nardone, Raffaele; Tezzon, Frediano; Trinka, Eugen

2015-12-01

Millions of people worldwide use the internet daily as a source of health information. Wikipedia is a popular free online encyclopaedia used by patients and physicians to search for health-related information. Our aim was to evaluate information-seeking behaviour of English-speaking internet users searching Wikipedia for articles related to epilepsy and epileptic seizures. Using Wiki Trends, which provides quantitative information on daily viewing of articles, data on global search queries for Wikipedia articles related to epilepsy and seizures were analysed. The daily Wikipedia article views on syncope, psychogenic non-epileptic seizures, migraine, and multiple sclerosis served as comparative data. The period of analysis covered was from January 2008 to December 2014. Overall, the Wikipedia article "epilepsy and driving" was found to be more frequently visited than the articles "epilepsy and employment" or "epilepsy in children". Since January 2008, the Wikipedia article "multiple sclerosis" was more often visited compared to the articles "epilepsy", "syncope", "psychogenic non-epileptic seizures" or "migraine"; the article "epilepsy" ranked 3,779 and was less frequently visited than "multiple sclerosis", ranked at 571, in traffic on Wikipedia. The highest peak in search volume for the article "epilepsy" coincided with the news of a celebrity having seizures. Fears and worries about epileptic seizures, their impact on driving and employment, and news about celebrities with epilepsy might be major determinants in searching Wikipedia for information.
Performance of Point and Range Queries for In-memory Databases using Radix Trees on GPUs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Alam, Maksudul; Yoginath, Srikanth B; Perumalla, Kalyan S

In in-memory database systems augmented by hardware accelerators, accelerating the index searching operations can greatly increase the runtime performance of database queries. Recently, adaptive radix trees (ART) have been shown to provide very fast index search implementation on the CPU. Here, we focus on an accelerator-based implementation of ART. We present a detailed performance study of our GPU-based adaptive radix tree (GRT) implementation over a variety of key distributions, synthetic benchmarks, and actual keys from music and book data sets. The performance is also compared with other index-searching schemes on the GPU. GRT on modern GPUs achieves some of the highest rates of index searches reported in the literature. For point queries, a throughput of up to 106 million and 130 million lookups per second is achieved for sparse and dense keys, respectively. For range queries, GRT yields 600 million and 1000 million lookups per second for sparse and dense keys, respectively, on a large dataset of 64 million 32-bit keys.
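To make the two query types concrete, here is a minimal CPU-side Python sketch of point and range queries on a radix tree over 32-bit keys. It is a plain fixed-stride tree (four levels of 8 bits, with dictionaries as nodes), not the adaptive node layout of ART and not the GPU implementation (GRT) described in the abstract. The class name `RadixTree` and its methods are hypothetical.

```python
class RadixTree:
    """Fixed-stride radix tree over 32-bit unsigned keys: 4 levels, 8 bits each."""

    LEVELS = 4
    BITS = 8

    def __init__(self):
        self.root = {}

    @classmethod
    def _digits(cls, key):
        # Most significant byte first, so digit order matches numeric order.
        return [(key >> (cls.BITS * i)) & 0xFF for i in reversed(range(cls.LEVELS))]

    def insert(self, key, value):
        *inner, last = self._digits(key)
        node = self.root
        for d in inner:
            node = node.setdefault(d, {})
        node[last] = value

    def lookup(self, key):
        """Point query: follow one digit per level; None if the key is absent."""
        node = self.root
        for d in self._digits(key):
            node = node.get(d)
            if node is None:
                return None
        return node

    def range(self, lo, hi):
        """Range query: yield (key, value) with lo <= key <= hi in ascending order."""

        def walk(node, prefix, depth):
            shift = self.BITS * (self.LEVELS - depth - 1)
            for d in sorted(node):
                child_prefix = (prefix << self.BITS) | d
                key_lo = child_prefix << shift          # smallest key below this child
                key_hi = key_lo | ((1 << shift) - 1)    # largest key below this child
                if key_hi < lo or key_lo > hi:
                    continue  # the whole subtree lies outside the range
                if depth == self.LEVELS - 1:
                    yield key_lo, node[d]
                else:
                    yield from walk(node[d], child_prefix, depth + 1)

        yield from walk(self.root, 0, 0)
```

A point lookup touches exactly four nodes regardless of how many keys are stored, which is the property that makes radix trees attractive as indexes. The range query prunes entire subtrees whose key interval falls outside the requested bounds, so dense key sets under a shared prefix are scanned together.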
An index-based algorithm for fast on-line query processing of latent semantic analysis

PubMed Central

Li, Pohan; Wang, Wei

2017-01-01

Latent Semantic Analysis (LSA) is widely used for finding documents whose semantics are similar to a keyword query. Although LSA yields promising similarity results, existing LSA algorithms involve many unnecessary operations in similarity computation and candidate checking during on-line query processing, which is expensive in terms of time cost and cannot respond efficiently to query requests, especially when the dataset becomes large. In this paper, we study the efficiency problem of on-line query processing for LSA, with the aim of efficiently searching for documents similar to a given query. We rewrite the similarity equation of LSA in terms of an intermediate value called partial similarity, which is stored in a purpose-built index called the partial index. To reduce the search space, we give an approximate form of the similarity equation, and then develop an efficient algorithm for building the partial index, which skips the partial similarities lower than a given threshold θ. Based on the partial index, we develop an efficient algorithm called ILSA for supporting fast on-line query processing. The given query is transformed into a pseudo document vector, and the similarities between the query and candidate documents are computed by accumulating the partial similarities obtained from the index nodes that correspond to non-zero entries in the pseudo document vector. Compared to the LSA algorithm, ILSA reduces the time cost of on-line query processing by pruning the candidate documents that are not promising and skipping the operations that make little contribution to similarity scores. Extensive experiments comparing ILSA with LSA demonstrate the efficiency and effectiveness of our proposed algorithm. PMID:28520747
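The idea of accumulating thresholded partial similarities from an index can be sketched as follows. This is a simplified illustration, not the ILSA algorithm itself: it assumes documents are already projected into a small latent space, treats each stored coordinate as the partial similarity, and uses a plain dot product. The function names `build_partial_index` and `query_index` are hypothetical.

```python
from collections import defaultdict


def build_partial_index(doc_vectors, theta):
    """Map each latent dimension to the (doc_id, weight) pairs worth keeping.

    Entries whose magnitude falls below the threshold theta are skipped,
    which is what makes the later score an approximation.
    """
    index = defaultdict(list)
    for doc_id, vec in doc_vectors.items():
        for dim, weight in enumerate(vec):
            if abs(weight) >= theta:
                index[dim].append((doc_id, weight))
    return index


def query_index(index, pseudo_doc, top_k=3):
    """Accumulate partial similarities for the non-zero query dimensions only."""
    scores = defaultdict(float)
    for dim, q_weight in enumerate(pseudo_doc):
        if q_weight == 0:
            continue  # zero entries contribute nothing, so their index nodes are never read
        for doc_id, weight in index.get(dim, []):
            scores[doc_id] += q_weight * weight
    return sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]
```

Documents whose only contributions fall below θ never enter the candidate set, so raising θ trades a little accuracy for a shorter candidate list.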
Utility and potential of rapid epidemic intelligence from internet-based sources.

PubMed

Yan, S J; Chughtai, A A; Macintyre, C R

2017-10-01

Rapid epidemic detection is an important objective of surveillance to enable timely intervention, but traditional validated surveillance data may not be available in the required timeframe for acute epidemic control. Increasing volumes of data on the Internet have prompted interest in methods that could use unstructured sources to enhance traditional disease surveillance and gain rapid epidemic intelligence. We aimed to summarise Internet-based methods that use freely-accessible, unstructured data for epidemic surveillance and explore their timeliness and accuracy outcomes. Steps outlined in the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) checklist were used to guide a systematic review of research related to the use of informal or unstructured data by Internet-based intelligence methods for surveillance. We identified 84 articles published between 2006-2016 relating to Internet-based public health surveillance methods. Studies used search queries, social media posts and approaches derived from existing Internet-based systems for early epidemic alerts and real-time monitoring. Most studies noted improved timeliness compared to official reporting, such as in the 2014 Ebola epidemic where epidemic alerts were generated first from ProMED-mail. Internet-based methods showed variable correlation strength with official datasets, with some methods showing reasonable accuracy. The proliferation of publicly available information on the Internet provided a new avenue for epidemic intelligence. Methodologies have been developed to collect Internet data and some systems are already used to enhance the timeliness of traditional surveillance systems. To improve the utility of Internet-based systems, the key attributes of timeliness and data accuracy should be included in future evaluations of surveillance systems. Copyright © 2017 The Authors. Published by Elsevier Ltd. All rights reserved.
Effective Filtering of Query Results on Updated User Behavioral Profiles in Web Mining

PubMed Central

Sadesh, S.; Suganthe, R. C.

2015-01-01

The Web, with its tremendous volume of information, retrieves results for user-related queries. With the rapid growth of web page recommendation, results retrieved using data mining techniques have not offered a high filtering rate, because the relationships between user profiles and queries were not analyzed extensively. At the same time, existing user-profile-based prediction in web data mining is not exhaustive in producing personalized results. To improve the query result rate as user behavior changes over time, the Hamilton Filtered Regime Switching User Query Probability (HFRS-UQP) framework is proposed. The HFRS-UQP framework is split into two processes, filtering and switching. The data-mining-based filtering uses the Hamilton Filtering framework to filter user results based on personalized information from automatically updated profiles obtained through the search engine. The maximized result is fetched, that is, filtered with respect to user behavior profiles. The switching step performs accurate filtering of updated profiles using regime switching. Updating on profile changes (i.e., switches of regime) in the HFRS-UQP framework identifies the second- and higher-order associations of query results with the updated profiles. Experiments are conducted on factors such as personalized information search retrieval rate, filtering efficiency, and precision ratio. PMID:26221626
Advances in using Internet searches to track dengue

PubMed Central

Yang, Shihao; Kou, Samuel C.; Brownstein, John S.; Brooke, Nicholas

2017-01-01

Dengue is a mosquito-borne disease that threatens over half of the world’s population. Despite being endemic to more than 100 countries, government-led efforts and tools for timely identification and tracking of new infections are still lacking in many affected areas. Multiple methodologies that leverage the use of Internet-based data sources have been proposed as a way to complement dengue surveillance efforts. Among these, dengue-related Google search trends have been shown to correlate with dengue activity. We extend a methodological framework, initially proposed and validated for flu surveillance, to produce near real-time estimates of dengue cases in five countries/states: Mexico, Brazil, Thailand, Singapore and Taiwan. Our result shows that our modeling framework can be used to improve the tracking of dengue activity in multiple locations around the world. PMID:28727821
muBLASTP: database-indexed protein sequence search on multicore CPUs.

PubMed

Zhang, Jing; Misra, Sanchit; Wang, Hao; Feng, Wu-Chun

2016-11-04

The Basic Local Alignment Search Tool (BLAST) is a fundamental program in the life sciences that searches databases for sequences that are most similar to a query sequence. Currently, the BLAST algorithm utilizes a query-indexed approach. Although many approaches suggest that sequence search with a database index can achieve much higher throughput (e.g., BLAT, SSAHA, and CAFE), they cannot deliver the same level of sensitivity as the query-indexed BLAST, i.e., NCBI BLAST, or they can only support nucleotide sequence search, e.g., MegaBLAST. Because query indexing and database indexing pose different challenges and have different characteristics, the existing techniques for query-indexed search cannot be applied to database-indexed search. muBLASTP, a novel database-indexed BLAST for protein sequence search, delivers hits identical to those returned by NCBI BLAST. On Intel Haswell multicore CPUs, for a single query, the single-threaded muBLASTP achieves up to a 4.41-fold speedup for alignment stages, and up to a 1.75-fold end-to-end speedup over single-threaded NCBI BLAST. For a batch of queries, the multithreaded muBLASTP achieves up to a 5.7-fold speedup for alignment stages, and up to a 4.56-fold end-to-end speedup over multithreaded NCBI BLAST. With a newly designed index structure for the protein database and associated optimizations in the BLASTP algorithm, we refactored the BLASTP algorithm for modern multicore processors, achieving much higher throughput with an acceptable memory footprint for the database index.
System, method and apparatus for conducting a phrase search

NASA Technical Reports Server (NTRS)

McGreevy, Michael W. (Inventor)

2004-01-01

A phrase search is a method of searching a database for subsets of the database that are relevant to an input query. First, a number of relational models of subsets of a database are provided. A query is then input. The query can include one or more sequences of terms. Next, a relational model of the query is created. The relational model of the query is then compared to each one of the relational models of subsets of the database. The identifiers of the relevant subsets are then output.
World Wide Web Metaphors for Search Mission Data

NASA Technical Reports Server (NTRS)

Norris, Jeffrey S.; Wallick, Michael N.; Joswig, Joseph C.; Powell, Mark W.; Torres, Recaredo J.; Mittman, David S.; Abramyan, Lucy; Crockett, Thomas M.; Shams, Khawaja S.; Fox, Jason M.;

2010-01-01

A software program that searches and browses mission data emulates a Web browser, containing standard metaphors for Web browsing. By taking advantage of back-end URLs, users may save and share search states. Also, since a Web interface is familiar to users, training time is reduced. Familiar back and forward buttons move through a local search history. A refresh/reload button regenerates a query, and loads in any new data. URLs can be constructed to save search results. Adding context to the current search is also handled through a familiar Web metaphor. The query is constructed by clicking on hyperlinks that represent new components to the search query. The selection of a link appears to the user as a page change; the choice of links changes to represent the updated search and the results are filtered by the new criteria. Selecting a navigation link changes the current query and also the URL that is associated with it. The back button can be used to return to the previous search state. This software is part of the MSLICE release, which was written in Java. It will run on any current Windows, Macintosh, or Linux system.
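The browser metaphor described above, in which each followed link adds a filter, each search state has a URL, and back and forward move through a local history, can be sketched in a few lines of Python. This is an illustration only and not the MSLICE code (which the record says was written in Java); the base URL, the class name `SearchHistory`, and the filter names are all hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "https://example.invalid/search"  # hypothetical endpoint


class SearchHistory:
    """Local history of search states, each state being a tuple of filters."""

    def __init__(self):
        self._states = [()]  # start with an unfiltered search
        self._pos = 0

    @property
    def filters(self):
        return self._states[self._pos]

    def url(self):
        """Shareable URL for the current search state."""
        query = urlencode(self.filters)
        return f"{BASE_URL}?{query}" if query else BASE_URL

    def follow_link(self, key, value):
        """Clicking a link adds one component to the query."""
        new_state = self.filters + ((key, value),)
        # As in a browser, a new page discards any forward history.
        del self._states[self._pos + 1:]
        self._states.append(new_state)
        self._pos += 1

    def back(self):
        self._pos = max(self._pos - 1, 0)

    def forward(self):
        self._pos = min(self._pos + 1, len(self._states) - 1)
```

Because every state is fully described by its URL, saving or sharing a search reduces to saving or sharing a string.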

Folksonomical P2P File Sharing Networks Using Vectorized KANSEI Information as Search Tags

NASA Astrophysics Data System (ADS)

Ohnishi, Kei; Yoshida, Kaori; Oie, Yuji

We present the concept of folksonomical peer-to-peer (P2P) file sharing networks that allow participants (peers) to freely assign structured search tags to files. These networks are similar to folksonomies in the present Web from the point of view that users assign search tags to information distributed over a network. As a concrete example, we consider an unstructured P2P network using vectorized Kansei (human sensitivity) information as structured search tags for file search. Vectorized Kansei information as search tags indicates what participants feel about their files and is assigned by the participant to each of their files. A search query also has the same form of search tags and indicates what participants want to feel about files that they will eventually obtain. A method that enables file search using vectorized Kansei information is the Kansei query-forwarding method, which probabilistically propagates a search query to peers that are likely to hold more files having search tags that are similar to the query. The similarity between the search query and the search tags is measured in terms of their dot product. The simulation experiments examine if the Kansei query-forwarding method can provide equal search performance for all peers in a network in which only the Kansei information and the tendency with respect to file collection are different among all of the peers. The simulation results show that the Kansei query forwarding method and a random-walk-based query forwarding method, for comparison, work effectively in different situations and are complementary. Furthermore, the Kansei query forwarding method is shown, through simulations, to be superior to or equal to the random-walk based one in terms of search speed.
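The two ingredients named in the abstract, dot-product similarity between a query and search tags and probabilistic forwarding toward peers likely to hold similar files, can be sketched as follows. The way each neighbor is summarised by a single profile vector, and the rule that forwarding probability is proportional to the non-negative dot product, are assumptions made for this illustration and are not taken from the paper.

```python
import random


def dot(a, b):
    """Similarity between a query vector and a tag vector."""
    return sum(x * y for x, y in zip(a, b))


def forwarding_probabilities(query, neighbor_profiles):
    """Probability of forwarding the query to each neighbor.

    neighbor_profiles maps a peer id to a vector summarising the tags of
    the files that peer is believed to hold (a hypothetical summary).
    """
    weights = {peer: max(dot(query, profile), 0.0)
               for peer, profile in neighbor_profiles.items()}
    total = sum(weights.values())
    if total == 0:
        # No neighbor looks promising: fall back to a uniform random walk.
        return {peer: 1.0 / len(weights) for peer in weights}
    return {peer: w / total for peer, w in weights.items()}


def pick_next_peer(query, neighbor_profiles, rng=random):
    """Sample one neighbor according to the forwarding probabilities."""
    probs = forwarding_probabilities(query, neighbor_profiles)
    peers = list(probs)
    return rng.choices(peers, weights=[probs[p] for p in peers], k=1)[0]
```

The uniform fallback corresponds to the random-walk baseline that the abstract uses for comparison.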

PIRIA: a general tool for indexing, search, and retrieval of multimedia content

NASA Astrophysics Data System (ADS)

Joint, Magali; Moellic, Pierre-Alain; Hede, P.; Adam, P.

2004-05-01

The Internet is a continuously expanding source of multimedia content and information. There are many products in development to search, retrieve, and understand multimedia content. But most current image search/retrieval engines rely on an image database manually pre-indexed with keywords. Computers are still powerless to understand the semantic meaning of still or animated image content. Piria (Program for the Indexing and Research of Images by Affinity), the search engine we have developed, brings this possibility closer to reality. Piria is a novel search engine that uses the query by example method. A user query is submitted to the system, which then returns a list of images ranked by similarity, obtained by a metric distance that operates on every indexed image signature. These indexed images are compared according to several different classifiers, not only Keywords, but also Form, Color and Texture, taking into account geometric transformations and variance like rotation, symmetry, mirroring, etc. Form - Edges extracted by an efficient segmentation algorithm. Color - Histogram, semantic color segmentation and spatial color relationship. Texture - Texture wavelets and local edge patterns. If required, Piria is also able to fuse results from multiple classifiers with a new classification of index categories: Single Indexer Single Call (SISC), Single Indexer Multiple Call (SIMC), Multiple Indexers Single Call (MISC) or Multiple Indexers Multiple Call (MIMC). Commercial and industrial applications will be explored and discussed as well as current and future development.
Clean Air Markets - Facility Attributes and Contacts Query Wizard

EPA Pesticide Factsheets

The Facility Attributes and Contacts Query Wizard is part of a suite of Clean Air Markets-related tools that are accessible at http://camddataandmaps.epa.gov/gdm/index.cfm. The Facility Attributes and Contact module gives the user access to current and historical facility, owner, and representative data using custom queries, via the Facility Attributes Query Wizard, or Quick Reports. In addition, data regarding EPA, State, and local agency staff are also available. The Query Wizard can be used to search for data about a facility or facilities by identifying characteristics such as associated programs, owners, representatives, locations, and unit characteristics, facility inventories, and classifications. EPA's Clean Air Markets Division (CAMD) includes several market-based regulatory programs designed to improve air quality and ecosystems. The most well-known of these programs are EPA's Acid Rain Program and the NOx Programs, which reduce emissions of sulfur dioxide (SO2) and nitrogen oxides (NOx), compounds that adversely affect air quality, the environment, and public health. CAMD also plays an integral role in the development and implementation of the Clean Air Interstate Rule (CAIR).
Effect of environmental factors on Internet searches related to sinusitis.

PubMed

Willson, Thomas J; Lospinoso, Joshua; Weitzel, Erik K; McMains, Kevin C

2015-11-01

Sinusitis significantly affects the population of the United States, exacting direct cost and lost productivity. Patients are likely to search the Internet for information related to their health before seeking care by a healthcare professional. Utilizing data generated from these searches may serve as an epidemiologic surrogate. A retrospective time series analysis was performed. Google search trend data from the Dallas-Fort Worth metro region for the years 2012 and 2013 were collected from www.google.com/trends for terms related to sinusitis based on literature outlining the most important symptoms for diagnosis. Additional terms were selected based on common English language terms used to describe the disease. Twelve months of data from the same time period and location for common pollutants (nitrogen dioxide, ozone, sulfur dioxide, and particulates), pollen and mold counts, and influenza-like illness were also collected. Statistical analysis was performed using Pearson correlation coefficients, and potential search activity predictors were assessed using autoregressive integrated moving average. Pearson correlation was strongest between the terms congestion and influenza-like illness (r=0.615), and sinus and influenza-like illness (r=0.534) and nitrogen dioxide (r=0.487). Autoregressive integrated moving average analysis revealed ozone, influenza-like illness, and nitrogen dioxide levels to be potential predictors for sinus pressure searches, with estimates of 0.118, 0.349, and 0.438, respectively. Nitrogen dioxide was also a potential predictor for the terms congestion and sinus, with estimates of 0.191 and 0.272, respectively. Google search activity for related terms follows the pattern of seasonal influenza-like illness and nitrogen dioxide. These data highlight the epidemiologic potential of this novel surveillance method. © 2015 The American Laryngological, Rhinological and Otological Society, Inc.
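For readers unfamiliar with the statistic reported above, the Pearson correlation coefficient between two equally long series (for example, weekly search volume for a term and weekly influenza-like illness counts) can be computed as in this short sketch. The function name `pearson_r` is chosen for illustration, and no data from the study is reproduced here.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of the same length (at least 2 points)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and spreads, all left unnormalised: the 1/n factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)
```

Values near 1 or -1 indicate a strong linear relationship, and values near 0 indicate a weak one.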
Representation and alignment of sung queries for music information retrieval

NASA Astrophysics Data System (ADS)

Adams, Norman H.; Wakefield, Gregory H.

2005-09-01

The pursuit of robust and rapid query-by-humming systems, which search melodic databases using sung queries, is a common theme in music information retrieval. The retrieval aspect of this database problem has received considerable attention, whereas the front-end processing of sung queries and the data structure to represent melodies has been based on musical intuition and historical momentum. The present work explores three time series representations for sung queries: a sequence of notes, a "smooth" pitch contour, and a sequence of pitch histograms. The performance of the three representations is compared using a collection of naturally sung queries. It is found that the most robust performance is achieved by the representation with highest dimension, the smooth pitch contour, but that this representation presents a formidable computational burden. For all three representations, it is necessary to align the query and target in order to achieve robust performance. The computational cost of the alignment is quadratic, hence it is necessary to keep the dimension small for rapid retrieval. Accordingly, iterative deepening is employed to achieve both robust performance and rapid retrieval. Finally, the conventional iterative framework is expanded to adapt the alignment constraints based on previous iterations, further expediting retrieval without degrading performance.
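The quadratic cost of aligning a query with a target can be seen in dynamic time warping (DTW), a common alignment technique for time series. The abstract does not say which alignment the authors use, so DTW is an assumption here, and the sketch below works on plain sequences of pitch values (for instance MIDI note numbers).

```python
def dtw_distance(query, target):
    """Dynamic time warping cost between two pitch sequences.

    The table has (len(query) + 1) * (len(target) + 1) cells, which is
    where the quadratic time and memory cost comes from.
    """
    n, m = len(query), len(target)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(query[i - 1] - target[j - 1])
            # Extend the cheapest of: stretch the target, stretch the query, or advance both.
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]
```

Doubling the length of both sequences quadruples the number of cells, which is why a low-dimensional representation matters for rapid retrieval.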
Querying databases of trajectories of differential equations: Data structures for trajectories

NASA Technical Reports Server (NTRS)

Grossman, Robert

1989-01-01

One approach to qualitative reasoning about dynamical systems is to extract qualitative information by searching or making queries on databases containing very large numbers of trajectories. The efficiency of such queries depends crucially upon finding an appropriate data structure for trajectories of dynamical systems. Suppose that a large number of parameterized trajectories $\gamma$ of a dynamical system evolving in $\mathbb{R}^N$ are stored in a database. Let $\eta \subset \mathbb{R}^N$ denote a parameterized path in Euclidean space, and let $\|\cdot\|$ denote a norm on the space of paths. A data structure is defined to represent trajectories of dynamical systems, and an algorithm is sketched which answers queries.
A Study on Information Search and Commitment Strategies on Web Environment and Internet Usage Self-Efficacy Beliefs of University Students'

ERIC Educational Resources Information Center

Geçer, Aynur Kolburan

2014-01-01

This study addresses university students' information search and commitment strategies on web environment and internet usage self-efficacy beliefs in terms of such variables as gender, department, grade level and frequency of internet use; and whether there is a significant relation between these beliefs. Descriptive method was used in the study.…
Large scale study of multiple-molecule queries

PubMed Central

2009-01-01

Background In ligand-based screening, as well as in other chemoinformatics applications, one seeks to effectively search large repositories of molecules in order to retrieve molecules that are similar typically to a single molecule lead. However, in some cases, multiple molecules from the same family are available to seed the query and search for other members of the same family. Multiple-molecule query methods have been less studied than single-molecule query methods. Furthermore, the previous studies have relied on proprietary data and sometimes have not used proper cross-validation methods to assess the results. In contrast, here we develop and compare multiple-molecule query methods using several large publicly available data sets and background. We also create a framework based on a strict cross-validation protocol to allow unbiased benchmarking for direct comparison in future studies across several performance metrics. Results Fourteen different multiple-molecule query methods were defined and benchmarked using: (1) 41 publicly available data sets of related molecules with similar biological activity; and (2) publicly available background data sets consisting of up to 175,000 molecules randomly extracted from the ChemDB database and other sources. Eight of the fourteen methods were parameter free, and six of them fit one or two free parameters to the data using a careful cross-validation protocol. All the methods were assessed and compared for their ability to retrieve members of the same family against the background data set by using several performance metrics including the Area Under the Accumulation Curve (AUAC), Area Under the Curve (AUC), F1-measure, and BEDROC metrics. Consistent with the previous literature, the best parameter-free methods are the MAX-SIM and MIN-RANK methods, which score a molecule to a family by the maximum similarity, or minimum ranking, obtained across the family. One new parameterized method introduced in this study and two
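The two parameter-free scores named in the record above, MAX-SIM and MIN-RANK, can be illustrated with a short Python sketch. It assumes molecules are represented as sets of fingerprint features and compared with the Tanimoto coefficient; the record does not specify the similarity measure, so that choice and the function names are assumptions.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def max_sim(candidate, family):
    """MAX-SIM: highest similarity of the candidate to any family member."""
    return max(tanimoto(candidate, member) for member in family)


def min_rank(candidates, family):
    """MIN-RANK: best (lowest) rank a candidate reaches in any member's ranking.

    candidates maps a name to its feature set; rank 1 is the most similar.
    """
    best = {name: len(candidates) for name in candidates}
    for member in family:
        ranked = sorted(candidates,
                        key=lambda name: -tanimoto(candidates[name], member))
        for rank, name in enumerate(ranked, start=1):
            best[name] = min(best[name], rank)
    return best
```

Both scores let a single close family member vouch for a candidate, rather than averaging over the whole family.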
OpenSearch technology for geospatial resources discovery

NASA Astrophysics Data System (ADS)

Papeschi, Fabrizio; Enrico, Boldrini; Mazzetti, Paolo

2010-05-01

set of services for discovery, access, and processing of geospatial resources in a SOA framework. GI-cat is a distributed CSW framework implementation developed by the ESSI Lab of the Italian National Research Council (CNR-IMAA) and the University of Florence. It provides brokering and mediation functionalities towards heterogeneous resources and inventories, exposing several standard interfaces for query distribution. This work focuses on a new GI-cat interface which allows the catalog to be queried according to the OpenSearch syntax specification, thus filling the gap between the SOA architectural design of the CSW and the Web 2.0. At the moment, there is no OGC standard specification about this topic, but an official change request has been proposed in order to enable the OGC catalogues to support OpenSearch queries. In this change request, an OpenSearch extension is proposed providing a standard mechanism to query a resource based on temporal and geographic extents. Two new catalog operations are also proposed, in order to publish a suitable OpenSearch interface. This extended interface is implemented by the modular GI-cat architecture adding a new profiling module called "OpenSearch profiler". Since GI-cat also acts as a clearinghouse catalog, another component called "OpenSearch accessor" is added in order to access OpenSearch compliant services. An important role in the GI-cat extension is played by the adopted mapping strategy. Two different kinds of mapping are required: query mapping and response element mapping. Query mapping is provided in order to fit the simple OpenSearch query syntax to the complex CSW query expressed by the OGC Filter syntax. The GI-cat internal data model is based on the ISO-19115 profile, which is more complex than the simple XML syndication formats, such as RSS 2.0 and Atom 1.0, suggested by OpenSearch.
Once response elements are available, in order to be presented, they need to be translated from the GI-cat internal data model, to the above
GeoSearcher: Location-Based Ranking of Search Engine Results.

ERIC Educational Resources Information Center

Watters, Carolyn; Amoudi, Ghada

2003-01-01

Discussion of Web queries with geospatial dimensions focuses on an algorithm that assigns location coordinates dynamically to Web sites based on the URL. Describes a prototype search system that uses the algorithm to re-rank search engine results for queries with a geospatial dimension, thus providing an alternative ranking order for search engine…
An ontology-based search engine for protein-protein interactions.

PubMed

Park, Byungkyu; Han, Kyungsook

2010-01-18

Keyword matching or ID matching is the most common searching method in a large database of protein-protein interactions. They are purely syntactic methods, and retrieve the records in the database that contain a keyword or ID specified in a query. Such syntactic search methods often retrieve too few search results or no results despite many potential matches present in the database. We have developed a new method for representing protein-protein interactions and the Gene Ontology (GO) using modified Gödel numbers. This representation is hidden from users but enables a search engine using the representation to efficiently search protein-protein interactions in a biologically meaningful way. Given a query protein with optional search conditions expressed in one or more GO terms, the search engine finds all the interaction partners of the query protein by unique prime factorization of the modified Gödel numbers representing the query protein and the search conditions. Representing the biological relations of proteins and their GO annotations by modified Gödel numbers makes a search engine efficiently find all protein-protein interactions by prime factorization of the numbers. Keyword matching or ID matching search methods often miss the interactions involving a protein that has no explicit annotations matching the search condition, but our search engine retrieves such interactions as well if they satisfy the search condition with a more specific term in the ontology.
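The general idea of encoding annotations as a product of primes and answering queries by divisibility can be sketched as follows. This is plain prime-product encoding, not the authors' modified Gödel numbering, whose details are not given in the abstract. The toy ontology, the term names, and the function names are hypothetical.

```python
from math import prod


def first_primes(n):
    """Return the first n primes by trial division (fine for tiny ontologies)."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


# Hypothetical toy ontology: each term points to its parent (None for a root).
PARENT = {
    "binding": None,
    "catalytic activity": None,
    "kinase activity": "catalytic activity",
}
TERMS = sorted(PARENT)
PRIME_OF = dict(zip(TERMS, first_primes(len(TERMS))))


def with_ancestors(term):
    """The term itself followed by all of its ancestors."""
    chain = []
    while term is not None:
        chain.append(term)
        term = PARENT[term]
    return chain


def encode(terms):
    """Product of the primes of the terms and all their ancestors."""
    closed = {t for term in terms for t in with_ancestors(term)}
    return prod(PRIME_OF[t] for t in closed)


def satisfies(code, condition_terms):
    """True if the encoded annotation covers every term in the condition."""
    return code % prod(PRIME_OF[t] for t in condition_terms) == 0


def decode(code):
    """Recover the annotated terms by testing each prime for divisibility."""
    return sorted(t for t, p in PRIME_OF.items() if code % p == 0)
```

Because ancestor primes are folded into the code, a protein annotated only with the specific term "kinase activity" still matches a query for the general term "catalytic activity", which is the behaviour the abstract contrasts with keyword matching.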
A Method for Search Engine Selection using Thesaurus for Selective Meta-Search Engine

NASA Astrophysics Data System (ADS)

Goto, Shoji; Ozono, Tadachika; Shintani, Toramatsu

In this paper, we propose a new method for selecting search engines on the WWW for a selective meta-search engine. A selective meta-search engine needs a method for selecting appropriate search engines for users' queries. Most existing methods use statistical data such as document frequency. These methods may select inappropriate search engines if a query contains polysemous words. In this paper, we describe a search engine selection method based on a thesaurus. In our method, a thesaurus is constructed from documents in a search engine and is used as a source description of the search engine. The form of a particular thesaurus depends on the documents used for its construction. Our method enables search engine selection by considering relationships between terms and overcomes the problems caused by polysemous words. Further, our method does not have a centralized broker maintaining data, such as document frequency for all search engines. As a result, it is easy to add a new search engine, and meta-search engines become more scalable with our method compared to other existing methods.
Crowd-sourced Ontology for Photoleukocoria: Identifying Common Internet Search Terms for a Potentially Important Pediatric Ophthalmic Sign.

PubMed

Staffieri, Sandra E; Kearns, Lisa S; Sanfilippo, Paul G; Craig, Jamie E; Mackey, David A; Hewitt, Alex W

2018-02-01

Leukocoria is the most common presenting sign for pediatric eye disease including retinoblastoma and cataract, with worse outcomes if diagnosis is delayed. We investigated whether individuals could identify leukocoria in photographs (photoleukocoria) and examined their subsequent Internet search behavior. Using a web-based questionnaire, in this cross-sectional study we invited adults aged over 18 years to view two photographs of a child with photoleukocoria, and then search the Internet to determine a possible diagnosis and action plan. The most commonly used search terms and websites accessed were recorded. The questionnaire was completed by 1639 individuals. Facebook advertisement was the most effective recruitment strategy. The mean age of all respondents was 38.95 ± 14.59 years (range, 18-83), 94% were female, and 59.3% had children. An abnormality in the images presented was identified by 1613 (98.4%) participants. The most commonly used search terms were: "white," "pupil," "photo," and "eye" reaching a variety of appropriate websites or links to print or social media articles. Different words or phrases were used to describe the same observation of photoleukocoria leading to a range of websites. Variations in the description of observed signs and search words influenced the sites reached, information obtained, and subsequent help-seeking intentions. Identifying the most commonly used search terms for photoleukocoria is an important step for search engine optimization. Being directed to the most appropriate websites informing of the significance of photoleukocoria and the appropriate actions to take could improve delays in diagnosis of important pediatric eye disease such as retinoblastoma or cataract.
New Quality Metrics for Web Search Results

NASA Astrophysics Data System (ADS)

Metaxas, Panagiotis Takis; Ivanova, Lilia; Mustafaraj, Eni

Web search results enjoy an increasing importance in our daily lives. But what can be said about their quality, especially when querying a controversial issue? The traditional information retrieval metrics of precision and recall do not provide much insight in the case of web information retrieval. In this paper we examine new ways of evaluating quality in search results: coverage and independence. We give examples on how these new metrics can be calculated and what their values reveal regarding the two major search engines, Google and Yahoo. We have found evidence of low coverage for commercial and medical controversial queries, and high coverage for a political query that is highly contested. Given the fact that search engines are unwilling to tune their search results manually, except in a few cases that have become the source of bad publicity, low coverage and independence reveal the efforts of dedicated groups to manipulate the search results.
Querying Proofs

NASA Technical Reports Server (NTRS)

Aspinall, David; Denney, Ewen; Lueth, Christoph

2012-01-01

We motivate and introduce a query language PrQL designed for inspecting machine representations of proofs. PrQL natively supports hiproofs which express proof structure using hierarchical nested labelled trees. The core language presented in this paper is locally structured (first-order), with queries built using recursion and patterns over proof structure and rule names. We define the syntax and semantics of locally structured queries, demonstrate their power, and sketch some implementation experiments.
Generating and Executing Complex Natural Language Queries across Linked Data.

PubMed

Hamon, Thierry; Mougin, Fleur; Grabar, Natalia

2015-01-01

With the recent and intensive research in the biomedical area, the knowledge accumulated is disseminated through various knowledge bases. Links between these knowledge bases are needed in order to use them jointly. Linked Data, SPARQL language, and interfaces in Natural Language question-answering provide interesting solutions for querying such knowledge bases. We propose a method for translating natural language questions into SPARQL queries. We use Natural Language Processing tools, semantic resources, and the RDF triples description. The method is designed on 50 questions over 3 biomedical knowledge bases, and evaluated on 27 questions. It achieves 0.78 F-measure on the test set. The method for translating natural language questions into SPARQL queries is implemented as a Perl module available at http://search.cpan.org/ thhamon/RDF-NLP-SPARQLQuery.
The role of economics in the QUERI program: QUERI Series.

PubMed

Smith, Mark W; Barnett, Paul G

2008-04-22

The United States (U.S.) Department of Veterans Affairs (VA) Quality Enhancement Research Initiative (QUERI) has implemented economic analyses in single-site and multi-site clinical trials. To date, no one has reviewed whether the QUERI Centers are taking an optimal approach to doing so. Consistent with the continuous learning culture of the QUERI Program, this paper provides such a reflection. We present a case study of QUERI as an example of how economic considerations can and should be integrated into implementation research within both single and multi-site studies. We review theoretical and applied cost research in implementation studies outside and within VA. We also present a critique of the use of economic research within the QUERI program. Economic evaluation is a key element of implementation research. QUERI has contributed many developments in the field of implementation but has only recently begun multi-site implementation trials across multiple regions within the national VA healthcare system. These trials are unusual in their emphasis on developing detailed costs of implementation, as well as in the use of business case analyses (budget impact analyses). Economics appears to play an important role in QUERI implementation studies, only after implementation has reached the stage of multi-site trials. Economic analysis could better inform the choice of which clinical best practices to implement and the choice of implementation interventions to employ. QUERI economics also would benefit from research on costing methods and development of widely accepted international standards for implementation economics.
Issues in the design of a pilot concept-based query interface for the neuroinformatics information framework.

PubMed

Marenco, Luis; Li, Yuli; Martone, Maryann E; Sternberg, Paul W; Shepherd, Gordon M; Miller, Perry L

2008-09-01

This paper describes a pilot query interface that has been constructed to help us explore a "concept-based" approach for searching the Neuroscience Information Framework (NIF). The query interface is concept-based in the sense that the search terms submitted through the interface are selected from a standardized vocabulary of terms (concepts) that are structured in the form of an ontology. The NIF contains three primary resources: the NIF Resource Registry, the NIF Document Archive, and the NIF Database Mediator. These NIF resources are very different in their nature and therefore pose challenges when designing a single interface from which searches can be automatically launched against all three resources simultaneously. The paper first discusses briefly several background issues involving the use of standardized biomedical vocabularies in biomedical information retrieval, and then presents a detailed example that illustrates how the pilot concept-based query interface operates. The paper concludes by discussing certain lessons learned in the development of the current version of the interface.
Issues in the Design of a Pilot Concept-Based Query Interface for the Neuroinformatics Information Framework

PubMed Central

Li, Yuli; Martone, Maryann E.; Sternberg, Paul W.; Shepherd, Gordon M.; Miller, Perry L.

2009-01-01

This paper describes a pilot query interface that has been constructed to help us explore a “concept-based” approach for searching the Neuroscience Information Framework (NIF). The query interface is concept-based in the sense that the search terms submitted through the interface are selected from a standardized vocabulary of terms (concepts) that are structured in the form of an ontology. The NIF contains three primary resources: the NIF Resource Registry, the NIF Document Archive, and the NIF Database Mediator. These NIF resources are very different in their nature and therefore pose challenges when designing a single interface from which searches can be automatically launched against all three resources simultaneously. The paper first discusses briefly several background issues involving the use of standardized biomedical vocabularies in biomedical information retrieval, and then presents a detailed example that illustrates how the pilot concept-based query interface operates. The paper concludes by discussing certain lessons learned in the development of the current version of the interface. PMID:18953674
Thesaurus-Enhanced Search Interfaces.

ERIC Educational Resources Information Center

Shiri, Ali Asghar; Revie, Crawford; Chowdhury, Gobinda

2002-01-01

Discussion of user interfaces to information retrieval systems focuses on interfaces that incorporate thesauri as part of their searching and browsing facilities. Discusses research literature related to information searching behavior, information retrieval interface evaluation, search term selection, and query expansion; and compares thesaurus…
BioSearch: a semantic search engine for Bio2RDF

PubMed Central

Qiu, Honglei; Huang, Jiacheng

2017-01-01

Abstract Biomedical data are growing at an incredible pace and require substantial expertise to organize data in a manner that makes them easily findable, accessible, interoperable and reusable. Massive effort has been devoted to using Semantic Web standards and technologies to create a network of Linked Data for the life sciences, among others. However, while these data are accessible through programmatic means, effective user interfaces for non-experts to SPARQL endpoints are few and far between. Contributing to user frustrations is that data are not necessarily described using common vocabularies, thereby making it difficult to aggregate results, especially when distributed across multiple SPARQL endpoints. We propose BioSearch — a semantic search engine that uses ontologies to enhance federated query construction and organize search results. BioSearch also features a simplified query interface that allows users to optionally filter their keywords according to classes, properties and datasets. User evaluation demonstrated that BioSearch is more effective and usable than two state of the art search and browsing solutions. Database URL: http://ws.nju.edu.cn/biosearch/ PMID:29220451

Getting a second opinion: health information and the Internet.

PubMed

Underhill, Cathy; Mckeown, Larry

2008-03-01

In 2005, more than one-third of Canadian adults used the Internet to search for health information. And of those who also visited a doctor, more than one-third discussed the results of their Internet search with their physician. This study raises important considerations. First, it is anticipated that as more Canadians access the Internet, online searches for health information will increase. However, the accuracy and reliability of Internet information on any topic can vary widely. Internet sources of health information range from personal accounts of illnesses and patient discussion groups to clinical decision tools and peer-reviewed journal articles. Second, the use of the Internet to search for health information appears to be unevenly distributed among Canadians. Searching for health information online is an example of what has been described as a second level digital divide among Internet users.
A rank-based Prediction Algorithm of Learning User's Intention

NASA Astrophysics Data System (ADS)

Shen, Jie; Gao, Ying; Chen, Cang; Gong, HaiPing

Internet search has become an important part of people's daily lives. People can find many types of information to meet different needs through search engines on the Internet. Current search engines have two issues. First, users must decide in advance what type of information they want and then switch to the appropriate search engine interface. Second, most search engines support multiple kinds of search functions, each with its own separate search interface, so users who need different types of information must switch between interfaces. In practice, most queries correspond to several types of results, and relevant results can be found in various search engines; for example, the query "Palace" returns websites introducing the National Palace Museum, blogs, Wikipedia entries, and some pictures and videos. This paper presents a new algorithm for aggregating all kinds of search results. It filters and sorts the search results by learning from three sources (the query words, the search results, and the search history logs) in order to detect the user's intention. Experiments demonstrate that this rank-based method for multiple types of search results is effective. It meets the user's search needs well, enhances user satisfaction, provides an effective and rational model for optimizing search engines, and improves the user's search experience.
Effects of Individual Health Topic Familiarity on Activity Patterns During Health Information Searches

PubMed Central

Moriyama, Koichi; Fukui, Ken–ichi; Numao, Masayuki

2015-01-01

Background Non-medical professionals (consumers) are increasingly using the Internet to support their health information needs. However, the cognitive effort required to perform health information searches is affected by the consumer’s familiarity with health topics. Consumers may have different levels of familiarity with individual health topics. This variation in familiarity may cause misunderstandings because the information presented by search engines may not be understood correctly by the consumers. Objective As a first step toward the improvement of the health information search process, we aimed to examine the effects of health topic familiarity on health information search behaviors by identifying the common search activity patterns exhibited by groups of consumers with different levels of familiarity. Methods Each participant completed a health terminology familiarity questionnaire and health information search tasks. The responses to the familiarity questionnaire were used to grade the familiarity of participants with predefined health topics. The search task data were transcribed into a sequence of search activities using a coding scheme. A computational model was constructed from the sequence data using a Markov chain model to identify the common search patterns in each familiarity group. Results Forty participants were classified into L1 (not familiar), L2 (somewhat familiar), and L3 (familiar) groups based on their questionnaire responses. They had different levels of familiarity with four health topics. The video data obtained from all of the participants were transcribed into 4595 search activities (mean 28.7, SD 23.27 per session). The most frequent search activities and transitions in all the familiarity groups were related to evaluations of the relevancy of selected web pages in the retrieval results. However, the next most frequent transitions differed in each group and a chi-squared test confirmed this finding (P<.001). Next, according to the
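The Markov chain step described in the Methods above can be illustrated with a minimal sketch. It assumes each session has already been coded into a sequence of activity labels; the labels ("query", "click", "evaluate", "back") and the function name are hypothetical and are not the study's coding scheme.

```python
from collections import Counter, defaultdict

def transition_probabilities(sessions):
    """Estimate first-order Markov transition probabilities from coded sessions."""
    counts = defaultdict(Counter)
    for seq in sessions:
        # Count each adjacent pair (current activity -> next activity).
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    # Normalize each row of counts so that it sums to 1.
    return {
        cur: {nxt: c / sum(nxts.values()) for nxt, c in nxts.items()}
        for cur, nxts in counts.items()
    }

# Hypothetical coded sessions, for illustration only.
sessions = [
    ["query", "click", "evaluate", "click", "evaluate"],
    ["query", "click", "back"],
]
probs = transition_probabilities(sessions)
```

Fitting one such table per familiarity group would allow the most frequent transitions in each group to be compared side by side.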
EasyKSORD: A Platform of Keyword Search Over Relational Databases

NASA Astrophysics Data System (ADS)

Peng, Zhaohui; Li, Jing; Wang, Shan

Keyword Search Over Relational Databases (KSORD) enables casual users to use keyword queries (a set of keywords) to search relational databases just like searching the Web, without any knowledge of the database schema or any need of writing SQL queries. Based on our previous work, we design and implement a novel KSORD platform named EasyKSORD for users and system administrators to use and manage different KSORD systems in a novel and simple manner. EasyKSORD supports advanced queries, efficient data-graph-based search engines, multiform result presentations, and system logging and analysis. Through EasyKSORD, users can search relational databases easily and read search results conveniently, and system administrators can easily monitor and analyze the operations of KSORD and manage KSORD systems much better.
A Systematic Assessment of Google Search Queries and Readability of Online Gynecologic Oncology Patient Education Materials.

PubMed

Martin, Alexandra; Stewart, J Ryan; Gaskins, Jeremy; Medlin, Erin

2018-01-20

The Internet is a major source of health information for gynecologic cancer patients. In this study, we systematically explore common Google search terms related to gynecologic cancer and calculate readability of top resulting websites. We used Google AdWords Keyword Planner to generate a list of commonly searched keywords related to gynecologic oncology, which were sorted into five groups (cervical cancer, ovarian cancer, uterine cancer, vulvar cancer, vaginal cancer) using five patient education websites from sgo.org. Each keyword was Google searched to create a list of top websites. The Python programming language (version 3.5.1) was used to describe frequencies of keywords, top-level domains (TLDs), domains, and readability of top websites using four validated formulae. Of the estimated 1,846,950 monthly searches resulting in 62,227 websites, the most common was cancer.org. The most common TLD was *.com. Most websites were above the eighth-grade reading level recommended by the American Medical Association (AMA) and the National Institutes of Health (NIH). The SMOG Index was the most reliable formula. The mean grade level readability for all sites using SMOG was 9.4 ± 2.3, with 23.9% of sites falling at or below the eighth-grade reading level. The first ten results for each Google keyword were easiest to read with results beyond the first page of Google being consistently more difficult. Keywords related to gynecologic malignancies are Google-searched frequently. Most websites are difficult to read without a high school education. This knowledge may help gynecologic oncology providers adequately meet the needs of their patients.
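For readers unfamiliar with the SMOG Index mentioned above, the following is a minimal sketch of the standard SMOG grade formula. The syllable counter is a crude vowel-group heuristic assumed here for illustration; the abstract does not say which implementation the authors used, and this is not their code.

```python
import math
import re

def count_syllables(word):
    """Rough heuristic (an assumption): one syllable per run of vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def smog_grade(text):
    """SMOG grade = 1.0430 * sqrt(polysyllables * 30 / sentences) + 3.1291."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        raise ValueError("text contains no sentences")
    words = re.findall(r"[A-Za-z]+", text)
    # Polysyllables are words with three or more syllables.
    polysyllables = sum(1 for w in words if count_syllables(w) >= 3)
    return 1.0430 * math.sqrt(polysyllables * 30 / len(sentences)) + 3.1291

grade = smog_grade("Chemotherapy is difficult.")
```

A text with no polysyllabic words scores the formula's floor of 3.1291, so values near the reported mean of 9.4 imply a substantial share of long words per sentence.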
The role of economics in the QUERI program: QUERI Series

PubMed Central

Smith, Mark W; Barnett, Paul G

2008-01-01

Background The United States (U.S.) Department of Veterans Affairs (VA) Quality Enhancement Research Initiative (QUERI) has implemented economic analyses in single-site and multi-site clinical trials. To date, no one has reviewed whether the QUERI Centers are taking an optimal approach to doing so. Consistent with the continuous learning culture of the QUERI Program, this paper provides such a reflection. Methods We present a case study of QUERI as an example of how economic considerations can and should be integrated into implementation research within both single and multi-site studies. We review theoretical and applied cost research in implementation studies outside and within VA. We also present a critique of the use of economic research within the QUERI program. Results Economic evaluation is a key element of implementation research. QUERI has contributed many developments in the field of implementation but has only recently begun multi-site implementation trials across multiple regions within the national VA healthcare system. These trials are unusual in their emphasis on developing detailed costs of implementation, as well as in the use of business case analyses (budget impact analyses). Conclusion Economics appears to play an important role in QUERI implementation studies, only after implementation has reached the stage of multi-site trials. Economic analysis could better inform the choice of which clinical best practices to implement and the choice of implementation interventions to employ. QUERI economics also would benefit from research on costing methods and development of widely accepted international standards for implementation economics. PMID:18430199
In-context query reformulation for failing SPARQL queries

NASA Astrophysics Data System (ADS)

Viswanathan, Amar; Michaelis, James R.; Cassidy, Taylor; de Mel, Geeth; Hendler, James

2017-05-01

Knowledge bases for decision support systems are growing increasingly complex, through continued advances in data ingest and management approaches. However, humans do not possess the cognitive capabilities to retain a bird's-eye view of such knowledge bases, and may end up issuing unsatisfiable queries to such systems. This work focuses on the implementation of a query reformulation approach for graph-based knowledge bases, specifically designed to support the Resource Description Framework (RDF). The reformulation approach presented is instance- and schema-aware. Thus, in contrast to relaxation techniques found in the state-of-the-art, the presented approach produces in-context query reformulation.
Tracking changes in search behaviour at a health web site.

PubMed

Eklund, Ann-Marie

2012-01-01

Nowadays, the internet is used as a means to provide the public with official information on many different topics, including health related matters and care providers. In this work we have studied a search log from the official Swedish health web site 1177.se for patterns of search behaviour over time. To improve the analysis, we mapped the queries to UMLS semantic types and MeSH categories. Our analysis shows that, as expected, diseases and health care activities are the ones of most interest, but also a clear increased interest in geographical locations in the setting of health care providers. We also note a change over time in which kinds of diseases are of interest. Finally, we conclude that this type of analysis may be useful in studies of what health related topics matter to the public, but also for design and follow-up of public information campaigns.
An ontology-based search engine for protein-protein interactions

PubMed Central

2010-01-01

Background Keyword matching or ID matching is the most common searching method in a large database of protein-protein interactions. They are purely syntactic methods, and retrieve the records in the database that contain a keyword or ID specified in a query. Such syntactic search methods often retrieve too few search results or no results despite many potential matches present in the database. Results We have developed a new method for representing protein-protein interactions and the Gene Ontology (GO) using modified Gödel numbers. This representation is hidden from users but enables a search engine using the representation to efficiently search protein-protein interactions in a biologically meaningful way. Given a query protein with optional search conditions expressed in one or more GO terms, the search engine finds all the interaction partners of the query protein by unique prime factorization of the modified Gödel numbers representing the query protein and the search conditions. Conclusion Representing the biological relations of proteins and their GO annotations by modified Gödel numbers makes a search engine efficiently find all protein-protein interactions by prime factorization of the numbers. Keyword matching or ID matching search methods often miss the interactions involving a protein that has no explicit annotations matching the search condition, but our search engine retrieves such interactions as well if they satisfy the search condition with a more specific term in the ontology. PMID:20122195
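The idea of answering ontology queries by prime factorization can be sketched as follows. This is a simplified illustration, not the paper's "modified Gödel number" scheme, whose details the abstract does not give. It assumes each term gets one prime and a term's code is the product of the primes of the term and all its ancestors; the toy ontology, protein names, and function names are hypothetical.

```python
from math import prod

# Toy ontology (hypothetical): term -> list of parent terms.
PARENTS = {
    "binding": [],
    "protein binding": ["binding"],
    "kinase binding": ["protein binding"],
    "catalytic activity": [],
}
# One distinct prime per term (assumed assignment).
PRIME = {"binding": 2, "protein binding": 3, "kinase binding": 5, "catalytic activity": 7}

def ancestors_or_self(term):
    """Collect the term and every ancestor reachable through PARENTS."""
    found, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in found:
            found.add(t)
            stack.extend(PARENTS[t])
    return found

def code(term):
    """Encode a term as the product of the primes of itself and its ancestors."""
    return prod(PRIME[t] for t in ancestors_or_self(term))

def satisfies(annotated_term, query_term):
    """True when the annotation equals the query term or is more specific than it."""
    return code(annotated_term) % code(query_term) == 0

# Hypothetical interactions: (protein, partner, GO annotation of the interaction).
INTERACTIONS = [
    ("P1", "P2", "kinase binding"),
    ("P1", "P3", "catalytic activity"),
    ("P1", "P4", "binding"),
]

def partners(protein, query_term):
    """Return partners whose annotation satisfies the query term."""
    return [b for a, b, term in INTERACTIONS if a == protein and satisfies(term, query_term)]
```

In this sketch a query for "protein binding" also retrieves the partner annotated with the more specific "kinase binding", which plain keyword matching would miss.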
How To Do Field Searching in Web Search Engines: A Field Trip.

ERIC Educational Resources Information Center

Hock, Ran

1998-01-01

Describes the field search capabilities of selected Web search engines (AltaVista, HotBot, Infoseek, Lycos, Yahoo!) and includes a chart outlining what fields (date, title, URL, images, audio, video, links, page depth) are searchable, where to go on the page to search them, the syntax required (if any), and how field search queries are entered.…
Generating Personalized Web Search Using Semantic Context

PubMed Central

Xu, Zheng; Chen, Hai-Yan; Yu, Jie

2015-01-01

The “one size fits all” criticism of search engines is that when queries are submitted, the same results are returned to different users. In order to solve this problem, personalized search is proposed, since it can provide different search results based upon the preferences of users. However, existing methods concentrate more on the long-term and independent user profile, and thus reduce the effectiveness of personalized search. In this paper, the method captures the user context to provide accurate preferences of users for effective personalized search. First, the short-term query context is generated to identify related concepts of the query. Second, the user context is generated based on the click-through data of users. Finally, a forgetting factor is introduced to merge the independent user contexts in a user session, which maintains the evolution of user preferences. Experimental results fully confirm that our approach can successfully represent user context according to individual user information needs. PMID:26000335
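One way to picture the forgetting-factor merge is the sketch below. The abstract does not state the actual update rule, so this assumes a simple exponential decay in which older contexts are multiplied by the factor before each newer context is added. The concept names, weights, and function name are hypothetical.

```python
def merge_session_contexts(contexts, forgetting=0.5):
    """Merge per-query contexts (concept -> weight) in session order.

    Assumed rule: decay everything merged so far by `forgetting`,
    then add the weights of the next context.
    """
    merged = {}
    for ctx in contexts:
        # Older evidence fades before newer evidence is added.
        merged = {concept: w * forgetting for concept, w in merged.items()}
        for concept, w in ctx.items():
            merged[concept] = merged.get(concept, 0.0) + w
    return merged

# Hypothetical session with two queries.
session = [{"jaguar car": 1.0}, {"jaguar car": 1.0, "dealer": 2.0}]
profile = merge_session_contexts(session, forgetting=0.5)
```

A factor close to 1 keeps early clicks influential for the whole session, while a factor close to 0 makes the merged context track only the most recent query.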
Adaptive search in mobile peer-to-peer databases

NASA Technical Reports Server (NTRS)

Wolfson, Ouri (Inventor); Xu, Bo (Inventor)

2010-01-01

Information is stored in a plurality of mobile peers. The peers communicate in a peer to peer fashion, using a short-range wireless network. Occasionally, a peer initiates a search for information in the peer to peer network by issuing a query. Queries and pieces of information, called reports, are transmitted among peers that are within a transmission range. For each search additional peers are utilized, wherein these additional peers search and relay information on behalf of the originator of the search.
Query Expansion and Query Translation as Logical Inference.

ERIC Educational Resources Information Center

Nie, Jian-Yun

2003-01-01

Examines query expansion during query translation in cross language information retrieval and develops a general framework for inferential information retrieval in two particular contexts: using fuzzy logic and probability theory. Obtains evaluation formulas that are shown to strongly correspond to those used in other information retrieval models.…
NEOview: Near Earth Object Data Discovery and Query

NASA Astrophysics Data System (ADS)

Tibbetts, M.; Elvis, M.; Galache, J. L.; Harbo, P.; McDowell, J. C.; Rudenko, M.; Van Stone, D.; Zografou, P.

2013-10-01

Missions to Near Earth Objects (NEOs) figure prominently in NASA's Flexible Path approach to human space exploration. NEOs offer insight into both the origins of the Solar System and of life, as well as a source of materials for future missions. With NEOview scientists can locate NEO datasets, explore metadata provided by the archives, and query or combine disparate NEO datasets in the search for NEO candidates for exploration. NEOview is a software system that illustrates how standards-based interfaces facilitate NEO data discovery and research. NEOview software follows a client-server architecture. The server is a configurable implementation of the International Virtual Observatory Alliance (IVOA) Table Access Protocol (TAP), a general interface for tabular data access, that can be deployed as a front end to existing NEO datasets. The TAP client, seleste, is a graphical interface that provides intuitive means of discovering NEO providers, exploring dataset metadata to identify fields of interest, and constructing queries to retrieve or combine data. It features a powerful, graphical query builder capable of easing the user's introduction to table searches. Through science use cases, NEOview demonstrates how potential targets for NEO rendezvous could be identified by combining data from complementary sources. Through deployment and operations, it has been shown that the software components are data independent and configurable to many different data servers. As such, NEOview's TAP server and seleste TAP client can be used to create a seamless environment for data discovery and exploration for tabular data in any astronomical archive.
Search Alternatives and Beyond

ERIC Educational Resources Information Center

Bell, Steven J.

2006-01-01

Internet search has become a routine computing activity, with regular visits to a search engine--usually Google--the norm for most people. The vast majority of searchers, as recent studies of Internet search behavior reveal, search only in the most basic of ways and fail to avail themselves of options that could easily and effortlessly improve…
Proactive Support of Internet Browsing when Searching for Relevant Health Information.

PubMed

Rurik, Clas; Zowalla, Richard; Wiesner, Martin; Pfeifer, Daniel

2015-01-01

Many people use the Internet as one of the primary sources of health information. This is due to the high volume and easy access of freely available information regarding diseases, diagnoses and treatments. However, users may find it difficult to retrieve information which is easily understandable and does not require a deep medical background. In this paper, we present a new kind of Web browser add-on, in order to proactively support users when searching for relevant health information. Our add-on not only visualizes the understandability of displayed medical text but also provides further recommendations of Web pages which hold similar content but are potentially easier to comprehend.
Automatic Concept-Based Query Expansion Using Term Relational Pathways Built from a Collection-Specific Association Thesaurus

ERIC Educational Resources Information Center

Lyall-Wilson, Jennifer Rae

2013-01-01

The dissertation research explores an approach to automatic concept-based query expansion to improve search engine performance. It uses a network-based approach for identifying the concept represented by the user's query and is founded on the idea that a collection-specific association thesaurus can be used to create a reasonable representation of…
Natural supplements for H1N1 influenza: retrospective observational infodemiology study of information and search activity on the Internet.

PubMed

Hill, Shawndra; Mao, Jun; Ungar, Lyle; Hennessy, Sean; Leonard, Charles E; Holmes, John

2011-05-10

As the incidence of H1N1 increases, the lay public may turn to the Internet for information about natural supplements for prevention and treatment. Our objective was to identify and characterize websites that provide information about herbal and natural supplements with information about H1N1 and to examine trends in the public's behavior in searching for information about supplement use in preventing or treating H1N1. This was a retrospective observational infodemiology study of indexed websites and Internet search activity over the period January 1, 2009, through November 15, 2009. The setting is the Internet as indexed by Google with aggregated Internet user data. The main outcome measures were the frequency of "hits" or webpages containing terms relating to natural supplements co-occurring with H1N1/swine flu, terms relating to natural supplements co-occurring with H1N1/swine flu proportional to all terms relating to natural supplements, webpage rank, webpage entropy, and temporal trend in search activity. A large number of websites support information about supplements and H1N1. The supplement with the highest proportion of H1N1/swine flu information was a homeopathic remedy known as Oscillococcinum that has no known side effects; supplements with the next highest proportions have known side effects and interactions. Webpages with both supplement and H1N1/swine flu information were less likely to be medically curated or authoritative. Search activity for supplements was temporally related to H1N1/swine flu-related news reports and events. The prevalence of nonauthoritative webpages with information about supplements in the context of H1N1/swine flu and the increasing number of searches for these pages suggest that the public is interested in alternatives to traditional prevention and treatment of H1N1. The quality of this information is often questionable and clinicians should be cognizant that patients may be at risk of adverse events associated with the use
Multi-field query expansion is effective for biomedical dataset retrieval

PubMed Central

2017-01-01

Abstract In the context of the bioCADDIE challenge addressing information retrieval of biomedical datasets, we propose a method for retrieval of biomedical data sets with heterogeneous schemas through query reformulation. In particular, the method proposed transforms the initial query into a multi-field query that is then enriched with terms that are likely to occur in the relevant datasets. We compare and evaluate two query expansion strategies, one based on the Rocchio method and another based on a biomedical lexicon. We then perform a comprehensive comparative evaluation of our method on the bioCADDIE dataset collection for biomedical retrieval. We demonstrate the effectiveness of our multi-field query method compared to two baselines, with MAP improved from 0.2171 and 0.2669 to 0.2996. We also show the benefits of query expansion, where the Rocchio expansion method improves the MAP for our two baselines from 0.2171 and 0.2669 to 0.335. We show that the Rocchio query expansion method slightly outperforms the one based on the biomedical lexicon as a source of terms, with an improvement of roughly 3% for MAP. However, the query expansion method based on the biomedical lexicon is much less resource intensive since it does not require computation of any relevance feedback set or any initial execution of the query. Hence, in terms of trade-off between efficiency, execution time and retrieval accuracy, we argue that the query expansion method based on the biomedical lexicon offers the best performance for a prototype biomedical data search engine intended to be used at a large scale. In the official bioCADDIE challenge results, although our approach is ranked seventh in terms of the infNDCG evaluation metric, it ranks second in terms of P@10 and NDCG. Hence, the method proposed here provides overall good retrieval performance in relation to the approaches of other competitors. Consequently, the observations made in this paper should benefit the development of a Data
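As background on the Rocchio method named above, here is a minimal sketch of Rocchio-style query expansion using positive feedback only (the negative-feedback term is dropped). The parameter values, tokenized documents, and function name are assumptions for illustration and do not reproduce the authors' configuration.

```python
from collections import Counter

def rocchio_expand(query_terms, feedback_docs, alpha=1.0, beta=0.75, top_k=2):
    """Reweight the query and add the top_k strongest new terms.

    New weight = alpha * query term count + beta * mean count in feedback docs.
    """
    query_vec = Counter(query_terms)
    if not feedback_docs:
        return {t: alpha * c for t, c in query_vec.items()}
    centroid = Counter()
    for doc in feedback_docs:
        centroid.update(doc)
    n = len(feedback_docs)
    weights = {
        t: alpha * query_vec[t] + beta * centroid[t] / n
        for t in set(query_vec) | set(centroid)
    }
    # Rank candidate expansion terms by weight, breaking ties alphabetically.
    candidates = sorted(
        (t for t in weights if t not in query_vec),
        key=lambda t: (-weights[t], t),
    )
    return {t: weights[t] for t in list(query_vec) + candidates[:top_k]}

# Hypothetical query and feedback documents (already tokenized).
expanded = rocchio_expand(
    ["cancer", "rna"],
    [["cancer", "tumor", "rna", "sequencing"], ["tumor", "rna", "expression"]],
)
```

The feedback documents normally come from an initial run of the query, which is the extra execution cost the abstract contrasts with lexicon-based expansion.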
Heat stroke internet searches can be a new heatwave health warning surveillance indicator

PubMed Central

Li, Tiantian; Ding, Fan; Sun, Qinghua; Zhang, Yi; Kinney, Patrick L.

2016-01-01

The impact of major heatwave shocks on population morbidity and mortality has become an urgent public health concern. However, current heatwave warning systems suffer from a lack of validation and an inability to provide accurate health risk warnings in a timely way. Here we conducted a correlation and linear regression analysis to test the relationship between heat stroke internet searches and heat stroke health outcomes in Shanghai, China, during the summer of 2013. We show that the resulting heatstroke index captures much of the variation in heat stroke cases and deaths. The correlation between heat stroke deaths, the search index and the incidence of heat stroke is higher than the correlation with maximum temperature. This study highlights a fast and effective heatwave health warning indicator with potential to be used throughout the world. PMID:27869135
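The kind of correlation and linear regression analysis described above can be sketched in a few lines. The numbers below are invented placeholders used only to exercise the functions; they are not data from the study, and the function names are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

# Invented daily values: search index and heat stroke case counts.
search_index = [1.0, 2.0, 3.0, 4.0]
cases = [2.0, 4.0, 6.0, 8.0]
r = pearson_r(search_index, cases)
slope, intercept = fit_line(search_index, cases)
```

Running the same two functions with maximum temperature in place of the search index would give the comparison the abstract reports.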

Heat stroke internet searches can be a new heatwave health warning surveillance indicator

NASA Astrophysics Data System (ADS)

Li, Tiantian; Ding, Fan; Sun, Qinghua; Zhang, Yi; Kinney, Patrick L.

2016-11-01

The impact of major heatwave shocks on population morbidity and mortality has become an urgent public health concern. However, current heatwave warning systems suffer from a lack of validation and an inability to provide accurate health risk warnings in a timely way. Here we conducted a correlation and linear regression analysis to test the relationship between heat stroke internet searches and heat stroke health outcomes in Shanghai, China, during the summer of 2013. We show that the resulting heatstroke index captures much of the variation in heat stroke cases and deaths. The correlation between heat stroke deaths, the search index and the incidence of heat stroke is higher than the correlation with maximum temperature. This study highlights a fast and effective heatwave health warning indicator with potential to be used throughout the world.
Query-by-example surgical activity detection.

PubMed

Gao, Yixin; Vedula, S Swaroop; Lee, Gyusung I; Lee, Mija R; Khudanpur, Sanjeev; Hager, Gregory D

2016-06-01

Easy acquisition of surgical data opens many opportunities to automate skill evaluation and teaching. Current technology to search tool motion data for surgical activity segments of interest is limited by the need for manual pre-processing, which can be prohibitive at scale. We developed a content-based information retrieval method, query-by-example (QBE), to automatically detect activity segments within surgical data recordings of long duration that match a query. The example segment of interest (query) and the surgical data recording (target trial) are time series of kinematics. Our approach includes an unsupervised feature learning module using a stacked denoising autoencoder (SDAE), two scoring modules based on asymmetric subsequence dynamic time warping (AS-DTW) and template matching, respectively, and a detection module. A distance matrix of the query against the trial is computed using the SDAE features, followed by AS-DTW combined with template scoring, to generate a ranked list of candidate subsequences (substrings). To evaluate the quality of the ranked list against the ground-truth, thresholding conventional DTW distances and bipartite matching are applied. We computed the recall, precision, F1-score, and a Jaccard index-based score on three experimental setups. We evaluated our QBE method using a suture throw maneuver as the query, on two tool motion datasets (JIGSAWS and MISTIC-SL) captured in a training laboratory. We observed a recall of 93, 90 and 87 % and a precision of 93, 91, and 88 % with same surgeon same trial (SSST), same surgeon different trial (SSDT) and different surgeon (DS) experiment setups on JIGSAWS, and a recall of 87, 81 and 75 % and a precision of 72, 61, and 53 % with SSST, SSDT and DS experiment setups on MISTIC-SL, respectively. We developed a novel, content-based information retrieval method to automatically detect multiple instances of an activity within long surgical recordings. Our method demonstrated adequate recall
Saying What You're Looking For: Linguistics Meets Video Search.

PubMed

Barrett, Daniel Paul; Barbu, Andrei; Siddharth, N; Siskind, Jeffrey Mark

2016-10-01

We present an approach to searching large video corpora for clips which depict a natural-language query in the form of a sentence. Compositional semantics is used to encode subtle meaning differences lost in other approaches, such as the difference between two sentences which have identical words but entirely different meaning: The person rode the horse versus The horse rode the person. Given a sentential query and a natural-language parser, we produce a score indicating how well a video clip depicts that sentence for each clip in a corpus and return a ranked list of clips. Two fundamental problems are addressed simultaneously: detecting and tracking objects, and recognizing whether those tracks depict the query. Because both tracking and object detection are unreliable, our approach uses the sentential query to focus the tracker on the relevant participants and ensures that the resulting tracks are described by the sentential query. While most earlier work was limited to single-word queries which correspond to either verbs or nouns, we search for complex queries which contain multiple phrases, such as prepositional phrases, and modifiers, such as adverbs. We demonstrate this approach by searching for 2,627 naturally elicited sentential queries in 10 Hollywood movies.
Query Results Clustering by Extending SPARQL with CLUSTER BY

NASA Astrophysics Data System (ADS)

Ławrynowicz, Agnieszka

The task of dynamic clustering of the search results proved to be useful in the Web context, where the user often does not know the granularity of the search results in advance. The goal of this paper is to provide a declarative way for invoking dynamic clustering of the results of queries submitted over Semantic Web data. To achieve this goal the paper proposes an approach that extends SPARQL by clustering abilities. The approach introduces a new statement, CLUSTER BY, into the SPARQL grammar and proposes semantics for such extension.
A COMPARISON OF PATIENT AND HEALTHCARE PROFESSIONAL VIEWS WHEN ASSESSING QUALITY OF INFORMATION ON PITUITARY ADENOMA AVAILABLE ON THE INTERNET.

PubMed

Druce, Irena; Williams, Chantal; Baggoo, Carolyn; Keely, Erin; Malcolm, Janine

2017-10-01

Patients are increasingly turning to the internet to seek reliable sources of health information and desire guidance in assessing the quality of information as healthcare becomes progressively more complex. Pituitary adenomas are a rare, diverse group of tumors associated with increased mortality and morbidity whose management requires a multidisciplinary approach. As such, patients with this disorder are often searching for additional sources of healthcare information. We undertook a study to assess the quality of information available on the internet for patients with pituitary adenoma. After exclusion, 42 websites were identified based on a search engine query with various search terms. Each website was assessed in triplicate: once by a health professional, once by a simulated patient, and once by a patient who had a pituitary adenoma and underwent medical and surgical treatment. The assessment tools included a content-specific questionnaire, the DISCERN tool, and the Ensuring Quality Information for Patients tool. The readability of the information was assessed with the Flesch-Kincaid grade level. We found that the overall quality of information on pituitary adenoma on the internet was variable and written at a high grade level. Correlation between the different assessors was poor, indicating that there may be differences in how healthcare professionals and patients view healthcare information. Our findings highlight the importance of assessment of the health information by groups of the intended user to ensure the needs of that population are met. Abbreviation: EQIP = Ensuring Quality Information for Patients.
Querying Co-regulated Genes on Diverse Gene Expression Datasets Via Biclustering.

PubMed

Deveci, Mehmet; Küçüktunç, Onur; Eren, Kemal; Bozdağ, Doruk; Kaya, Kamer; Çatalyürek, Ümit V

2016-01-01

Rapid development and increasing popularity of gene expression microarrays have resulted in a number of studies on the discovery of co-regulated genes. One important way of discovering such co-regulations is the query-based search since gene co-expressions may indicate a shared role in a biological process. Although there exist promising query-driven search methods adapting clustering, they fail to capture many genes that function in the same biological pathway because microarray datasets are fraught with spurious samples or samples of diverse origin, or the pathways might be regulated under only a subset of samples. On the other hand, a class of clustering algorithms known as biclustering algorithms which simultaneously cluster both the items and their features are useful while analyzing gene expression data, or any data in which items are related in only a subset of their samples. This means that genes need not be related in all samples to be clustered together. Because many genes only interact under specific circumstances, biclustering may recover the relationships that traditional clustering algorithms can easily miss. In this chapter, we briefly summarize the literature using biclustering for querying co-regulated genes. Then we present a novel biclustering approach and evaluate its performance by a thorough experimental analysis.
Evolving discriminators for querying video sequences

NASA Astrophysics Data System (ADS)

Iyengar, Giridharan; Lippman, Andrew B.

1997-01-01

In this paper we present a framework for content based query and retrieval of information from large video databases. This framework enables content based retrieval of video sequences by characterizing the sequences using motion, texture and colorimetry cues. This characterization is biologically inspired and results in a compact parameter space where every segment of video is represented by an 8 dimensional vector. Searching and retrieval is done in real-time with accuracy in this parameter space. Using this characterization, we then evolve a set of discriminators using Genetic Programming. Experiments indicate that these discriminators are capable of analyzing and characterizing video. The VideoBook is able to search and retrieve video sequences with 92% accuracy in real-time. Experiments thus demonstrate that the characterization is capable of extracting higher level structure from raw pixel values.
Noesis: Ontology based Scoped Search Engine and Resource Aggregator for Atmospheric Science

NASA Astrophysics Data System (ADS)

Ramachandran, R.; Movva, S.; Li, X.; Cherukuri, P.; Graves, S.

2006-12-01

The goal for search engines is to return results that are both accurate and complete. The search engines should find only what you really want and find everything you really want. Search engines (even meta search engines) lack semantics. The basis for search is simply based on string matching between the user's query term and the resource database and the semantics associated with the search string is not captured. For example, if an atmospheric scientist is searching for "pressure" related web resources, most search engines return inaccurate results such as web resources related to blood pressure. In this presentation Noesis, which is a meta-search engine and a resource aggregator that uses domain ontologies to provide scoped search capabilities will be described. Noesis uses domain ontologies to help the user scope the search query to ensure that the search results are both accurate and complete. The domain ontologies guide the user to refine their search query and thereby reduce the user's burden of experimenting with different search strings. Semantics are captured by refining the query terms to cover synonyms, specializations, generalizations and related concepts. Noesis also serves as a resource aggregator. It categorizes the search results from different online resources such as education materials, publications, datasets, web search engines that might be of interest to the user.
Is the Internet a Suitable Patient Resource for Information on Common Radiological Investigations?: Radiology-Related Information on the Internet.

PubMed

Bowden, Dermot J; Yap, Lee-Chien; Sheppard, Declan G

2017-07-01

material with lower mean DISCERN scores: X-ray (17.25 vs 31.69), magnetic resonance imaging (20.8 vs 40.1), ultrasound (24.11 vs 42.35), and positron emission tomography (24.5 vs 44.45) (P < .01). Although readability is adequate, the overall quality of radiology-related health-care information on the Internet is poor. High-quality online resources should be identified so that patients may avoid the use of poor-quality information derived from general search engine queries. Copyright © 2017 The Association of University Radiologists. Published by Elsevier Inc. All rights reserved.
GEMINI: a computationally-efficient search engine for large gene expression datasets.

PubMed

DeFreitas, Timothy; Saddiki, Hachem; Flaherty, Patrick

2016-02-24

Low-cost DNA sequencing allows organizations to accumulate massive amounts of genomic data and use that data to answer a diverse range of research questions. Presently, users must search for relevant genomic data using a keyword, accession number or meta-data tag. However, in this search paradigm the form of the query - a text-based string - is mismatched with the form of the target - a genomic profile. To improve access to massive genomic data resources, we have developed a fast search engine, GEMINI, that uses a genomic profile as a query to search for similar genomic profiles. GEMINI implements a nearest-neighbor search algorithm using a vantage-point tree to store a database of n profiles and in certain circumstances achieves an [Formula: see text] expected query time in the limit. We tested GEMINI on breast and ovarian cancer gene expression data from The Cancer Genome Atlas project and show that it achieves a query time that scales as the logarithm of the number of records in practice on genomic data. In a database with 10^5 samples, GEMINI identifies the nearest neighbor in 0.05 sec compared to a brute force search time of 0.6 sec. GEMINI is a fast search engine that uses a query genomic profile to search for similar profiles in a very large genomic database. It enables users to identify similar profiles independent of sample label, data origin or other meta-data information.
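The nearest-neighbor search named in the abstract above can be illustrated with a minimal vantage-point tree. This is only a sketch under stated assumptions, not GEMINI's implementation: the class name VPTree, the Euclidean metric, the random choice of vantage point and the median split are all illustrative choices that the abstract does not specify.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class VPTree:
    """Hypothetical minimal vantage-point tree (illustrative, not GEMINI's code)."""

    def __init__(self, points, dist=euclidean, rng=None):
        rng = rng or random.Random(0)
        self.dist = dist
        self.root = self._build(list(points), rng)

    def _build(self, points, rng):
        if not points:
            return None
        # Pick a random vantage point and remove it from the working list.
        i = rng.randrange(len(points))
        points[i], points[-1] = points[-1], points[i]
        vantage = points.pop()
        if not points:
            return (vantage, 0.0, None, None)
        # Split the remaining points at the median distance to the vantage point.
        dists = [self.dist(vantage, p) for p in points]
        mu = sorted(dists)[len(dists) // 2]
        inner = [p for p, d in zip(points, dists) if d < mu]
        outer = [p for p, d in zip(points, dists) if d >= mu]
        return (vantage, mu, self._build(inner, rng), self._build(outer, rng))

    def nearest(self, query):
        """Return (point, distance) of the stored point closest to query."""
        best = [None, math.inf]

        def search(node):
            if node is None:
                return
            vantage, mu, inner, outer = node
            d = self.dist(query, vantage)
            if d < best[1]:
                best[0], best[1] = vantage, d
            # Visit the side that contains the query first; visit the other
            # side only if the triangle inequality cannot rule it out.
            if d < mu:
                search(inner)
                if d + best[1] >= mu:
                    search(outer)
            else:
                search(outer)
                if d - best[1] < mu:
                    search(inner)

        search(self.root)
        return best[0], best[1]
```

A query such as VPTree(profiles).nearest(query_profile) returns the closest stored profile and its distance. The pruning tests are what allow whole subtrees to be skipped, which is the source of the sub-linear query behaviour the abstract reports.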
Side effects of radiotherapy in breast cancer patients : The Internet as an information source.

PubMed

Janssen, S; Käsmann, L; Fahlbusch, F B; Rades, D; Vordermark, D

2018-02-01

Breast cancer is the most common cancer type among women necessitating adjuvant radiotherapy. As the Internet has become a major source of information for cancer patients, this study aimed to evaluate the quality of websites giving information on side effects of radiotherapy for breast cancer patients. A patients' search for the English terms "breast cancer - radiotherapy - side effects" and the corresponding German terms "Brustkrebs - Strahlentherapie - Nebenwirkungen" was carried out twice (5 months apart) using the search engine Google. The first 30 search results each were evaluated using the validated 16-question DISCERN Plus instrument, the Health on the Net Code of Conduct (HONcode) certification and the Journal of the American Medical Association (JAMA) benchmark criteria. The overall quality (DISCERN score) of the retrieved websites was further compared to queries via Bing and Yahoo search engines. The DISCERN score showed a great range, with the majority of websites ranking fair to poor. Significantly superior results were found for English websites, particularly for webpages run by hospitals/universities and nongovernmental organizations (NGO), when compared to the respective German categories. In general, only a minority of websites met all JAMA benchmarks and was HONcode certified (both languages). We did not determine a relevant temporal change in website ranking among the top ten search hits, while significant variation occurred thereafter. Mean overall DISCERN score was similar between the various search engines. The Internet can give breast cancer patients seeking information on side effects of radiotherapy an overview. However, based on the currently low overall quality of websites and the lack of transparency for the average layperson, we emphasize the value of personal contact with the treating radio-oncologist in order to integrate and interpret the information found online.
Sundanese ancient manuscripts search engine using probability approach

NASA Astrophysics Data System (ADS)

Suryani, Mira; Hadi, Setiawan; Paulus, Erick; Nurma Yulita, Intan; Supriatna, Asep K.

2017-10-01

Today, Information and Communication Technology (ICT) has become a regular part of every aspect of life, including culture and heritage. Sundanese ancient manuscripts, as part of Sundanese heritage, are in damaged condition, and so is the information they contain. To preserve the information in Sundanese ancient manuscripts and make it easier to search, a search engine has been developed. The search engine must have good computing ability. To obtain the best computation in the developed search engine, three probabilistic approaches have been compared in this study: the Bayesian Networks Model, Divergence from Randomness with PL2 distribution, and DFR-PL2F as a derivative of DFR-PL2. The three probabilistic approaches are supported by an index of documents and three different weighting methods: term occurrence, term frequency, and TF-IDF. The experiment involved 12 Sundanese ancient manuscripts, which contain 474 distinct terms. The developed search engine was tested with 50 random queries for three types of query. The experimental results showed that for single and multiple queries, the best search performance was given by the combination of the PL2F approach and the TF-IDF weighting method. The performance was evaluated using average response time, with a value of about 0.08 second, and Mean Average Precision (MAP), with a value of about 0.33.
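For readers unfamiliar with the Mean Average Precision (MAP) figure quoted above, the following sketch shows the standard textbook definition. It is an assumption that the study used this variant; the function names and the toy document identifiers are hypothetical.

```python
def average_precision(ranked, relevant):
    """Average precision of one ranked result list against a set of relevant ids."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision at the rank of each relevant hit
    return total / len(relevant)


def mean_average_precision(runs):
    """Mean of average precision over (ranked_list, relevant_set) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Toy example with hypothetical document ids.
runs = [
    (["a", "b", "c", "d"], {"a", "c"}),  # AP = (1/1 + 2/3) / 2
    (["x", "y"], {"y"}),                 # AP = 1/2
]
print(mean_average_precision(runs))
```

A MAP of about 0.33 therefore means that, averaged over queries, relevant manuscripts tend to appear with roughly one relevant hit in three results at the ranks where they occur.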
Internet Searching About Disease Elicits a Positive Perception of Own Health When Severity of Illness Is High: A Longitudinal Questionnaire Study

PubMed Central

Greving, Hannah

2016-01-01

Background: The Internet is one of the primary sources for health information. However, in research, the effects of Internet use on the perception of one’s own health have not received much attention so far. Objective: This study tested how Internet use for acquiring health information and severity of illness influence patients with a chronic disease with regard to the perception of their own health. Negative psychological states are known to lead to preferential processing of positive information. In particular, the self-directed nature of Internet use provides room for such biases. Therefore, we predicted that patients experiencing negative health states more frequently, due to more frequent episodes of a chronic illness, will gain a more positive perception of their health if they use the Internet frequently to gain health information, but not if they use the Internet rarely. This effect was not expected for other sources of information. Methods: A longitudinal questionnaire study with two measurement points, with a 7-month time lag, tested the hypothesis in a sample of patients with chronic inflammatory bowel disease (n=208). This study assessed patients’ frequency of Internet use, their participation in online social support groups, their use of other sources of health information, and several indicators of the participants’ perceptions of their own health. A structure equation model (SEM) was used to test the predictions separately for Internet searches and other sources of information. Results: Data analysis supported the prediction; the interaction between frequency of health-related information searches and frequency of episodes at the first measurement point (T1) was related to participants’ positive perceptions of their own health at the second measurement point (T2) (B=.10, SE=.04, P=.02) above and beyond the perceptions of their own health at T1. When participants used the Internet relatively rarely (-1 SD), there was no relationship between
Internet Searching About Disease Elicits a Positive Perception of Own Health When Severity of Illness Is High: A Longitudinal Questionnaire Study.

PubMed

Sassenberg, Kai; Greving, Hannah

2016-03-04

The Internet is one of the primary sources for health information. However, in research, the effects of Internet use on the perception of one's own health have not received much attention so far. This study tested how Internet use for acquiring health information and severity of illness influence patients with a chronic disease with regard to the perception of their own health. Negative psychological states are known to lead to preferential processing of positive information. In particular, the self-directed nature of Internet use provides room for such biases. Therefore, we predicted that patients experiencing negative health states more frequently, due to more frequent episodes of a chronic illness, will gain a more positive perception of their health if they use the Internet frequently to gain health information, but not if they use the Internet rarely. This effect was not expected for other sources of information. A longitudinal questionnaire study with two measurement points-with a 7-month time lag-tested the hypothesis in a sample of patients with chronic inflammatory bowel disease (n=208). This study assessed patients' frequency of Internet use, their participation in online social support groups, their use of other sources of health information, and several indicators of the participants' perceptions of their own health. A structure equation model (SEM) was used to test the predictions separately for Internet searches and other sources of information. Data analysis supported the prediction; the interaction between frequency of health-related information searches and frequency of episodes at the first measurement point (T1) was related to participants' positive perceptions of their own health at the second measurement point (T2) (B=.10, SE=.04, P=.02) above and beyond the perceptions of their own health at T1. When participants used the Internet relatively rarely (-1 SD), there was no relationship between frequency of episodes and positive perceptions of
How often people google for vaccination: Qualitative and quantitative insights from a systematic search of the web-based activities using Google Trends

PubMed Central

Barberis, Ilaria; Rosselli, Roberto; Gianfredi, Vincenza; Nucci, Daniele; Moretti, Massimo; Salvatori, Tania; Martucci, Gianfranco; Martini, Mariano

2017-01-01

Nowadays, more and more people surf the Internet seeking health-related information. Information and communication technologies (ICTs) can represent an important opportunity in the field of Public Health and vaccinology. The aim of our current research was to investigate a) how often people search the Internet for vaccination-related information, b) if this search is spontaneous or induced by media, and c) which kind of information is in particular searched. We used Google Trends (GT) for monitoring the interest for preventable infections and related vaccines. When looking for vaccine preventable infectious diseases, vaccine was not a popular topic, with some valuable exceptions, including the vaccine against Human Papillomavirus (HPV). Vaccines-related queries represented approximately one third of the volumes regarding preventable infections, greatly differing among the vaccines. However, the interest for vaccines is increasing throughout time: in particular, users seek information about possible vaccine-related side-effects. The five most searched vaccines are those against 1) influenza; 2) meningitis; 3) diphtheria, pertussis (whooping cough), and tetanus; 4) yellow fever; and 5) chickenpox. ICTs can have a positive influence on parental vaccine-related knowledge, attitudes, beliefs and vaccination willingness. GT can be used for monitoring the interest for vaccinations and the main information searched. PMID:27983896
VISAGE: Interactive Visual Graph Querying.

PubMed

Pienta, Robert; Navathe, Shamkant; Tamersoy, Acar; Tong, Hanghang; Endert, Alex; Chau, Duen Horng

2016-06-01

Extracting useful patterns from large network datasets has become a fundamental challenge in many domains. We present VISAGE, an interactive visual graph querying approach that empowers users to construct expressive queries, without writing complex code (e.g., finding money laundering rings of bankers and business owners). Our contributions are as follows: (1) we introduce graph autocomplete, an interactive approach that guides users to construct and refine queries, preventing over-specification; (2) VISAGE guides the construction of graph queries using a data-driven approach, enabling users to specify queries with varying levels of specificity, from concrete and detailed (e.g., query by example), to abstract (e.g., with "wildcard" nodes of any types), to purely structural matching; (3) a twelve-participant, within-subject user study demonstrates VISAGE's ease of use and the ability to construct graph queries significantly faster than using a conventional query language; (4) VISAGE works on real graphs with over 468K edges, achieving sub-second response times for common queries.
VISAGE: Interactive Visual Graph Querying

PubMed Central

Pienta, Robert; Navathe, Shamkant; Tamersoy, Acar; Tong, Hanghang; Endert, Alex; Chau, Duen Horng

2017-01-01

Extracting useful patterns from large network datasets has become a fundamental challenge in many domains. We present VISAGE, an interactive visual graph querying approach that empowers users to construct expressive queries, without writing complex code (e.g., finding money laundering rings of bankers and business owners). Our contributions are as follows: (1) we introduce graph autocomplete, an interactive approach that guides users to construct and refine queries, preventing over-specification; (2) VISAGE guides the construction of graph queries using a data-driven approach, enabling users to specify queries with varying levels of specificity, from concrete and detailed (e.g., query by example), to abstract (e.g., with “wildcard” nodes of any types), to purely structural matching; (3) a twelve-participant, within-subject user study demonstrates VISAGE’s ease of use and the ability to construct graph queries significantly faster than using a conventional query language; (4) VISAGE works on real graphs with over 468K edges, achieving sub-second response times for common queries. PMID:28553670
Teaching With the Internet.

ERIC Educational Resources Information Center

Herron, Terri L.

1998-01-01

Discusses ways to use the Internet as a pedagogical tool in higher education, with illustrations from techniques and resources used in a graduate course in accounting information systems. Examples include use of an online textbook, an Internet-based project, electronic mail, a class Web page, and Internet searching to find course-related…
Combinatorial Fusion Analysis for Meta Search Information Retrieval

NASA Astrophysics Data System (ADS)

Hsu, D. Frank; Taksa, Isak

Leading commercial search engines are built as single event systems. In response to a particular search query, the search engine returns a single list of ranked search results. To find more relevant results the user must frequently try several other search engines. A meta search engine was developed to enhance the process of multi-engine querying. The meta search engine queries several engines at the same time and fuses individual engine results into a single search results list. The fusion of multiple search results has been shown (mostly experimentally) to be highly effective. However, the question of why and how the fusion should be done still remains largely unanswered. In this chapter, we utilize the combinatorial fusion analysis proposed by Hsu et al. to analyze combination and fusion of multiple sources of information. A rank/score function is used in the design and analysis of our framework. The framework provides a better understanding of the fusion phenomenon in information retrieval. For example, to improve the performance of the combined multiple scoring systems, it is necessary that each of the individual scoring systems has relatively high performance and the individual scoring systems are diverse. Additionally, we illustrate various applications of the framework using two examples from the information retrieval domain.
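The two ways of fusing engine results mentioned above, by score and by rank, can be sketched as follows. This is a simplified illustration under stated assumptions, not the combinatorial fusion analysis framework itself: the function names, the min-max normalization, the plain averaging and the handling of documents missing from one engine are all hypothetical choices.

```python
def normalize(scores):
    """Min-max normalize one engine's scores to [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 1.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}


def ranks(scores):
    """Derive a rank function (1 = best) from a score function."""
    ordered = sorted(scores, key=lambda d: (-scores[d], d))
    return {d: i for i, d in enumerate(ordered, start=1)}


def score_fusion(systems):
    """Order documents by the average of normalized scores (missing = 0)."""
    docs = set().union(*systems)
    normed = [normalize(s) for s in systems]
    combined = {d: sum(n.get(d, 0.0) for n in normed) / len(normed) for d in docs}
    return sorted(docs, key=lambda d: (-combined[d], d))


def rank_fusion(systems):
    """Order documents by average rank (missing = one past the worst rank)."""
    docs = set().union(*systems)
    rank_maps = [ranks(s) for s in systems]
    worst = len(docs) + 1
    combined = {d: sum(r.get(d, worst) for r in rank_maps) / len(rank_maps) for d in docs}
    return sorted(docs, key=lambda d: (combined[d], d))


# Two hypothetical engines scoring the same three documents on different scales.
engine_a = {"d1": 0.9, "d2": 0.5, "d3": 0.1}
engine_b = {"d1": 10, "d2": 30, "d3": 20}
print(score_fusion([engine_a, engine_b]))
print(rank_fusion([engine_a, engine_b]))
```

The two engines here disagree strongly about d1, which is the kind of diversity the abstract names as a condition for fusion to help. Score and rank combination happen to agree on this toy input, but they can produce different orderings in general.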
Designing drugs on the internet? Free web tools and services supporting medicinal chemistry.

PubMed

Ertl, Peter; Jelfs, Stephen

2007-01-01

The drug discovery process is supported by a multitude of freely available tools on the Internet. This paper summarizes some of the databases and tools that are of particular interest to medicinal chemistry. These include numerous data collections that provide access to valuable chemical data resources, allowing complex queries of compound structures, associated physicochemical properties and biological activities to be performed and, in many cases, providing links to commercial chemical suppliers. Further applications are available for searching protein-ligand complexes and identifying important binding interactions that occur. This is particularly useful for understanding the molecular recognition of ligands in the lead optimization process. The Internet also provides access to databases detailing metabolic pathways and transformations which can provide insight into disease mechanism, identify new target entities or the potential off-target effects of a drug candidate. Furthermore, sophisticated online cheminformatics tools are available for processing chemical structures, predicting properties, and generating 2D or 3D structure representations--often required prior to more advanced analyses. The Internet provides a wealth of valuable resources that, if fully exploited, can greatly benefit the drug discovery community. In this paper, we provide an overview of some of the more important of these and, in particular, the freely accessible resources that are currently available.

Multi-field query expansion is effective for biomedical dataset retrieval.

PubMed

Bouadjenek, Mohamed Reda; Verspoor, Karin

2017-01-01

In the context of the bioCADDIE challenge addressing information retrieval of biomedical datasets, we propose a method for retrieval of biomedical data sets with heterogeneous schemas through query reformulation. In particular, the method proposed transforms the initial query into a multi-field query that is then enriched with terms that are likely to occur in the relevant datasets. We compare and evaluate two query expansion strategies, one based on the Rocchio method and another based on a biomedical lexicon. We then perform a comprehensive comparative evaluation of our method on the bioCADDIE dataset collection for biomedical retrieval. We demonstrate the effectiveness of our multi-field query method compared to two baselines, with MAP improved from 0.2171 and 0.2669 to 0.2996. We also show the benefits of query expansion, where the Rocchio expansion method improves the MAP for our two baselines from 0.2171 and 0.2669 to 0.335. We show that the Rocchio query expansion method slightly outperforms the one based on the biomedical lexicon as a source of terms, with an improvement of roughly 3% for MAP. However, the query expansion method based on the biomedical lexicon is much less resource intensive since it does not require computation of any relevance feedback set or any initial execution of the query. Hence, in terms of trade-off between efficiency, execution time and retrieval accuracy, we argue that the query expansion method based on the biomedical lexicon offers the best performance for a prototype biomedical data search engine intended to be used at a large scale. In the official bioCADDIE challenge results, although our approach is ranked seventh in terms of the infNDCG evaluation metric, it ranks second in terms of P@10 and NDCG. Hence, the method proposed here provides overall good retrieval performance in relation to the approaches of other competitors. Consequently, the observations made in this paper should benefit the development of a Data Discovery
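The Rocchio-style expansion described in the abstract above requires an initial retrieval run to supply feedback documents, which is why the abstract calls it more resource intensive than the lexicon-based alternative. The sketch below illustrates the general idea only and is not the authors' pipeline: the function name rocchio_expand, the alpha and beta weights, the raw term-frequency weighting and the omission of a negative-feedback term are all assumptions made for the example.

```python
from collections import Counter


def rocchio_expand(query_terms, feedback_docs, alpha=1.0, beta=0.75, n_new_terms=3):
    """Return a weighted query expanded with terms from feedback documents.

    query_terms: list of tokens in the original query.
    feedback_docs: list of token lists, e.g. the top results of a first run.
    """
    query_vec = Counter(query_terms)
    # Centroid of the feedback documents using raw term frequencies.
    centroid = Counter()
    for doc in feedback_docs:
        for term, tf in Counter(doc).items():
            centroid[term] += tf / len(feedback_docs)
    # Rocchio update without the negative-feedback component.
    weights = {
        t: alpha * query_vec.get(t, 0) + beta * centroid.get(t, 0.0)
        for t in set(query_vec) | set(centroid)
    }
    # Keep the original terms and add the strongest new ones.
    new_terms = sorted(
        (t for t in weights if t not in query_vec),
        key=lambda t: (-weights[t], t),
    )[:n_new_terms]
    return {t: weights[t] for t in list(query_vec) + new_terms}


# Toy example with hypothetical tokens.
expanded = rocchio_expand(
    ["cancer", "gene"],
    [["cancer", "gene", "expression", "rna"], ["gene", "expression", "microarray"]],
    n_new_terms=1,
)
print(expanded)
```

In this toy run the term "expression" is added because it occurs in both feedback documents, while the original query terms keep the highest weights.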
Structuring Legacy Pathology Reports by openEHR Archetypes to Enable Semantic Querying.

PubMed

Kropf, Stefan; Krücken, Peter; Mueller, Wolf; Denecke, Kerstin

2017-05-18

Clinical information is often stored as free text, e.g. in discharge summaries or pathology reports. These documents are semi-structured using section headers, numbered lists, items and classification strings. However, it is still challenging to retrieve relevant documents since keyword searches applied on complete unstructured documents result in many false positive retrieval results. We are concentrating on the processing of pathology reports as an example for unstructured clinical documents. The objective is to transform reports semi-automatically into an information structure that enables an improved access and retrieval of relevant data. The data is expected to be stored in a standardized, structured way to make it accessible for queries that are applied to specific sections of a document (section-sensitive queries) and for information reuse. Our processing pipeline comprises information modelling, section boundary detection and section-sensitive queries. For enabling a focused search in unstructured data, documents are automatically structured and transformed into a patient information model specified through openEHR archetypes. The resulting XML-based pathology electronic health records (PEHRs) are queried by XQuery and visualized by XSLT in HTML. Pathology reports (PRs) can be reliably structured into sections by a keyword-based approach. The information modelling using openEHR allows saving time in the modelling process since many archetypes can be reused. The resulting standardized, structured PEHRs allow accessing relevant data by retrieving data matching user queries. Mapping unstructured reports into a standardized information model is a practical solution for a better access to data. Archetype-based XML enables section-sensitive retrieval and visualisation by well-established XML techniques. Focussing the retrieval to particular sections has the potential of saving retrieval time and improving the accuracy of the retrieval.
Pattern Recognition-Assisted Infrared Library Searching of the Paint Data Query Database to Enhance Lead Information from Automotive Paint Trace Evidence.

PubMed

Lavine, Barry K; White, Collin G; Allen, Matthew D; Weakley, Andrew

2017-03-01

Multilayered automotive paint fragments, which are one of the most complex materials encountered in the forensic science laboratory, provide crucial links in criminal investigations and prosecutions. To determine the origin of these paint fragments, forensic automotive paint examiners have turned to the paint data query (PDQ) database, which allows the forensic examiner to compare the layer sequence and color, texture, and composition of the sample to paint systems of the original equipment manufacturer (OEM). However, modern automotive paints have a thin color coat and this layer on a microscopic fragment is often too thin to obtain accurate chemical and topcoat color information. A search engine has been developed for the infrared (IR) spectral libraries of the PDQ database in an effort to improve discrimination capability and permit quantification of discrimination power for OEM automotive paint comparisons. The similarity of IR spectra of the corresponding layers of various records for original finishes in the PDQ database often results in poor discrimination using commercial library search algorithms. A pattern recognition approach employing pre-filters and a cross-correlation library search algorithm that performs both a forward and backward search has been used to significantly improve the discrimination of IR spectra in the PDQ database and thus improve the accuracy of the search. This improvement permits inter-comparison of OEM automotive paint layer systems using the IR spectra alone. Such information can serve to quantify the discrimination power of the original automotive paint encountered in casework and further efforts to succinctly communicate trace evidence to the courts.
Omokage search: shape similarity search service for biomolecular structures in both the PDB and EMDB.

PubMed

Suzuki, Hirofumi; Kawabata, Takeshi; Nakamura, Haruki

2016-02-15

Omokage search is a service to search the global shape similarity of biological macromolecules and their assemblies, in both the Protein Data Bank (PDB) and Electron Microscopy Data Bank (EMDB). The server compares global shapes of assemblies independent of sequence order and number of subunits. As a search query, the user inputs a structure ID (PDB ID or EMDB ID) or uploads an atomic model or 3D density map to the server. The search is performed usually within 1 min, using one-dimensional profiles (incremental distance rank profiles) to characterize the shapes. Using the gmfit (Gaussian mixture model fitting) program, the found structures are fitted onto the query structure and their superimposed structures are displayed on the Web browser. Our service provides new structural perspectives to life science researchers. Omokage search is freely accessible at http://pdbj.org/omokage/. © The Author 2015. Published by Oxford University Press.
Querying Safety Cases

NASA Technical Reports Server (NTRS)

Denney, Ewen W.; Naylor, Dwight; Pai, Ganesh

2014-01-01

Querying a safety case to show how the various stakeholders' concerns about system safety are addressed has been put forth as one of the benefits of argument-based assurance (in a recent study by the Health Foundation, UK, which reviewed the use of safety cases in safety-critical industries). However, neither the literature nor current practice offer much guidance on querying mechanisms appropriate for, or available within, a safety case paradigm. This paper presents a preliminary approach that uses a formal basis for querying safety cases, specifically Goal Structuring Notation (GSN) argument structures. Our approach semantically enriches GSN arguments with domain-specific metadata that the query language leverages, along with its inherent structure, to produce views. We have implemented the approach in our toolset AdvoCATE, and illustrate it by application to a fragment of the safety argument for an Unmanned Aircraft System (UAS) being developed at NASA Ames. We also discuss the potential practical utility of our query mechanism within the context of the existing framework for UAS safety assurance.
Trust in online prescription drug information among internet users: the impact on information search behavior after exposure to direct-to-consumer advertising.

PubMed

Menon, Ajit M; Deshpande, Aparna D; Perri, Matthew; Zinkhan, George M

2002-01-01

The proliferation of both manufacturer-controlled and independent medication-related websites has aroused concern among consumers and policy-makers concerning the trustworthiness of Web-based drug information. The authors examine consumers' trust in on-line prescription drug information and its influence on information search behavior. The study design involves a retrospective analysis of data from a 1998 national survey. The findings reveal that trust in drug information from traditional media sources such as television and newspapers transfers to the domain of the Internet. Furthermore, a greater trust in on-line prescription drug information stimulates utilization of the Internet for information search after exposure to prescription drug advertising.
LAILAPS-QSM: A RESTful API and JAVA library for semantic query suggestions.

PubMed

Chen, Jinbo; Scholz, Uwe; Zhou, Ruonan; Lange, Matthias

2018-03-01

In order to access and filter content of life-science databases, full text search is a widely applied query interface. But its high flexibility and intuitiveness are paid for with potentially imprecise and incomplete query results. To reduce this drawback, query assistance systems suggest those combinations of keywords with the highest potential to match most of the relevant data records. Widespread approaches are syntactic query corrections that avoid misspelling and support expansion of words by suffixes and prefixes. Synonym expansion approaches apply thesauri, ontologies, and query logs. All need laborious curation and maintenance. Furthermore, access to query logs is in general restricted. Approaches that infer related queries from a user's query profile (research field, geographic location, co-authorship, affiliation, etc.) require user registration and public accessibility of that profile, which conflicts with privacy concerns. To overcome these drawbacks, we implemented LAILAPS-QSM, a machine learning approach that reconstructs possible linguistic contexts of a given keyword query. The context is inferred from the text records that are stored in the databases that are going to be queried or, for general-purpose query suggestion, extracted from PubMed abstracts and UniProt data. The supplied tool suite enables the pre-processing of these text records and the further computation of customized distributed word vectors. The latter are used to suggest alternative keyword queries. An evaluation of the query suggestion quality was done for plant science use cases. Locally present experts enable a cost-efficient quality assessment in the categories trait, biological entity, taxonomy, affiliation, and metabolic function which has been performed using ontology term similarities. LAILAPS-QSM mean information content similarity for 15 representative queries is 0.70, whereas 34% have a score above 0.80. In comparison, the information content similarity for human expert made query suggestions
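As a rough illustration of how distributed word vectors can be used to suggest alternative keywords, the following Python sketch ranks vocabulary terms by cosine similarity to the query keyword. The vectors, the vocabulary, and the function suggest are hypothetical toy values; this is not the LAILAPS-QSM API, and real vectors would be trained on the text records mentioned in the abstract.

```python
import math

# Toy word vectors with made-up values; real ones would be learned from text.
VECTORS = {
    "drought": [0.9, 0.1, 0.0],
    "dehydration": [0.8, 0.2, 0.1],
    "yield": [0.1, 0.9, 0.2],
    "barley": [0.0, 0.3, 0.9],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def suggest(keyword, k=2):
    """Return the k vocabulary terms most similar to the keyword, excluding itself."""
    query = VECTORS[keyword]
    scored = [(cosine(query, vec), term) for term, vec in VECTORS.items() if term != keyword]
    return [term for _, term in sorted(scored, reverse=True)[:k]]
```

With these toy vectors, suggest("drought") ranks "dehydration" first because its vector points in nearly the same direction as the query vector.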
Multi-INT Complex Event Processing using Approximate, Incremental Graph Pattern Search

DTIC Science & Technology

2012-06-01

graph pattern search and SPARQL queries. Total execution time for 10 executions each of 5 random pattern searches in synthetic data sets... [Figure: "Initial Performance Comparisons"; x-axis: RDF triples (1000, 10000, 100000); y-axis: Time (secs); series: graph pattern algorithm, SPARQL queries]
[Differences in access to Internet and Internet-based information seeking according to the type of psychiatric disorder].

PubMed

Brunault, P; Bray, A; Rerolle, C; Cognet, S; Gaillard, P; El-Hage, W

2017-04-01

The Internet has become a major tool for patients to search for health-related information and to communicate about health. We currently lack data on how patients with psychiatric disorders access and use the Internet to search for information on their mental health. This study aimed to assess, in patients followed for a psychiatric disorder (schizophrenia, bipolar disorder, mood and anxiety disorder, substance-related and addictive disorders and eating disorders), the prevalence of Internet access and use, and patient expectations and needs regarding the use of the Internet to search for mental-health information depending on the psychiatric disorder. We conducted this cross-sectional study between May 2013 and July 2013 in 648 patients receiving psychiatric care in 8 hospitals from the Region Centre, France. We used multivariate logistic regression adjusted for age, gender, socio-educational level and professional status to compare use, expectations and needs regarding Internet-based information about the patient's psychiatric disorder (65-item self-administered questionnaires) as a function of the psychiatric disorders. We identified patient clusters with multiple correspondence analysis and ascending hierarchical classification. Although 65.6% of our population accessed the Internet at home, prevalence of Internet access varied depending on the type of psychiatric disorder and was much more related to limited access to a computer and low income than to a lack of interest in the Internet. Most of the patients who used the Internet were interested in having access to reliable Internet-based information on their health (76.8%), and most used the Internet to search for Internet-based health information about their psychiatric disorder (58.8%). We found important differences in terms of expectations and needs depending on the patient's psychiatric disorder (e.g., higher interest in Internet-based information among patients with bipolar disorder, substance-related and addictive disorders
Life-Satisfaction, Values and Goal Achievement: The Case of Planned versus by Chance Searches on the Internet

ERIC Educational Resources Information Center

Casas, Ferran; Gonzalez, Monica; Figuer, Cristina; Coenders, Germa

2004-01-01

The relation between life domains satisfaction and overall life satisfaction, values, internal/external perceived control and the option of planning or by chance searching information on the Internet has been explored in a sample of Spanish adolescents aged 12 to 16 (N=968). Age and sex differences have been examined. Results clearly confirm a…
Multitasking Web Searching and Implications for Design.

ERIC Educational Resources Information Center

Ozmutlu, Seda; Ozmutlu, H. C.; Spink, Amanda

2003-01-01

Findings from a study of users' multitasking searches on Web search engines include: multitasking searches are a noticeable user behavior; multitasking search sessions are longer than regular search sessions in terms of queries per session and duration; both Excite and AlltheWeb.com users search for about three topics per multitasking session and…
Chinese older adults' Internet use for health information.

PubMed

Wong, Carmen K M; Yeung, Dannii Y; Ho, Henry C Y; Tse, Kin-Po; Lam, Chun-Yiu

2014-04-01

Technological advancement benefits Internet users with the convenience of social connection and information search. This study aimed at investigating the predictors of Internet use to search for online health information among Chinese older adults. The Technology Acceptance Model (TAM) was applied to examine the predictiveness of perceived ease of use, perceived usefulness, and attitudes toward Internet use on behavioral intention to search for health information online. Ninety-eight Chinese older adults were recruited from an academic institute for older people and community centers. Frequency of Internet use and physical and psychological health were also assessed. Results showed that perceived ease of use and attitudes significantly predicted behavioral intention of Internet use. The potential influences of traditional Chinese values and beliefs in health were also discussed.
Fast Multivariate Search on Large Aviation Datasets

NASA Technical Reports Server (NTRS)

Bhaduri, Kanishka; Zhu, Qiang; Oza, Nikunj C.; Srivastava, Ashok N.

2010-01-01

Multivariate Time-Series (MTS) are ubiquitous, and are generated in areas as disparate as sensor recordings in aerospace systems, music and video streams, medical monitoring, and financial systems. Domain experts are often interested in searching for interesting multivariate patterns in these MTS databases, which can contain up to several gigabytes of data. Surprisingly, research on MTS search is very limited. Most existing work only supports queries with the same length of data, or queries on a fixed set of variables. In this paper, we propose an efficient and flexible subsequence search framework for massive MTS databases that, for the first time, enables querying on any subset of variables with arbitrary time delays between them. We propose two provably correct algorithms to solve this problem: (1) an R-tree Based Search (RBS), which uses Minimum Bounding Rectangles (MBR) to organize the subsequences, and (2) a List Based Search (LBS) algorithm, which uses sorted lists for indexing. We demonstrate the performance of these algorithms using two large MTS databases from the aviation domain, each containing several million observations. Both these tests show that our algorithms have very high prune rates (>95%) thus needing actual
The CMS DBS query language

NASA Astrophysics Data System (ADS)

Kuznetsov, Valentin; Riley, Daniel; Afaq, Anzar; Sekhri, Vijay; Guo, Yuyi; Lueking, Lee

2010-04-01

The CMS experiment has implemented a flexible and powerful system enabling users to find data within the CMS physics data catalog. The Dataset Bookkeeping Service (DBS) comprises a database and the services used to store and access metadata related to CMS physics data. To this, we have added a generalized query system in addition to the existing web and programmatic interfaces to the DBS. This query system is based on a query language that hides the complexity of the underlying database structure by discovering the join conditions between database tables. This provides a way of querying the system that is simple and straightforward for CMS data managers and physicists to use without requiring knowledge of the database tables or keys. The DBS Query Language uses the ANTLR tool to build the input query parser and tokenizer, followed by a query builder that uses a graph representation of the DBS schema to construct the SQL query sent to the underlying database. We will describe the design of the query system, provide details of the language components and give an overview of how this component fits into the overall data discovery system architecture.
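The idea of discovering join conditions from a graph representation of the schema can be sketched in Python as a shortest-path search over tables. The table names, join conditions, and the functions join_path and build_sql are hypothetical and do not reflect the real DBS schema or its query builder; breadth-first search is one plausible way to find a join chain, assumed here for illustration.

```python
from collections import deque

# Hypothetical schema graph: an edge between two tables carries its join condition.
JOINS = {
    ("dataset", "block"): "dataset.id = block.dataset_id",
    ("block", "file"): "block.id = file.block_id",
    ("file", "run"): "file.run_id = run.id",
}

def neighbors(table):
    """Yield (adjacent table, join condition) pairs, treating edges as undirected."""
    for (a, b), cond in JOINS.items():
        if a == table:
            yield b, cond
        elif b == table:
            yield a, cond

def join_path(start, goal):
    """Breadth-first search for the shortest chain of joins from start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        table, path = queue.popleft()
        if table == goal:
            return path
        for nxt, cond in neighbors(table):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(nxt, cond)]))
    return None

def build_sql(select_col, start, goal, where):
    """Assemble a SQL string whose JOIN clauses follow the discovered path."""
    path = join_path(start, goal)
    if path is None:
        raise ValueError(f"no join path from {start} to {goal}")
    joins = " ".join(f"JOIN {t} ON {c}" for t, c in path)
    return f"SELECT {select_col} FROM {start} {joins} WHERE {where}"

sql = build_sql("file.name", "file", "dataset", "dataset.name = 'example'")
```

The user only names the column to select and the condition; the intermediate table (block in this toy schema) is filled in by the path search, which is the kind of complexity the query language is said to hide.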
Fixing Dataset Search

NASA Technical Reports Server (NTRS)

Lynnes, Chris

2014-01-01

Three current search engines are queried for ozone data at the GES DISC. The results range from sub-optimal to counter-intuitive. We propose a method to fix dataset search by implementing a robust relevancy ranking scheme. The relevancy ranking scheme is based on several heuristics culled from more than 20 years of helping users select datasets.
GO2PUB: Querying PubMed with semantic expansion of gene ontology terms

PubMed Central

2012-01-01

Background With the development of high throughput methods of gene analyses, there is a growing need for mining tools to retrieve relevant articles in PubMed. As PubMed grows, literature searches become more complex and time-consuming. Automated search tools with good precision and recall are necessary. We developed GO2PUB to automatically enrich PubMed queries with gene names, symbols and synonyms annotated by a GO term of interest or one of its descendants. Results GO2PUB enriches PubMed queries based on selected GO terms and keywords. It processes the result and displays the PMID, title, authors, abstract and bibliographic references of the articles. Gene names, symbols and synonyms that have been generated as extra keywords from the GO terms are also highlighted. GO2PUB is based on a semantic expansion of PubMed queries using the semantic inheritance between terms through the GO graph. Two experts manually assessed the relevance of GO2PUB, GoPubMed and PubMed on three queries about lipid metabolism. Experts’ agreement was high (kappa = 0.88). GO2PUB returned 69% of the relevant articles, GoPubMed: 40% and PubMed: 29%. GO2PUB and GoPubMed have 17% of their results in common, corresponding to 24% of the total number of relevant results. 70% of the articles returned by more than one tool were relevant. 36% of the relevant articles were returned only by GO2PUB, 17% only by GoPubMed and 14% only by PubMed. For determining whether these results can be generalized, we generated twenty queries based on random GO terms with a granularity similar to those of the first three queries and compared the proportions of GO2PUB and GoPubMed results. These were respectively of 77% and 40% for the first queries, and of 70% and 38% for the random queries. The two experts also assessed the relevance of seven of the twenty queries (the three related to lipid metabolism and four related to other domains). Expert agreement was high (0.93 and 0.8). GO2PUB and GoPubMed performances
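The semantic expansion step, in which a query is enriched with genes annotated by a GO term or one of its descendants, can be sketched as follows. The ontology fragment, the gene annotations, and the function names are hypothetical placeholders, not real GO data or the GO2PUB implementation, and the query string format is an assumption.

```python
# Toy ontology and annotations with placeholder identifiers (not real GO terms).
CHILDREN = {
    "GO:A": ["GO:B", "GO:C"],
    "GO:B": ["GO:D"],
    "GO:C": [],
    "GO:D": [],
}
GENES = {
    "GO:A": ["GENE1"],
    "GO:B": ["GENE2"],
    "GO:D": ["GENE3", "GENE1"],
}

def descendants(term):
    """Collect the term itself and every term reachable through child links."""
    found, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(CHILDREN.get(current, []))
    return found

def expand_query(term, keyword):
    """Build a boolean query from the genes annotated to the term or its descendants."""
    genes = sorted({g for t in descendants(term) for g in GENES.get(t, [])})
    return "(" + " OR ".join(genes) + ") AND " + keyword

query = expand_query("GO:B", "lipid")
```

Here the query for GO:B also picks up the genes annotated only to its child GO:D, which is the inheritance through the graph that the abstract describes.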
A model for the determination of pollen count using google search queries for patients suffering from allergic rhinitis.

PubMed

König, Volker; Mösges, Ralph

2014-01-01

Background. The transregional increase in pollen-associated allergies and their diversity have been scientifically proven. However, patchy pollen count measurement in many regions is a worldwide problem with few exceptions. Methods. This paper used data gathered from pollen count stations in Germany, Google queries using relevant allergological/biological keywords, and patient data from three German study centres collected in a prospective, double-blind, randomised, placebo-controlled, multicentre immunotherapy study to analyse a possible correlation between these data pools. Results. Overall, correlations between the patient-based, combined symptom medication score and Google data were stronger than those with the regionally measured pollen count data. The correlation of the Google data was especially strong in the groups of severe allergy sufferers. The results of the three-centre analyses show moderate to strong correlations with the Google keywords (up to >0.8 cross-correlation coefficient, P < 0.001) in 10 out of 11 groups (three averaged patient cohorts and eight subgroups of severe allergy sufferers: high IgE class, high combined symptom medication score, and asthma). Conclusion. For countries with a good Internet infrastructure but no dense network of pollen traps, this could represent an alternative for determining pollen levels and forecasting the pollen count for the next day.
An International Asteroid Search Campaign: Internet-Based Hands-On Research Program for High Schools and Colleges, in Collaboration with the Hands-On Universe Project

ERIC Educational Resources Information Center

Miller, J. Patrick; Davis, Jeffrey W.; Holmes, Robert E., Jr.; Devore, Harlan; Raab, Herbert; Pennypacker, Carlton R.; White, Graeme L.; Gould, Alan

2008-01-01

The International Asteroid Search Campaign (IASC, fondly nicknamed "Isaac") is an Internet-based program for high schools and colleges. Within hours of acquisition, astronomical CCD images are made available via the Internet to participating schools around the world. Under the guidance of their teachers, students analyze the images with free…
BOSS: context-enhanced search for biomedical objects

PubMed Central

2012-01-01

Background There exist many academic search solutions, and most of them fall at either end of a spectrum: general-purpose search systems and domain-specific "deep" search systems. The general-purpose search systems, such as PubMed, offer a flexible query interface but churn out a list of matching documents that users have to go through in order to find the answers to their queries. On the other hand, the "deep" search systems, such as PPI Finder and iHOP, return precompiled results in a structured way. Their results, however, are often found only within some predefined contexts. In order to alleviate these problems, we introduce a new search engine, BOSS, Biomedical Object Search System. Methods Unlike the conventional search systems, BOSS indexes segments, rather than documents. A segment refers to a Maximal Coherent Semantic Unit (MCSU) such as a phrase, clause or sentence that is semantically coherent in the given context (e.g., biomedical objects or their relations). For a user query, BOSS finds all matching segments, identifies the objects appearing in those segments, and aggregates the segments for each object. Finally, it returns the ranked list of the objects along with their matching segments. Results The working prototype of BOSS is available at http://boss.korea.ac.kr. The current version of BOSS has indexed abstracts of more than 20 million articles published during the 16 years from 1996 to 2011 across all science disciplines. Conclusion BOSS fills the gap between the two ends of the spectrum by allowing users to pose context-free queries and by returning a structured set of results. Furthermore, BOSS exhibits good scalability, just as conventional document search engines do, because it is designed to use a standard document-indexing model with minimal modifications. Considering these features, BOSS notches up the technological level of traditional solutions for search on biomedical information. PMID:22595092
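The retrieval flow described in the Methods (find matching segments, identify the objects in them, aggregate segments per object, return a ranked object list) can be sketched in Python. The segments, the object annotations, and the ranking by number of matching segments are assumptions made for illustration; the abstract does not state how BOSS actually scores objects.

```python
from collections import defaultdict

# Hypothetical pre-extracted segments: (segment text, objects mentioned in it).
SEGMENTS = [
    ("BRCA1 interacts with RAD51", ["BRCA1", "RAD51"]),
    ("RAD51 repairs DNA breaks", ["RAD51"]),
    ("TP53 regulates apoptosis", ["TP53"]),
    ("BRCA1 mutations impair DNA repair", ["BRCA1"]),
]

def search(query):
    """Find segments containing every query word, then group and rank them by object."""
    words = query.lower().split()
    by_object = defaultdict(list)
    for text, objects in SEGMENTS:
        lowered = text.lower()
        if all(word in lowered for word in words):
            for obj in objects:
                by_object[obj].append(text)
    # Assumed ranking: more matching segments first, ties broken alphabetically.
    return sorted(by_object.items(), key=lambda item: (-len(item[1]), item[0]))

results = search("rad51")
```

The result is a list of objects, each with its supporting segments, rather than a flat list of documents.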
LETTER TO THE EDITOR: Optimization of partial search

NASA Astrophysics Data System (ADS)

Korepin, Vladimir E.

2005-11-01

A quantum Grover search algorithm can find a target item in a database faster than any classical algorithm. One can trade accuracy for speed and find a part of the database (a block) containing the target item even faster; this is partial search. A partial search algorithm was recently suggested by Grover and Radhakrishnan. Here we optimize it. Efficiency of the search algorithm is measured by the number of queries to the oracle. The author suggests a new version of the Grover-Radhakrishnan algorithm which uses a minimal number of such queries. The algorithm can run on the same hardware that is used for the usual Grover algorithm.
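For context on what "number of queries to the oracle" means, the following Python sketch computes the query count and success probability of the standard (full) Grover search for a single target among N items, using the well-known relation that after k iterations the success probability is sin^2((2k+1)θ) with sin θ = 1/sqrt(N). It covers only this baseline; the partial-search algorithm and its optimized query count from the letter are not reproduced here, and the function names are hypothetical.

```python
import math

def grover_iterations(n_items):
    """A standard choice of iteration count, floor(pi / (4 * theta)), for one target."""
    theta = math.asin(1 / math.sqrt(n_items))
    return math.floor(math.pi / (4 * theta))

def success_probability(n_items, iterations):
    """Probability of measuring the target after the given number of Grover iterations."""
    theta = math.asin(1 / math.sqrt(n_items))
    return math.sin((2 * iterations + 1) * theta) ** 2

n = 10**6
queries = grover_iterations(n)           # about (pi / 4) * sqrt(n)
classical_average = n / 2                # expected lookups for unstructured classical search
```

For a database of one million items, the quantum query count is on the order of sqrt(N), several hundred, compared with roughly half a million expected classical lookups.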

CUFID-query: accurate network querying through random walk based network flow estimation.

PubMed

Jeong, Hyundoo; Qian, Xiaoning; Yoon, Byung-Jun

2017-12-28

Functional modules in biological networks consist of numerous biomolecules and their complicated interactions. Recent studies have shown that biomolecules in a functional module tend to have similar interaction patterns and that such modules are often conserved across biological networks of different species. As a result, such conserved functional modules can be identified through comparative analysis of biological networks. In this work, we propose a novel network querying algorithm based on the CUFID (Comparative network analysis Using the steady-state network Flow to IDentify orthologous proteins) framework combined with an efficient seed-and-extension approach. The proposed algorithm, CUFID-query, can accurately detect conserved functional modules as small subnetworks in the target network that are expected to perform similar functions to the given query functional module. The CUFID framework was recently developed for probabilistic pairwise global comparison of biological networks, and it has been applied to pairwise global network alignment, where the framework was shown to yield accurate network alignment results. In the proposed CUFID-query algorithm, we adopt the CUFID framework and extend it for local network alignment, specifically to solve network querying problems. First, in the seed selection phase, the proposed method utilizes the CUFID framework to compare the query and the target networks and to predict the probabilistic node-to-node correspondence between the networks. Next, the algorithm selects and greedily extends the seed in the target network by iteratively adding nodes that have frequent interactions with other nodes in the seed network, in a way that the conductance of the extended network is maximally reduced. Finally, CUFID-query removes irrelevant nodes from the querying results based on the personalized PageRank vector for the induced network that includes the fully extended network and its neighboring nodes. Through extensive
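The greedy extension step, which adds nodes so that the conductance of the growing subnetwork is reduced as much as possible, can be sketched as follows. This is a simplified illustration on a toy graph with hypothetical function names: it uses the usual definition of conductance (cut edges divided by the smaller of the two volumes) and omits the CUFID correspondence scores used for seed selection and the personalized PageRank pruning.

```python
# Toy undirected network as adjacency sets; made up for illustration only.
GRAPH = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}

def conductance(nodes):
    """Cut edges leaving the node set divided by the smaller side's total degree."""
    cut = sum(1 for u in nodes for v in GRAPH[u] if v not in nodes)
    vol_in = sum(len(GRAPH[u]) for u in nodes)
    vol_out = sum(len(GRAPH[u]) for u in GRAPH) - vol_in
    denom = min(vol_in, vol_out)
    return cut / denom if denom else float("inf")

def greedy_extend(seed):
    """Repeatedly add the neighboring node that lowers conductance the most."""
    current = set(seed)
    while True:
        frontier = {v for u in current for v in GRAPH[u]} - current
        if not frontier:
            return current
        # Sorted iteration makes tie-breaking deterministic.
        best = min(sorted(frontier), key=lambda v: conductance(current | {v}))
        if conductance(current | {best}) >= conductance(current):
            return current
        current.add(best)

module = greedy_extend({"a"})
```

Starting from the seed {a}, the set grows to the tightly connected triangle {a, b, c} and stops, because adding d would raise the conductance again.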
Internet Hospitals in China: Cross-Sectional Survey

PubMed Central

Lin, Lingyan; Fan, Si; Lin, Fen; Wang, Long; Guo, Tongjun; Ma, Chuyang; Zhang, Jingkun; Chen, Yixin

2017-01-01

Background The Internet hospital, an innovative approach to providing health care, is rapidly developing in China because it has the potential to provide widely accessible outpatient service delivery via Internet technologies. To date, China’s Internet hospitals have not been systematically investigated. Objective The aim of this study was to describe the characteristics of China’s Internet hospitals, and to assess their health service capacity. Methods We searched Baidu, the popular Chinese search engine, to identify Internet hospitals, using search terms such as “Internet hospital,” “web hospital,” or “cloud hospital.” All Internet hospitals in mainland China were eligible for inclusion if they were officially registered. Our search was carried out until March 31, 2017. Results We identified 68 Internet hospitals, of which 43 have been put into use and 25 were under construction. Of the 43 established Internet hospitals, 13 (30%) were in the hospital informatization stage, 24 (56%) were in the Web ward stage, and 6 (14%) were in full Internet hospital stage. Patients accessed outpatient service delivery via website (74%, 32/43), app (42%, 18/43), or offline medical consultation facility (37%, 16/43) from the Internet hospital. Furthermore, 25 (58%) of the Internet hospitals asked doctors to deliver health services at a specific Web clinic, whereas 18 (42%) did not. The consulting methods included video chat (60%, 26/43), telephone (19%, 8/43), and graphic message (28%, 12/43); 13 (30%) Internet hospitals cannot be consulted online any more. Only 6 Internet hospitals were included in the coverage of health insurance. The median number of doctors available online was zero (interquartile range [IQR] 0 to 5; max 16,492). The median consultation fee per time was ¥20 (approximately US $2.90, IQR ¥0 to ¥200). Conclusions Internet hospitals provide convenient outpatient service delivery. However, many of the Internet hospitals are not yet mature and
Using the internet to understand angler behavior in the information age

USGS Publications Warehouse

Martin, Dustin R.; Pracheil, Brenda M.; DeBoer, Jason A.; Wilde, Gene R.; Pope, Kevin L.

2012-01-01

Declining participation in recreational angling is of great concern to fishery managers because fishing license sales are an important revenue source for protection of aquatic resources. This decline is frequently attributed, in part, to increased societal reliance on electronics. Internet use by anglers is increasing and fishery managers may use the Internet as a unique means to increase angler participation. We examined Internet search behavior using Google Insights for Search, a free online tool that summarizes Google searches from 2004 to 2011 to determine (1) trends in Internet search volume for general fishing related terms and (2) the relative usefulness of terms related to angler recruitment programs across the United States. Though search volume declined for general fishing terms (e.g., fishing, fishing guide), search volume increased for social media and recruitment terms (e.g., fishing forum, family fishing) over the 7-year period. We encourage coordinators of recruitment programs to capitalize on anglers’ Internet usage by considering Internet search patterns when creating web-based information. Careful selection of terms used in web-based information to match those currently searched by potential anglers may help to direct traffic to state agency websites that support recruitment efforts.
PlateRunner: A Search Engine to Identify EMR Boilerplates.

PubMed

Divita, Guy; Workman, T Elizabeth; Carter, Marjorie E; Redd, Andrew; Samore, Matthew H; Gundlapalli, Adi V

2016-01-01

Medical text contains boilerplated content, an artifact of pull-down forms from EMRs. Boilerplated content is the source of challenges for concept extraction on clinical text. This paper introduces PlateRunner, a search engine on boilerplates from the US Department of Veterans Affairs (VA) EMR. Boilerplates containing concepts should be identified and reviewed to recognize challenging formats, identify high yield document titles, and fine tune section zoning. This search engine has the capability to filter negated and asserted concepts, save and search query results. This tool can save queries, search results, and documents found for later analysis.
Knowledge Query Language (KQL)

DTIC Science & Technology

2016-02-12

Lexington, Massachusetts ... EXECUTIVE SUMMARY: Currently, queries for data ...retrieval from non-Structured Query Language (NoSQL) data stores are tightly coupled to the specific implementation of the data store implementation...independent of the storage content and format for querying NoSQL or relational data stores. This approach uses address expressions (or A-Expressions
Constructing Topic Models of Internet of Things for Information Processing

PubMed Central

Xin, Jie; Cui, Zhiming; Zhang, Shukui; He, Tianxu; Li, Chunhua; Huang, Haojing

2014-01-01

Internet of Things (IoT) is regarded as a remarkable development of the modern information technology. There is abundant digital products data on the IoT, linking with multiple types of objects/entities. Those associated entities carry rich information and usually in the form of query records. Therefore, constructing high quality topic hierarchies that can capture the term distribution of each product record enables us to better understand users' search intent and benefits tasks such as taxonomy construction, recommendation systems, and other communications solutions for the future IoT. In this paper, we propose a novel record entity topic model (RETM) for IoT environment that is associated with a set of entities and records and a Gibbs sampling-based algorithm is proposed to learn the model. We conduct extensive experiments on real-world datasets and compare our approach with existing methods to demonstrate the advantage of our approach. PMID:25110737
A web search on environmental topics: what is the role of ranking?

PubMed

Covolo, Loredana; Filisetti, Barbara; Mascaretti, Silvia; Limina, Rosa Maria; Gelatti, Umberto

2013-12-01

Although the Internet is easy to use, the mechanisms and logic behind a Web search are often unknown. Reliable information can be obtained, but it may not be visible as the Web site is not located in the first positions of search results. The possible risks of adverse health effects arising from environmental hazards are issues of increasing public interest, and therefore the information about these risks, particularly on topics for which there is no scientific evidence, is very crucial. The aim of this study was to investigate whether the presentation of information on some environmental health topics differed among various search engines, assuming that the most reliable information should come from institutional Web sites. Five search engines were used: Google, Yahoo!, Bing, Ask, and AOL. The following topics were searched in combination with the word "health": "nuclear energy," "electromagnetic waves," "air pollution," "waste," and "radon." For each topic three key words were used. The first 30 search results for each query were considered. The ranking variability among the search engines and the type of search results were analyzed for each topic and for each key word. The ranking of institutional Web sites was given particular consideration. Variable results were obtained when surfing the Internet on different environmental health topics. Multivariate logistic regression analysis showed that, when searching for radon and air pollution topics, it is more likely to find institutional Web sites in the first 10 positions compared with nuclear power (odds ratio=3.4, 95% confidence interval 2.1-5.4 and odds ratio=2.9, 95% confidence interval 1.8-4.7, respectively) and also when using Google compared with Bing (odds ratio=3.1, 95% confidence interval 1.9-5.1). The increasing use of online information could play an important role in forming opinions. Web users should become more aware of the importance of finding reliable information, and health institutions should be
CellAtlasSearch: a scalable search engine for single cells.

PubMed

Srivastava, Divyanshu; Iyer, Arvind; Kumar, Vibhor; Sengupta, Debarka

2018-05-21

Owing to the advent of high throughput single cell transcriptomics, the past few years have seen exponential growth in the production of gene expression data. Recently, efforts have been made by various research groups to homogenize and store single cell expression from a large number of studies. The true value of this ever increasing data deluge can be unlocked by making it searchable. To this end, we propose CellAtlasSearch, a novel search architecture for high dimensional expression data, which is massively parallel as well as light-weight, thus infinitely scalable. In CellAtlasSearch, we use a Graphical Processing Unit (GPU) friendly version of Locality Sensitive Hashing (LSH) for unmatched speedup in data processing and query. Currently, CellAtlasSearch features over 300 000 reference expression profiles including both bulk and single-cell data. It enables the user to query individual single cell transcriptomes and find matching samples from the database along with necessary meta information. CellAtlasSearch aims to assist researchers and clinicians in characterizing unannotated single cells. It also facilitates noise free, low dimensional representation of single-cell expression profiles by projecting them on a wide variety of reference samples. The web-server is accessible at: http://www.cellatlassearch.com.
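The abstract does not describe the hashing scheme itself. As a rough illustration of the general idea behind Locality Sensitive Hashing, the sketch below uses random-hyperplane (sign) hashing in plain Python, so that vectors pointing in similar directions tend to land in the same bucket. This is an assumption made for illustration and not the CellAtlasSearch implementation; the names `make_planes`, `signature`, `build_index`, and `query` are hypothetical, and the profiles are toy vectors rather than real expression data.

```python
import random

def make_planes(n_bits, dim, seed=0):
    """Draw n_bits random hyperplanes (Gaussian normal vectors) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def signature(vec, planes):
    """Hash a vector to a bit tuple: one bit per hyperplane, 1 if the dot product is >= 0."""
    return tuple(
        1 if sum(p * x for p, x in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )

def build_index(profiles, planes):
    """Group profile names into buckets keyed by their signature."""
    buckets = {}
    for name, vec in profiles.items():
        buckets.setdefault(signature(vec, planes), []).append(name)
    return buckets

def query(vec, buckets, planes):
    """Return the names stored in the bucket that the query vector hashes to."""
    return buckets.get(signature(vec, planes), [])

# Toy usage: a scaled copy of profile "a" hashes to the same bucket as "a".
planes = make_planes(8, 3, seed=1)
profiles = {"a": [1.0, 2.0, 3.0], "b": [-1.0, -2.0, -3.0]}
buckets = build_index(profiles, planes)
candidates = query([2.0, 4.0, 6.0], buckets, planes)
```

Only the bucket that the query hashes to is inspected, which is where the speedup over an exhaustive comparison comes from; a practical system would typically use several hash tables and re-rank the candidates with an exact similarity measure.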
Aggregating Queries Against Large Inventories of Remotely Accessible Data

NASA Astrophysics Data System (ADS)

Gallagher, J. H. R.; Fulker, D. W.

2016-12-01

Those seeking to discover data for a specific purpose often encounter search results that are so large as to be useless without computing assistance. This situation arises, with increasing frequency, in part because repositories contain ever greater numbers of granules, and their granularities may well be poorly aligned or even orthogonal to the data-selection needs of the user. This presentation describes a recently developed service for simultaneously querying large lists of OPeNDAP-accessible granules to extract specified data. The specifications include a richly expressive set of data-selection criteria—applicable to content as well as metadata—and the service has been tested successfully against lists naming hundreds of thousands of granules. Querying such numbers of local files (i.e., granules) on a desktop or laptop computer is practical (by using a scripting language, e.g.), but this practicality is diminished when the data are remote and thus best accessed through a Web-services interface. In these cases, which are increasingly common, scripted queries can take many hours because of inherent network latencies. Furthermore, communication dropouts can add fragility to such scripts, yielding gaps in the acquired results. In contrast, OPeNDAP's new aggregated-query services enable data discovery in the context of very large inventory sizes. These capabilities have been developed for use with OPeNDAP's Hyrax server, which is an open-source realization of DAP (for "Data Access Protocol," a specification widely used in NASA, NOAA and other data-intensive contexts). These aggregated-query services exhibit good response times (on the order of seconds, not hours) even for inventories that list hundreds of thousands of source granules.
Heuristic query optimization for query multiple table and multiple clausa on mobile finance application

NASA Astrophysics Data System (ADS)

Indrayana, I. N. E.; P, N. M. Wirasyanti D.; Sudiartha, I. KG

2018-01-01

Mobile applications allow many users to access data without being limited by space and time. Over time, the volume of data in such an application increases. Data access time becomes a problem once the data reach tens of thousands to millions of records. The objective of this research is to maintain query execution performance for large data sets. One way to maintain data access time is to apply query optimization. The optimization used in this research is heuristic query optimization. The application built is a mobile financial application using a MySQL database with stored procedures. It is used by more than one business entity in one database, which enables rapid data growth. The stored procedure contains a query optimized using the heuristic method. Query optimization is performed on a “Select” query that involves more than one table and multiple clauses. Evaluation is done by calculating the average access time using optimized and unoptimized queries. Access time is also measured as the amount of data in the database increases. The evaluation results show that execution time with heuristic query optimization is relatively faster than execution time without query optimization.
Occam's razor: supporting visual query expression for content-based image queries

NASA Astrophysics Data System (ADS)

Venters, Colin C.; Hartley, Richard J.; Hewitt, William T.

2005-01-01

This paper reports the results of a usability experiment that investigated visual query formulation on three dimensions: effectiveness, efficiency, and user satisfaction. Twenty eight evaluation sessions were conducted in order to assess the extent to which query by visual example supports visual query formulation in a content-based image retrieval environment. In order to provide a context and focus for the investigation, the study was segmented by image type, user group, and use function. The image type consisted of a set of abstract geometric device marks supplied by the UK Trademark Registry. Users were selected from the 14 UK Patent Information Network offices. The use function was limited to the retrieval of images by shape similarity. Two client interfaces were developed for comparison purposes: Trademark Image Browser Engine (TRIBE) and Shape Query Image Retrieval Systems Engine (SQUIRE).
T253. THE CORRELATION ANALYSIS BETWEEN RENAMING SCHIZOPHRENIA AND VISITING FREQUENCY OF MENTAL HEALTH SERVICES BY BIG DATA ANALYSIS (INTERNET SEARCHES AND NEWSPAPER ARTICLES) IN SOUTH KOREA

PubMed Central

Lee, Sang Yup; Hong, Kyung Sue; Joo, Yeon Ho; Koike, Shinsuke; Lee, Yu Sang; Kwon, Jun Soo

2018-01-01

Abstract Background Korean Neuropsychiatric Association changed the Korean term for schizophrenia from ‘split-mind disorder’ to ‘attunement disorder’ in 2012, to dispel the stigma associated with name, and to promote early detection and treatment. Information on the internet affects the public awareness and attitude toward schizophrenia. The main purpose of this study was to investigate the correlation between renaming schizophrenia and the pattern of mental health services utilization by big data analysis of internet (newspaper articles and internet searches) in Korea. Methods From January 2016 to September 2017, newspaper articles on “attunement disorder” and “split-mind disorder” available on the internet were classified as related with negative images like crime and helpful or positive in dispelling the stigma. The relationship between the number of anti-stigma newspaper articles and newspaper articles of schizophrenia containing both positive and negative images was examined. In addition, using Naver, a major internet search engine in Korea, we investigated the total number of internet searches of both old and new name of schizophrenia by gender differences. Finally, the frequency of the visits of mental health services of patients with schizophrenia was measured using the Korean Healthcare Bigdata Hub (http://opendata.hira.or.kr/home.do#none) for 14 months and the correlation between the frequency of the visits and the above big data was examined. The data were analyzed using the SPSS/WIN 24.0. Pearson correlation coefficients were used to analyze correlations. Results The amounts of newspaper articles containing anti-stigma of schizophrenia were correlated with the amounts of newspaper articles containing negative images like crime of the new name (attunement disorder) of schizophrenia (r=0.528, p<0.01), which was greater than the amounts of newspaper articles containing the old name (split-mind disorder) of schizophrenia (r=0.300, p<0
Access to care and use of the Internet to search for health information: results from the US National Health Interview Survey.

PubMed

Amante, Daniel J; Hogan, Timothy P; Pagoto, Sherry L; English, Thomas M; Lapane, Kate L

2015-04-29

The insurance mandate of the Affordable Care Act has increased the number of people with health coverage in the United States. There is speculation that this increase in the number of insured could make accessing health care services more difficult. Those who are unable to access care in a timely manner may use the Internet to search for information needed to answer their health questions. The aim was to determine whether difficulty accessing health care services for reasons unrelated to insurance coverage is associated with increased use of the Internet to obtain health information. Survey data from 32,139 adults in the 2011 National Health Interview Study (NHIS) were used in this study. The exposure for this analysis was reporting difficulty accessing health care services or delaying getting care for a reason unrelated to insurance status. To define this exposure, we examined 8 questions that asked whether different access problems occurred during the previous 12 months. The outcome for this analysis, health information technology (HIT) use, was captured by examining 2 questions that asked survey respondents if they used an online health chat room or searched the Internet to obtain health information in the previous 12 months. Several multinomial logistic regressions estimating the odds of using HIT for each reported access difficulty were conducted to accomplish the study objective. Of a survey population of 32,139 adults, more than 15.90% (n=5109) reported experiencing at least one access to care barrier, whereas 3.63% (1168/32,139) reported using online health chat rooms and 43.55% (13,997/32,139) reported searching the Internet for health information. Adults who reported difficulty accessing health care services for reasons unrelated to their health insurance coverage had greater odds of using the Internet to obtain health information. Those who reported delaying getting care because they could not get an appointment soon enough (OR 2.2, 95% CI 1.9-2.5), were
Guiding users to quality information about osteoarthritis on the Internet: a pilot study.

PubMed

Ilic, Dragan; Maloney, Stephen; Green, Sally

2005-12-01

This pilot study explored the feasibility of and user satisfaction with an Internet User's Guide (IUG) to assist patients in sourcing relevant, valid information about osteoarthritis on the Internet. Twelve people with osteoarthritis participated in focus groups that involved searching the Internet for information relating to their condition with the aid of the IUG. Participants were asked to perform an initial search of the Internet for information on osteoarthritis, followed by a second search with the aid of the IUG. User satisfaction with the IUG and subsequent online searches was obtained during and following the Internet simulations. A total of 92% of all participants had previously used the Internet to search for health information. However, only a third used the Internet to further source information on their condition. Prior to using the IUG, participants cited efficiently searching the Internet for relevant and credible information as the primary obstacle in their continued use of the Internet. All participants reported that the use of the IUG increased their ability to source quality online medical information. The provision of an IUG may support and increase user awareness about searching for relevant, quality medical information on the Internet. Further quantitative and qualitative research is required to identify how best to empower consumers who wish to use the Internet as a medical resource.
Anamneses-Based Internet Information Supply: Can a Combination of an Expert System and Meta-Search Engine Help Consumers find the Health Information they Require?

PubMed Central

Honekamp, Wilfried; Ostermann, Herwig

2010-01-01

An increasing number of people search for health information online. During the last 10 years various researchers have determined the requirements for an ideal consumer health information system. The aim of this study was to determine whether medical laymen can find a more accurate diagnosis for a given anamnesis via the developed prototype health information system than via ordinary internet search. In a randomized controlled trial, the prototype information system was evaluated by the assessment of two sample cases. Participants had to determine the diagnosis of a patient with a headache via information found searching the web. A patient’s history sheet and a computer with internet access were provided to the participants and they were guided through the study by a specially designed study website. The intervention group used the prototype information system; the control group used common search engines and portals. The numbers of correct diagnoses in each group were compared. A total of 140 (60/80) participants took part in two study sections. In the first case, which determined a common diagnosis, both groups did equally well. In the second section, which determined a less common and more complex case, the intervention group did significantly better (P=0.031) due to the tailored information supply. Using medical expert systems in combination with a portal searching meta-search engine represents a feasible strategy to provide reliable patient-tailored information and can ultimately contribute to patient safety with respect to information found via the internet. PMID:20502597
Multi-Bit Quantum Private Query

NASA Astrophysics Data System (ADS)

Shi, Wei-Xu; Liu, Xing-Tong; Wang, Jian; Tang, Chao-Jing

2015-09-01

Most of the existing Quantum Private Queries (QPQ) protocols provide only single-bit queries service, thus have to be repeated several times when more bits are retrieved. Wei et al.'s scheme for block queries requires a high-dimension quantum key distribution system to sustain, which is still restricted in the laboratory. Here, based on Markus Jakobi et al.'s single-bit QPQ protocol, we propose a multi-bit quantum private query protocol, in which the user can get access to several bits within one single query. We also extend the proposed protocol to block queries, using a binary matrix to guard database security. Analysis in this paper shows that our protocol has better communication complexity, implementability and can achieve a considerable level of security.
Knowledge Query Language (KQL)

DTIC Science & Technology

2016-02-01

Currently, queries for data ...retrieval from non-Structured Query Language (NoSQL) data stores are tightly coupled to the specific implementation of the data store implementation, making...of the storage content and format for querying NoSQL or relational data stores. This approach uses address expressions (or A-Expressions) embedded in
Evidential significance of automotive paint trace evidence using a pattern recognition based infrared library search engine for the Paint Data Query Forensic Database.

PubMed

Lavine, Barry K; White, Collin G; Allen, Matthew D; Fasasi, Ayuba; Weakley, Andrew

2016-10-01

A prototype library search engine has been further developed to search the infrared spectral libraries of the paint data query database to identify the line and model of a vehicle from the clear coat, surfacer-primer, and e-coat layers of an intact paint chip. For this study, search prefilters were developed from 1181 automotive paint systems spanning 3 manufacturers: General Motors, Chrysler, and Ford. The best match between each unknown and the spectra in the hit list generated by the search prefilters was identified using a cross-correlation library search algorithm that performed both a forward and backward search. In the forward search, spectra were divided into intervals and further subdivided into windows (which corresponds to the time lag for the comparison) within those intervals. The top five hits identified in each search window were compiled; a histogram was computed that summarized the frequency of occurrence for each library sample, with the IR spectra most similar to the unknown flagged. The backward search computed the frequency and occurrence of each line and model without regard to the identity of the individual spectra. Only those lines and models with a frequency of occurrence greater than or equal to 20% were included in the final hit list. If there was agreement between the forward and backward search results, the specific line and model common to both hit lists was always the correct assignment. Samples assigned to the same line and model by both searches are always well represented in the library and correlate well on an individual basis to specific library samples. For these samples, one can have confidence in the accuracy of the match. This was not the case for the results obtained using commercial library search algorithms, as the hit quality index scores for the top twenty hits were always greater than 99%. Copyright © 2016 Elsevier B.V. All rights reserved.
Sensitivity and predictive value of 15 PubMed search strategies to answer clinical questions rated against full systematic reviews.

PubMed

Agoritsas, Thomas; Merglen, Arnaud; Courvoisier, Delphine S; Combescure, Christophe; Garin, Nicolas; Perrier, Arnaud; Perneger, Thomas V

2012-06-12

Clinicians perform searches in PubMed daily, but retrieving relevant studies is challenging due to the rapid expansion of medical knowledge. Little is known about the performance of search strategies when they are applied to answer specific clinical questions. To compare the performance of 15 PubMed search strategies in retrieving relevant clinical trials on therapeutic interventions. We used Cochrane systematic reviews to identify relevant trials for 30 clinical questions. Search terms were extracted from the abstract using a predefined procedure based on the population, interventions, comparison, outcomes (PICO) framework and combined into queries. We tested 15 search strategies that varied in their query (PIC or PICO), use of PubMed's Clinical Queries therapeutic filters (broad or narrow), search limits, and PubMed links to related articles. We assessed sensitivity (recall) and positive predictive value (precision) of each strategy on the first 2 PubMed pages (40 articles) and on the complete search output. The performance of the search strategies varied widely according to the clinical question. Unfiltered searches and those using the broad filter of Clinical Queries produced large outputs and retrieved few relevant articles within the first 2 pages, resulting in a median sensitivity of only 10%-25%. In contrast, all searches using the narrow filter performed significantly better, with a median sensitivity of about 50% (all P < .001 compared with unfiltered queries) and positive predictive values of 20%-30% (P < .001 compared with unfiltered queries). This benefit was consistent for most clinical questions. Searches based on related articles retrieved about a third of the relevant studies. The Clinical Queries narrow filter, along with well-formulated queries based on the PICO framework, provided the greatest aid in retrieving relevant clinical trials within the 2 first PubMed pages. These results can help clinicians apply effective strategies to answer their

Query-based learning for aerospace applications.

PubMed

Saad, E W; Choi, J J; Vian, J L; Wunsch, D C Ii

2003-01-01

Models of real-world applications often include a large number of parameters with a wide dynamic range, which contributes to the difficulties of neural network training. Creating the training data set for such applications becomes costly, if not impossible. In order to overcome the challenge, one can employ an active learning technique known as query-based learning (QBL) to add performance-critical data to the training set during the learning phase, thereby efficiently improving the overall learning/generalization. The performance-critical data can be obtained using an inverse mapping called network inversion (discrete network inversion and continuous network inversion) followed by oracle query. This paper investigates the use of both inversion techniques for QBL learning, and introduces an original heuristic to select the inversion target values for continuous network inversion method. Efficiency and generalization was further enhanced by employing node decoupled extended Kalman filter (NDEKF) training and a causality index (CI) as a means to reduce the input search dimensionality. The benefits of the overall QBL approach are experimentally demonstrated in two aerospace applications: a classification problem with large input space and a control distribution problem.
Internet Protocol Handbook. Volume 4. The Domain Name System (DNS) handbook

DTIC Science & Technology

1989-08-01

Mockapetris [Page 1] 4-11 INTERNET PROTOCOL HANDBOOK - Volume Four 1989 RFC 1034 Domain Concepts and Facilities November 1987 bandwidth consumed in distributing...Domain Names - Concepts and Facilities RFC 1034 RFC 1034 Domain Concepts and Facilities November 1987 - Queries contain a bit called recursion desired...during periodic sweeps to reclaim the memory consumed by old RRs. Mockapetris [Page 33] 4-43 INTERNET PROTOCOL HANDBOOK - Volume Four 1989 RFC 1034
Problematic Internet Use among Turkish University Students: A Multidimensional Investigation Based on Demographics and Internet Activities

ERIC Educational Resources Information Center

Tekinarslan, Erkan; Gurer, Melih Derya

2011-01-01

This study investigated the Turkish undergraduate university students' problematic Internet use (PIU) levels on different dimensions based on demographics (e.g., gender, Internet use by time of day), and Internet activities (e.g., chat, entertainment, social networking, information searching, etc.). Moreover, the study explored some predictors of…
Distributed Efficient Similarity Search Mechanism in Wireless Sensor Networks

PubMed Central

Ahmed, Khandakar; Gregory, Mark A.

2015-01-01

The Wireless Sensor Network similarity search problem has received considerable research attention due to sensor hardware imprecision and environmental parameter variations. Most of the state-of-the-art distributed data centric storage (DCS) schemes lack optimization for similarity queries of events. In this paper, a DCS scheme with metric based similarity searching (DCSMSS) is proposed. DCSMSS takes motivation from vector distance index, called iDistance, in order to transform the issue of similarity searching into the problem of an interval search in one dimension. In addition, a sector based distance routing algorithm is used to efficiently route messages. Extensive simulation results reveal that DCSMSS is highly efficient and significantly outperforms previous approaches in processing similarity search queries. PMID:25751081
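The reduction from similarity search to a one-dimensional interval search can be illustrated with the basic iDistance mapping. The sketch below is a centralized toy version under stated assumptions, not the distributed DCSMSS scheme: each point is assigned to its nearest reference point and given the key `i * c + dist(point, refs[i])`, and a range query scans one key interval per partition before filtering by true distance. The function names and the constant `c` are hypothetical, and `c` is assumed to exceed every partition radius so that key ranges do not overlap.

```python
import bisect
import math

def build_idistance(points, refs, c):
    """Map each point to the 1-D key i*c + dist(point, refs[i]), i = nearest reference."""
    keyed = []
    radius = [0.0] * len(refs)  # distance of the farthest member in each partition
    for p in points:
        dists = [math.dist(p, o) for o in refs]
        i = min(range(len(refs)), key=dists.__getitem__)
        radius[i] = max(radius[i], dists[i])
        keyed.append((i * c + dists[i], p))
    keyed.sort()
    return keyed, radius

def range_search(q, r, keyed, radius, refs, c):
    """Return all points within distance r of q, scanning one key interval per partition."""
    keys = [k for k, _ in keyed]
    hits = []
    for i, o in enumerate(refs):
        dq = math.dist(q, o)
        if dq - r > radius[i]:
            continue  # the query ball cannot reach this partition
        lo = i * c + max(0.0, dq - r)
        hi = i * c + min(radius[i], dq + r)
        start = bisect.bisect_left(keys, lo)
        end = bisect.bisect_right(keys, hi)
        # The interval only yields candidates; confirm each with the true distance.
        hits.extend(p for _, p in keyed[start:end] if math.dist(p, q) <= r)
    return hits

# Toy usage with two clusters and one reference point per cluster.
points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 12)]
refs = [(0, 0), (10, 10)]
keyed, radius = build_idistance(points, refs, c=100.0)
near = range_search((0.5, 0), 0.6, keyed, radius, refs, c=100.0)
```

The interval bounds follow from the triangle inequality: any point within `r` of the query must lie at a distance between `dq - r` and `dq + r` from the reference point of its partition.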
Internet Hospitals in China: Cross-Sectional Survey.

PubMed

Xie, Xiaoxu; Zhou, Weimin; Lin, Lingyan; Fan, Si; Lin, Fen; Wang, Long; Guo, Tongjun; Ma, Chuyang; Zhang, Jingkun; He, Yuan; Chen, Yixin

2017-07-04

The Internet hospital, an innovative approach to providing health care, is rapidly developing in China because it has the potential to provide widely accessible outpatient service delivery via Internet technologies. To date, China's Internet hospitals have not been systematically investigated. The aim of this study was to describe the characteristics of China's Internet hospitals, and to assess their health service capacity. We searched Baidu, the popular Chinese search engine, to identify Internet hospitals, using search terms such as "Internet hospital," "web hospital," or "cloud hospital." All Internet hospitals in mainland China were eligible for inclusion if they were officially registered. Our search was carried out until March 31, 2017. We identified 68 Internet hospitals, of which 43 have been put into use and 25 were under construction. Of the 43 established Internet hospitals, 13 (30%) were in the hospital informatization stage, 24 (56%) were in the Web ward stage, and 6 (14%) were in full Internet hospital stage. Patients accessed outpatient service delivery via website (74%, 32/43), app (42%, 18/43), or offline medical consultation facility (37%, 16/43) from the Internet hospital. Furthermore, 25 (58%) of the Internet hospitals asked doctors to deliver health services at a specific Web clinic, whereas 18 (42%) did not. The consulting methods included video chat (60%, 26/43), telephone (19%, 8/43), and graphic message (28%, 12/43); 13 (30%) Internet hospitals cannot be consulted online any more. Only 6 Internet hospitals were included in the coverage of health insurance. The median number of doctors available online was zero (interquartile range [IQR] 0 to 5; max 16,492). The median consultation fee per time was ¥20 (approximately US $2.90, IQR ¥0 to ¥200). Internet hospitals provide convenient outpatient service delivery. However, many of the Internet hospitals are not yet mature and are faced with various issues such as online doctor scarcity and
A Survey in Indexing and Searching XML Documents.

ERIC Educational Resources Information Center

Luk, Robert W. P.; Leong, H. V.; Dillon, Tharam S.; Chan, Alvin T. S.; Croft, W. Bruce; Allan, James

2002-01-01

Discussion of XML focuses on indexing techniques for XML documents, grouping them into flat-file, semistructured, and structured indexing paradigms. Highlights include searching techniques, including full text search and multistage search; search result presentations; database and information retrieval system integration; XML query languages; and…
Seeking Insights About Cycling Mood Disorders via Anonymized Search Logs

PubMed Central

White, Ryen W; Horvitz, Eric

2014-01-01

Background Mood disorders affect a significant portion of the general population. Cycling mood disorders are characterized by intermittent episodes (or events) of the disease. Objective Using anonymized Web search logs, we identify a population of people with significant interest in mood stabilizing drugs (MSD) and seek evidence of mood swings in this population. Methods We extracted queries to the Microsoft Bing search engine made by 20,046 Web searchers over six months, separately explored searcher demographics using data from a large external panel of users, and sought supporting information from people with mood disorders via a survey. We analyzed changes in information needs over time relative to searches on MSD. Results Queries for MSD focused on side effects and their relation to the disease. We found evidence of significant changes in search behavior and interests coinciding with days that MSD queries are made. These include large increases (>100%) in the access of nutrition information, commercial information, and adult materials. A survey of patients diagnosed with mood disorders provided evidence that repeated queries on MSD may come with exacerbations of mood disorder. A classifier predicting the occurrence of such queries one day before they are observed obtains strong performance (AUC=0.78). Conclusions Observed patterns in search behavior align with known behaviors and those highlighted by survey respondents. These observations suggest that searchers showing intensive interest in MSD may be patients who have been prescribed these drugs. Given behavioral dynamics, we surmise that the days on which MSD queries are made may coincide with commencement of mania or depression. Although we do not have data on mood changes and whether users have been diagnosed with bipolar illness, we see evidence of cycling in people who show interest in MSD and further show that we can predict impending shifts in behavior and interest. PMID:24568936
Relevance of Google-customized search engine vs. CISMeF quality-controlled health gateway.

PubMed

Gehanno, Jean-François; Kerdelhue, Gaétan; Sakji, Saoussen; Massari, Philippe; Joubert, Michel; Darmoni, Stéfan J

2009-01-01

CISMeF (acronym for Catalog and Index of French Language Health Resources on the Internet) is a quality-controlled health gateway conceived to catalog and index the most important and quality-controlled sources of institutional health information in French. The goal of this study is to compare the relevance of results provided by this gateway from a small set of documents selected and described by human experts to those provided by a search engine from a large set of automatically indexed and ranked resources. The Google-Customized search engine (CSE) was used. The evaluation was made using the first 10 results of 15 queries and two blinded physician evaluators. There was no significant difference between the relevance of information retrieval in CISMeF and Google CSE. In conclusion, automatic indexing does not lead to lower relevance than manual MeSH indexing and may help to cope with the increasing number of references to be indexed in a quality-controlled health gateway.
Essie: A Concept-based Search Engine for Structured Biomedical Text

PubMed Central

Ide, Nicholas C.; Loane, Russell F.; Demner-Fushman, Dina

2007-01-01

This article describes the algorithms implemented in the Essie search engine that is currently serving several Web sites at the National Library of Medicine. Essie is a phrase-based search engine with term and concept query expansion and probabilistic relevancy ranking. Essie’s design is motivated by an observation that query terms are often conceptually related to terms in a document, without actually occurring in the document text. Essie’s performance was evaluated using data and standard evaluation methods from the 2003 and 2006 Text REtrieval Conference (TREC) Genomics track. Essie was the best-performing search engine in the 2003 TREC Genomics track and achieved results comparable to those of the highest-ranking systems on the 2006 TREC Genomics track task. Essie shows that a judicious combination of exploiting document structure, phrase searching, and concept based query expansion is a useful approach for information retrieval in the biomedical domain. PMID:17329729
Bottom-Up Evaluation of Twig Join Pattern Queries in XML Document Databases

NASA Astrophysics Data System (ADS)

Chen, Yangjun

Since the extensible markup language XML emerged as a new standard for information representation and exchange on the Internet, the problem of storing, indexing, and querying XML documents has been among the major issues of database research. In this paper, we study twig pattern matching and discuss a new algorithm for processing ordered twig pattern queries. The time complexity of the algorithm is bounded by O(|D|·|Q| + |T|·leaf_Q) and its space overhead by O(leaf_T·leaf_Q), where T stands for a document tree, Q for a twig pattern, and D for the largest data stream associated with a node q of Q, which contains the database nodes that match the node predicate at q. leaf_T (resp. leaf_Q) represents the number of leaf nodes of T (resp. Q). In addition, the algorithm can be adapted to an indexing environment with XB-trees being used.
Method and system for efficiently searching an encoded vector index

DOEpatents

Bui, Thuan Quang; Egan, Randy Lynn; Kathmann, Kevin James

2001-09-04

Method and system aspects for efficiently searching an encoded vector index are provided. The aspects include the translation of a search query into a candidate bitmap, and the mapping of data from the candidate bitmap into a search result bitmap according to entry values in the encoded vector index. Further, the translation includes the setting of a bit in the candidate bitmap for each entry in a symbol table that corresponds to a candidate of the search query. Also included in the mapping is the identification of a bit value in the candidate bitmap pointed to by an entry in the encoded vector.
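The two steps named in this abstract (query to candidate bitmap, candidate bitmap to result bitmap) can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names `build_index` and `search`, the use of a Python dict as the symbol table, and the use of lists of booleans as bitmaps are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple


def build_index(column: List[str]) -> Tuple[Dict[str, int], List[int]]:
    """Build a symbol table (distinct value -> code) and an encoded vector
    holding one code per row. Both structures are illustrative assumptions."""
    symbol_table: Dict[str, int] = {}
    encoded_vector: List[int] = []
    for value in column:
        # A new value receives the next unused code; a known value reuses its code.
        code = symbol_table.setdefault(value, len(symbol_table))
        encoded_vector.append(code)
    return symbol_table, encoded_vector


def search(symbol_table: Dict[str, int],
           encoded_vector: List[int],
           predicate: Callable[[str], bool]) -> List[bool]:
    """Return a result bitmap with one bit per row."""
    # Step 1: translate the query into a candidate bitmap with one bit per
    # symbol-table entry, set when that entry satisfies the query.
    candidate = [False] * len(symbol_table)
    for value, code in symbol_table.items():
        if predicate(value):
            candidate[code] = True
    # Step 2: for each row, look up the candidate bit that the row's code points to.
    return [candidate[code] for code in encoded_vector]
```

For example, indexing the column `["red", "blue", "red", "green", "blue"]` and searching with `lambda v: v in {"red", "green"}` yields `[True, False, True, True, False]`. The query predicate is evaluated once per distinct value rather than once per row, which is the point of going through the candidate bitmap.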
The contribution of morphological knowledge to French MeSH mapping for information retrieval.

PubMed Central

Zweigenbaum, P.; Darmoni, S. J.; Grabar, N.

2001-01-01

MeSH-indexed Internet health directories must provide a mapping from natural language queries to MeSH terms so that both health professionals and the general public can query their contents. We describe here the design of lexical knowledge bases for mapping French expressions to MeSH terms, and the initial evaluation of their contribution to Doc'CISMeF, the search tool of a MeSH-indexed directory of French-language medical Internet resources. The observed trend is in favor of the use of morphological knowledge as a moderate (approximately 5%) but effective factor for improving query to term mapping capabilities. PMID:11825295
Snaptron: querying splicing patterns across tens of thousands of RNA-seq samples

PubMed Central

Wilks, Christopher; Gaddipati, Phani; Nellore, Abhinav

2018-01-01

Motivation: As more and larger genomics studies appear, there is a growing need for comprehensive and queryable cross-study summaries. These enable researchers to leverage vast datasets that would otherwise be difficult to obtain. Results: Snaptron is a search engine for summarized RNA sequencing data with a query planner that leverages R-tree, B-tree and inverted indexing strategies to rapidly execute queries over 146 million exon-exon splice junctions from over 70 000 human RNA-seq samples. Queries can be tailored by constraining which junctions and samples to consider. Snaptron can score junctions according to tissue specificity or other criteria, and can score samples according to the relative frequency of different splicing patterns. We describe the software and outline biological questions that can be explored with Snaptron queries. Availability and implementation: Documentation is at http://snaptron.cs.jhu.edu. Source code is at https://github.com/ChristopherWilks/snaptron and https://github.com/ChristopherWilks/snaptron-experiments with a CC BY-NC 4.0 license. Contact: chris.wilks@jhu.edu or langmea@cs.jhu.edu. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:28968689
Snaptron: querying splicing patterns across tens of thousands of RNA-seq samples.

PubMed

Wilks, Christopher; Gaddipati, Phani; Nellore, Abhinav; Langmead, Ben

2018-01-01

As more and larger genomics studies appear, there is a growing need for comprehensive and queryable cross-study summaries. These enable researchers to leverage vast datasets that would otherwise be difficult to obtain. Snaptron is a search engine for summarized RNA sequencing data with a query planner that leverages R-tree, B-tree and inverted indexing strategies to rapidly execute queries over 146 million exon-exon splice junctions from over 70 000 human RNA-seq samples. Queries can be tailored by constraining which junctions and samples to consider. Snaptron can score junctions according to tissue specificity or other criteria, and can score samples according to the relative frequency of different splicing patterns. We describe the software and outline biological questions that can be explored with Snaptron queries. Documentation is at http://snaptron.cs.jhu.edu. Source code is at https://github.com/ChristopherWilks/snaptron and https://github.com/ChristopherWilks/snaptron-experiments with a CC BY-NC 4.0 license. Contact: chris.wilks@jhu.edu or langmea@cs.jhu.edu. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
The Weaknesses of Full-Text Searching

ERIC Educational Resources Information Center

Beall, Jeffrey

2008-01-01

This paper provides a theoretical critique of the deficiencies of full-text searching in academic library databases. Because full-text searching relies on matching words in a search query with words in online resources, it is an inefficient method of finding information in a database. This matching fails to retrieve synonyms, and it also retrieves…
Gender Differences in Searching for Health Information on the Internet and the Virtual Patient-Physician Relationship in Germany: Exploratory Results on How Men and Women Differ and Why.

PubMed

Bidmon, Sonja; Terlutter, Ralf

2015-06-22

Many studies have shown that women use the Internet more often for health-related information searches than men, but we have limited knowledge about the underlying reasons. We also do not know whether and how women and men differ in their current use of the Internet for communicating with their general practitioner (GP) and in their future intention to do so (virtual patient-physician relationship). This study investigates (1) gender differences in health-related information search behavior by exploring underlying emotional, motivational, attitudinal as well as cognitive variables, situational involvement, and normative influences, and different personal involvement regarding health-related information searching and (2) gender differences in the virtual patient-physician relationship. Gender differences were analyzed based on an empirical online survey of 1006 randomly selected German patients. The sample was drawn from an e-panel maintained by GfK HealthCare. A total of 958 usable questionnaires were analyzed. Principal component analyses were carried out for some variables. Differences between men (517/958) and women (441/958) were analyzed using t tests and Kendall's tau-b tests. The survey instrument was guided by several research questions and was based on existing literature. Women were more engaged in using the Internet for health-related information searching. Gender differences were found for the frequency of usage of various Internet channels for health-related information searches. Women used the Internet for health-related information searches to a higher degree for social motives and enjoyment and they judged the usability of the Internet medium and of the information gained by health information searches higher than men did. Women had a more positive attitude toward Web 2.0 than men did, but perceived themselves as less digitally competent. Women had a higher health and nutrition awareness and a greater reluctance to make use of medical support, as
Occam's razor: supporting visual query expression for content-based image queries

NASA Astrophysics Data System (ADS)

Venters, Colin C.; Hartley, Richard J.; Hewitt, William T.

2004-12-01

This paper reports the results of a usability experiment that investigated visual query formulation on three dimensions: effectiveness, efficiency, and user satisfaction. Twenty eight evaluation sessions were conducted in order to assess the extent to which query by visual example supports visual query formulation in a content-based image retrieval environment. In order to provide a context and focus for the investigation, the study was segmented by image type, user group, and use function. The image type consisted of a set of abstract geometric device marks supplied by the UK Trademark Registry. Users were selected from the 14 UK Patent Information Network offices. The use function was limited to the retrieval of images by shape similarity. Two client interfaces were developed for comparison purposes: Trademark Image Browser Engine (TRIBE) and Shape Query Image Retrieval Systems Engine (SQUIRE).
Privacy-preserving search for chemical compound databases.

PubMed

Shimizu, Kana; Nuida, Koji; Arai, Hiromi; Mitsunari, Shigeo; Attrapadung, Nuttapong; Hamada, Michiaki; Tsuda, Koji; Hirokawa, Takatsugu; Sakuma, Jun; Hanaoka, Goichiro; Asai, Kiyoshi

2015-01-01

Searching for similar compounds in a database is the most important process for in-silico drug screening. Since a query compound is an important starting point for the new drug, a query holder, who is afraid of the query being monitored by the database server, usually downloads all the records in the database and uses them in a closed network. However, a serious dilemma arises when the database holder also wants to output no information except for the search results, and such a dilemma prevents the use of many important data resources. In order to overcome this dilemma, we developed a novel cryptographic protocol that enables database searching while keeping both the query holder's privacy and database holder's privacy. Generally, the application of cryptographic techniques to practical problems is difficult because versatile techniques are computationally expensive while computationally inexpensive techniques can perform only trivial computation tasks. In this study, our protocol is successfully built only from an additive-homomorphic cryptosystem, which allows only addition performed on encrypted values but is computationally efficient compared with versatile techniques such as general purpose multi-party computation. In an experiment searching ChEMBL, which consists of more than 1,200,000 compounds, the proposed method was 36,900 times faster in CPU time and 12,000 times as efficient in communication size compared with general purpose multi-party computation. We proposed a novel privacy-preserving protocol for searching chemical compound databases. The proposed method, easily scaling for large-scale databases, may help to accelerate drug discovery research by making full use of unused but valuable data that includes sensitive information.
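The additive-homomorphic property that this abstract relies on can be shown with a toy Python sketch. It uses the Paillier cryptosystem as one well-known additive-homomorphic scheme; the abstract does not say which scheme the authors used, and this is not their protocol (in particular, it does nothing to protect the database holder). The tiny primes, the function names, and the choice of a shared-bit count between binary fingerprints as the quantity to compute are all assumptions for illustration. The modular inverse via `pow(x, -1, n)` needs Python 3.8 or later.

```python
import math
import random


def keygen(p=1009, q=1013):
    """Toy Paillier key pair. These primes are far too small for real use."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    g = n + 1
    # mu is the inverse of L(g^lam mod n^2) modulo n, where L(x) = (x - 1) // n.
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)
    return (n, g), (lam, mu, n)


def encrypt(public_key, m):
    n, g = public_key
    while True:  # pick a random r that is invertible modulo n
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)


def decrypt(private_key, c):
    lam, mu, n = private_key
    return ((pow(c, lam, n * n) - 1) // n * mu) % n


def add_encrypted(public_key, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    n, _ = public_key
    return (c1 * c2) % (n * n)


def encrypted_overlap(public_key, encrypted_query_bits, db_bits):
    """Server side: sum the encrypted query bits at positions where the
    database fingerprint has a 1, without ever decrypting the query."""
    total = encrypt(public_key, 0)
    for c, bit in zip(encrypted_query_bits, db_bits):
        if bit:
            total = add_encrypted(public_key, total, c)
    return total
```

In this sketch the query holder encrypts each fingerprint bit, the server calls `encrypted_overlap` on ciphertexts only, and the query holder decrypts the returned value to learn how many bits the two fingerprints share. Only addition on encrypted values is ever needed, which matches the restriction described in the abstract.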
Privacy-preserving search for chemical compound databases

PubMed Central

2015-01-01

Background: Searching for similar compounds in a database is the most important process for in-silico drug screening. Since a query compound is an important starting point for the new drug, a query holder, who is afraid of the query being monitored by the database server, usually downloads all the records in the database and uses them in a closed network. However, a serious dilemma arises when the database holder also wants to output no information except for the search results, and such a dilemma prevents the use of many important data resources. Results: In order to overcome this dilemma, we developed a novel cryptographic protocol that enables database searching while keeping both the query holder's privacy and database holder's privacy. Generally, the application of cryptographic techniques to practical problems is difficult because versatile techniques are computationally expensive while computationally inexpensive techniques can perform only trivial computation tasks. In this study, our protocol is successfully built only from an additive-homomorphic cryptosystem, which allows only addition performed on encrypted values but is computationally efficient compared with versatile techniques such as general purpose multi-party computation. In an experiment searching ChEMBL, which consists of more than 1,200,000 compounds, the proposed method was 36,900 times faster in CPU time and 12,000 times as efficient in communication size compared with general purpose multi-party computation. Conclusion: We proposed a novel privacy-preserving protocol for searching chemical compound databases. The proposed method, easily scaling for large-scale databases, may help to accelerate drug discovery research by making full use of unused but valuable data that includes sensitive information. PMID:26678650
Integrating unified medical language system and association mining techniques into relevance feedback for biomedical literature search.

PubMed

Ji, Yanqing; Ying, Hao; Tran, John; Dews, Peter; Massanari, R Michael

2016-07-19

Finding highly relevant articles from biomedical databases is challenging not only because it is often difficult to accurately express a user's underlying intention through keywords but also because a keyword-based query normally returns a long list of hits with many citations being unwanted by the user. This paper proposes a novel biomedical literature search system, called BiomedSearch, which supports complex queries and relevance feedback. The system employed association mining techniques to build a k-profile representing a user's relevance feedback. More specifically, we developed a weighted interest measure and an association mining algorithm to find the strength of association between a query and each concept in the article(s) selected by the user as feedback. The top concepts were utilized to form a k-profile used for the next-round search. BiomedSearch relies on Unified Medical Language System (UMLS) knowledge sources to map text files to standard biomedical concepts. It was designed to support queries with any levels of complexity. A prototype of BiomedSearch software was made and it was preliminarily evaluated using the Genomics data from TREC (Text Retrieval Conference) 2006 Genomics Track. Initial experiment results indicated that BiomedSearch increased the mean average precision (MAP) for a set of queries. With UMLS and association mining techniques, BiomedSearch can effectively utilize users' relevance feedback to improve the performance of biomedical literature search.

A novel visualization model for web search results.

PubMed

Nguyen, Tien N; Zhang, Jin

2006-01-01

This paper presents an interactive visualization system, named WebSearchViz, for visualizing Web search results and facilitating users' navigation and exploration. The metaphor in our model is the solar system with its planets and asteroids revolving around the sun. Location, color, movement, and spatial distance of objects in the visual space are used to represent the semantic relationships between a query and relevant Web pages. In particular, the movement of objects and their speeds add a new dimension to the visual space, illustrating the degree of relevance between a query and Web search results in the context of users' subjects of interest. By interacting with the visual space, users are able to observe the semantic relevance between a query and a resulting Web page with respect to their subjects of interest, context information, or concern. Users' subjects of interest can be dynamically changed, redefined, added, or deleted from the visual space.
Internet resources for the anaesthesiologist.

PubMed

Johnson, Edward

2012-05-01

There is considerable useful information about anaesthesia available on the World Wide Web. However, at present, it is very incomplete and scattered around many sites. Many anaesthetists find it difficult to get the information they need because of the sheer volume of information available on the Internet. This article starts with the basics of the Internet, explains how to make the most of search engines, and presents a comprehensive list of important websites. These websites, which are felt to offer high educational value for anaesthesiologists, were selected through an extensive search of the Internet. Top-rated anaesthesia websites, web blogs, forums, societies, e-books, e-journals and educational resources are discussed in detail with relevant URLs.
Sensitivity and Predictive Value of 15 PubMed Search Strategies to Answer Clinical Questions Rated Against Full Systematic Reviews

PubMed Central

Merglen, Arnaud; Courvoisier, Delphine S; Combescure, Christophe; Garin, Nicolas; Perrier, Arnaud; Perneger, Thomas V

2012-01-01

Background: Clinicians perform searches in PubMed daily, but retrieving relevant studies is challenging due to the rapid expansion of medical knowledge. Little is known about the performance of search strategies when they are applied to answer specific clinical questions. Objective: To compare the performance of 15 PubMed search strategies in retrieving relevant clinical trials on therapeutic interventions. Methods: We used Cochrane systematic reviews to identify relevant trials for 30 clinical questions. Search terms were extracted from the abstract using a predefined procedure based on the population, interventions, comparison, outcomes (PICO) framework and combined into queries. We tested 15 search strategies that varied in their query (PIC or PICO), use of PubMed’s Clinical Queries therapeutic filters (broad or narrow), search limits, and PubMed links to related articles. We assessed sensitivity (recall) and positive predictive value (precision) of each strategy on the first 2 PubMed pages (40 articles) and on the complete search output. Results: The performance of the search strategies varied widely according to the clinical question. Unfiltered searches and those using the broad filter of Clinical Queries produced large outputs and retrieved few relevant articles within the first 2 pages, resulting in a median sensitivity of only 10%–25%. In contrast, all searches using the narrow filter performed significantly better, with a median sensitivity of about 50% (all P < .001 compared with unfiltered queries) and positive predictive values of 20%–30% (P < .001 compared with unfiltered queries). This benefit was consistent for most clinical questions. Searches based on related articles retrieved about a third of the relevant studies. Conclusions: The Clinical Queries narrow filter, along with well-formulated queries based on the PICO framework, provided the greatest aid in retrieving relevant clinical trials within the first 2 PubMed pages. These results can help
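The two measures used in this study, sensitivity (recall) and positive predictive value (precision), can be computed for a ranked search output as in the Python sketch below. The function name, the `cutoff` parameter and the sample identifiers are hypothetical; the sketch assumes each retrieved identifier appears only once.

```python
from typing import List, Optional, Set, Tuple


def sensitivity_and_ppv(retrieved: List[str],
                        relevant: Set[str],
                        cutoff: Optional[int] = None) -> Tuple[float, float]:
    """Score one search strategy against a reference set of relevant trials.

    retrieved: ranked list of article identifiers returned by the search
    relevant:  identifiers of the trials judged relevant (the reference set)
    cutoff:    if given, only the first `cutoff` results are scored,
               for example 40 for the first 2 result pages
    """
    if cutoff is not None:
        retrieved = retrieved[:cutoff]
    hits = sum(1 for article_id in retrieved if article_id in relevant)
    # Sensitivity: share of the relevant trials that the search found.
    sensitivity = hits / len(relevant) if relevant else 0.0
    # Positive predictive value: share of the scored results that are relevant.
    ppv = hits / len(retrieved) if retrieved else 0.0
    return sensitivity, ppv
```

With `retrieved = ["a", "b", "c", "d", "e"]` and `relevant = {"a", "c", "x", "y"}`, the full output scores a sensitivity of 0.5 and a positive predictive value of 0.4. With `cutoff=2` the same search scores 0.25 and 0.5, which shows why results restricted to the first pages can differ from results on the complete output.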
A Semantic Graph Query Language

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kaplan, I L

2006-10-16

Semantic graphs can be used to organize large amounts of information from a number of sources into one unified structure. A semantic query language provides a foundation for extracting information from the semantic graph. The graph query language described here provides a simple, powerful method for querying semantic graphs.
Net improvement of correct answers to therapy questions after PubMed searches: pre/post comparison.

PubMed

McKibbon, Kathleen Ann; Lokker, Cynthia; Keepanasseril, Arun; Wilczynski, Nancy L; Haynes, R Brian

2013-11-08

Clinicians search PubMed for answers to clinical questions although it is time consuming and not always successful. To determine if PubMed used with its Clinical Queries feature to filter results based on study quality would improve search success (more correct answers to clinical questions related to therapy). We invited 528 primary care physicians to participate, 143 (27.1%) consented, and 111 (21.0% of the total and 77.6% of those who consented) completed the study. Participants answered 14 yes/no therapy questions and were given 4 of these (2 originally answered correctly and 2 originally answered incorrectly) to search using either the PubMed main screen or PubMed Clinical Queries narrow therapy filter via a purpose-built system with identical search screens. Participants also picked 3 of the first 20 retrieved citations that best addressed each question. They were then asked to re-answer the original 14 questions. We found no statistically significant differences in the rates of correct or incorrect answers using the PubMed main screen or PubMed Clinical Queries. The rate of correct answers increased from 50.0% to 61.4% (95% CI 55.0%-67.8%) for the PubMed main screen searches and from 50.0% to 59.1% (95% CI 52.6%-65.6%) for Clinical Queries searches. These net absolute increases of 11.4% and 9.1%, respectively, included previously correct answers changing to incorrect at a rate of 9.5% (95% CI 5.6%-13.4%) for PubMed main screen searches and 9.1% (95% CI 5.3%-12.9%) for Clinical Queries searches, combined with increases in the rate of being correct of 20.5% (95% CI 15.2%-25.8%) for PubMed main screen searches and 17.7% (95% CI 12.7%-22.7%) for Clinical Queries searches. PubMed can assist clinicians answering clinical questions with an approximately 10% absolute rate of improvement in correct answers. This small increase includes more correct answers partially offset by a decrease in previously correct answers.
Quality of Health Information on the Internet for Urolithiasis on the Google Search Engine.

PubMed

Chang, Dwayne T S; Abouassaly, Robert; Lawrentschuk, Nathan

2016-01-01

Purpose. To compare the quality of health information on the Internet for keywords related to urolithiasis, to assess for difference in information quality across four main Western languages, and to compare the source of sponsorship in these websites. Methods. Health On the Net (HON) Foundation principles were utilised to determine quality information. Fifteen keywords related to urolithiasis were searched on the Google search engine. The first 150 websites were assessed against the HON principles and the source of sponsorship determined. Results. A total of 8986 websites were analysed. The proportion of HON-accredited websites for individual search terms ranged between 2.5% and 12.0%. The first 50 websites were more likely to be HON-positive compared to websites 51-100 and 101-150. French websites searched were more likely to be HON-positive whereas German websites were less likely to be HON-positive than English websites. There was no statistically significant difference between the rate of HON-positive English and Spanish websites. The three main website sponsors were from government/educational sources (40.2%), followed by commercial (29.9%) and physician/surgeon sources (18.6%). Conclusions. Health information on most urolithiasis websites was not validated. Nearly one-third of websites in this study have commercial sponsorship. Doctors should recognise the need for more reliable health websites for their patients.
Querying and Ranking XML Documents.

ERIC Educational Resources Information Center

Schlieder, Torsten; Meuss, Holger

2002-01-01

Discussion of XML, information retrieval, precision, and recall focuses on a retrieval technique that adopts the similarity measure of the vector space model, incorporates the document structure, and supports structured queries. Topics include a query model based on tree matching; structured queries and term-based ranking; and term frequency and…
An intuitive graphical webserver for multiple-choice protein sequence search.

PubMed

Banky, Daniel; Szalkai, Balazs; Grolmusz, Vince

2014-04-10

Every day tens of thousands of sequence searches and sequence alignment queries are submitted to webservers. The capitalized word "BLAST" has become a verb, describing the act of performing sequence search and alignment. However, if one needs to search for sequences that contain, for example, two hydrophobic and three polar residues at five given positions, forming the query on the most frequently used webservers will be difficult. Some servers support the formation of queries with regular expressions, but most users are unfamiliar with their syntax. Here we present an intuitive, easily applicable webserver, the Protein Sequence Analysis server, that allows the formation of multiple-choice queries by simply drawing the residues to their positions; if more than one residue is drawn to the same position, they are stacked on the user interface, indicating the multiple choice at the given position. This computer-game-like interface is natural and intuitive, and the coloring of the residues makes it possible to form queries requiring not just certain amino acids in the given positions, but also small nonpolar, negatively charged, hydrophobic, positively charged, or polar ones. The webserver is available at http://psa.pitgroup.org. Copyright © 2014 Elsevier B.V. All rights reserved.
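A multiple-choice query of the kind described here is equivalent to a regular expression with one character class per constrained position. The Python sketch below shows that translation. It does not reproduce the server's internals: the function names and the residue groupings `HYDROPHOBIC` and `POLAR` are illustrative assumptions, and groupings of this kind vary between sources.

```python
import re
from typing import Dict, List

# Illustrative residue classes (one-letter amino acid codes); assumed groupings.
HYDROPHOBIC = "AVILMFW"
POLAR = "STNQ"


def query_to_regex(length: int, choices: Dict[int, str]) -> "re.Pattern[str]":
    """Turn a multiple-choice query into a compiled regular expression.

    choices maps a 0-based position to the residues allowed there;
    positions without an entry accept any residue.
    """
    parts = []
    for position in range(length):
        allowed = choices.get(position)
        if not allowed:
            parts.append(".")
        else:
            parts.append("[" + "".join(sorted(set(allowed))) + "]")
    return re.compile("".join(parts))


def find_matches(sequence: str, pattern: "re.Pattern[str]") -> List[int]:
    """Return every start index where the pattern matches, overlaps included."""
    return [i for i in range(len(sequence)) if pattern.match(sequence, i)]
```

For the example in the abstract, two hydrophobic residues followed by three polar ones, `query_to_regex(5, {0: HYDROPHOBIC, 1: HYDROPHOBIC, 2: POLAR, 3: POLAR, 4: POLAR})` builds the pattern `[AFILMVW][AFILMVW][NQST][NQST][NQST]`, and `find_matches("MKAVSTNLLQ", pattern)` returns `[2]`.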
An evaluation of internet use by neurosurgery patients prior to lumbar disc surgery and of information available on internet.

PubMed

Atci, Ibrahim Burak; Yilmaz, Hakan; Kocaman, Umit; Samanci, Mustafa Yavuz

2017-07-01

The aim of this study was to evaluate the Internet use of a group of lumbar disc surgery candidates in order to determine the rate of Internet search by the patients on their disorders and more importantly the reliability of the accessed websites. Fifty patients who were scheduled for lumbar disc surgery were divided into 2 groups, namely patients who accepted the surgery at the first offer and those who wanted to think it over. Educational level information was obtained and patients were asked whether they had searched their disorder and offered surgery on the Internet. Then, a questionnaire was administered and the reliability of the websites was evaluated. Correction: The first 30 websites on the first 3 pages of the Google® search engine, the most commonly used search engine in Turkey, were evaluated with the DISCERN® instrument. Of 50 patients, 33 (66%) had conducted a search for the surgery on the Internet. All university graduates, 88.2% of high school graduates, and 18.7% of primary-secondary school graduates had conducted an Internet search. The quality and reliability of the information was high (4.5 points) for 2 (7.1%) websites, moderate (2.3 points) for 6 websites (21.4%) and poor (1 point) for 20 websites (71.4%) as scored with the DISCERN® instrument. The mean DISCERN® score was 1.1 for websites of health-related institutions or healthcare news, 2.75 for personal websites of physicians and 2.5 for personal websites of non-physicians. The mean DISCERN® score of all websites was 1.5. Most of the patients undergoing lumbar disc surgery at our clinic had searched for information about the surgical procedure on the Internet. We found that 92.9% of the websites evaluated with the DISCERN® instrument had inadequate information, suggesting low-level reliability. Copyright © 2017 Elsevier B.V. All rights reserved.
Mirador: A Simple, Fast Search Interface for Remote Sensing Data

NASA Technical Reports Server (NTRS)

Lynnes, Christopher; Strub, Richard; Seiler, Edward; Joshi, Talak; MacHarrie, Peter

2008-01-01

A major challenge for remote sensing science researchers is searching and acquiring relevant data files for their research projects based on content, space and time constraints. Several structured query (SQ) and hierarchical navigation (HN) search interfaces have been developed to satisfy this requirement, yet the dominant search engines in the general domain are based on free-text search. The Goddard Earth Sciences Data and Information Services Center has developed a free-text search interface named Mirador that supports space-time queries, including a gazetteer and geophysical event gazetteer. In order to compensate for a slightly reduced search precision relative to SQ and HN techniques, Mirador uses several search optimizations to return results quickly. The quick response enables a more iterative search strategy than is available with many SQ and HN techniques.
Predicting consumer behavior with Web search.

PubMed

Goel, Sharad; Hofman, Jake M; Lahaie, Sébastien; Pennock, David M; Watts, Duncan J

2010-10-12

Recent work has demonstrated that Web search volume can "predict the present," meaning that it can be used to accurately track outcomes such as unemployment levels, auto and home sales, and disease prevalence in near real time. Here we show that what consumers are searching for online can also predict their collective future behavior days or even weeks in advance. Specifically we use search query volume to forecast the opening weekend box-office revenue for feature films, first-month sales of video games, and the rank of songs on the Billboard Hot 100 chart, finding in all cases that search counts are highly predictive of future outcomes. We also find that search counts generally boost the performance of baseline models fit on other publicly available data, where the boost varies from modest to dramatic, depending on the application in question. Finally, we reexamine previous work on tracking flu trends and show that, perhaps surprisingly, the utility of search data relative to a simple autoregressive model is modest. We conclude that in the absence of other data sources, or where small improvements in predictive performance are material, search queries provide a useful guide to the near future.
Secure Skyline Queries on Cloud Platform.

PubMed

Liu, Jinfei; Yang, Juncheng; Xiong, Li; Pei, Jian

2017-04-01

Outsourcing data and computation to cloud server provides a cost-effective way to support large scale data storage and query processing. However, due to security and privacy concerns, sensitive data (e.g., medical records) need to be protected from the cloud server and other unauthorized users. One approach is to outsource encrypted data to the cloud server and have the cloud server perform query processing on the encrypted data only. It remains a challenging task to support various queries over encrypted data in a secure and efficient way such that the cloud server does not gain any knowledge about the data, query, and query result. In this paper, we study the problem of secure skyline queries over encrypted data. The skyline query is particularly important for multi-criteria decision making but also presents significant challenges due to its complex computations. We propose a fully secure skyline query protocol on data encrypted using semantically-secure encryption. As a key subroutine, we present a new secure dominance protocol, which can be also used as a building block for other queries. Finally, we provide both serial and parallelized implementations and empirically study the protocols in terms of efficiency and scalability under different parameter settings, verifying the feasibility of our proposed solutions.
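For readers unfamiliar with the skyline query and the dominance test that this abstract builds on, the Python sketch below computes a skyline on plaintext data. It contains no encryption and is not the secure protocol proposed in the paper; it only shows the computation that the protocol would have to carry out over encrypted values. The function names and the convention that smaller values are better in every dimension are assumptions.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b if a is no worse in every dimension and strictly
    better in at least one (smaller is better in this sketch)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def skyline(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Return the points that no other point dominates.

    This is the straightforward quadratic approach, chosen for clarity."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

For the points `[(1, 9), (2, 5), (3, 6), (4, 2), (5, 3), (6, 1)]` the skyline is `[(1, 9), (2, 5), (4, 2), (6, 1)]`: `(3, 6)` is dominated by `(2, 5)` and `(5, 3)` by `(4, 2)`. Every candidate point requires pairwise dominance comparisons, which helps explain why the abstract singles out a secure dominance protocol as the key subroutine.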
Secure Skyline Queries on Cloud Platform

PubMed Central

Liu, Jinfei; Yang, Juncheng; Xiong, Li; Pei, Jian

2017-01-01

Outsourcing data and computation to cloud server provides a cost-effective way to support large scale data storage and query processing. However, due to security and privacy concerns, sensitive data (e.g., medical records) need to be protected from the cloud server and other unauthorized users. One approach is to outsource encrypted data to the cloud server and have the cloud server perform query processing on the encrypted data only. It remains a challenging task to support various queries over encrypted data in a secure and efficient way such that the cloud server does not gain any knowledge about the data, query, and query result. In this paper, we study the problem of secure skyline queries over encrypted data. The skyline query is particularly important for multi-criteria decision making but also presents significant challenges due to its complex computations. We propose a fully secure skyline query protocol on data encrypted using semantically-secure encryption. As a key subroutine, we present a new secure dominance protocol, which can be also used as a building block for other queries. Finally, we provide both serial and parallelized implementations and empirically study the protocols in terms of efficiency and scalability under different parameter settings, verifying the feasibility of our proposed solutions. PMID:28883710
Counter-regulating on the Internet: Threat elicits preferential processing of positive information.

PubMed

Greving, Hannah; Sassenberg, Kai; Fetterman, Adam

2015-09-01

The Internet is a central source of information. It is increasingly used for information search in self-relevant domains (e.g., health). Self-relevant topics are also associated with specific emotions and motivational states. For example, individuals may fear serious illness and feel threatened. Thus far, the impact of threat has received little attention in Internet-based research. The current studies investigated how threat influences Internet search. Threat is known to elicit the preferential processing of positive information. The self-directed nature of Internet search should particularly provide opportunities for such processing behavior. We predicted that during Internet search, more positive information would be processed (i.e., allocated more attention to) and more positive knowledge would be acquired under threat than in a control condition. Three experiments supported this prediction: Under threat, attention is directed more to positive web pages (Study 1) and positive links (Study 2), and more positive information is acquired (Studies 1 and 3) than in a control condition. Notably, the effect on knowledge acquisition was mediated by the effect on attention allocation during an actual Internet search (Study 1). Thus, Internet search under threat leads to selective processing of positive information and dampens threatened individuals' negative affect. (c) 2015 APA, all rights reserved.
Exploring the e-cigarette e-commerce marketplace: Identifying Internet e-cigarette marketing characteristics and regulatory gaps.

PubMed

Mackey, Tim K; Miner, Angela; Cuomo, Raphael E

2015-11-01

The electronic cigarette (e-cigarette) market is maturing into a billion-dollar industry. Expansion includes new channels of access not sufficiently assessed, including Internet sales of e-cigarettes. This study identifies unique e-cigarette Internet vendor characteristics, including geographic location, promotional strategies, use of social networking, presence/absence of age verification, and consumer warning representation. We performed structured Internet search engine queries and used inclusion/exclusion criteria to identify e-cigarette vendors. We then conducted content analysis of characteristics of interest. Our examination yielded 57 e-cigarette Internet vendors including 54.4% (n=31) that sold exclusively online. The vast majority of websites (96.5%, n=55) were located in the U.S. Vendors used a variety of sales promotion strategies to market e-cigarettes including 70.2% (n=40) that used more than one social network service (SNS) and 42.1% (n=24) that used more than one promotional sales strategy. Most vendors (68.4%, n=39) displayed one or more health warnings on their website, but often displayed them in smaller font or in their terms and conditions. Additionally, 35.1% (n=20) of vendors did not have any detectable age verification process. E-cigarette Internet vendors are actively engaged in various promotional activities to increase the appeal and presence of their products online. In the absence of FDA regulations specific to the Internet, the e-cigarette e-commerce marketplace is likely to grow. This digital environment poses unique challenges requiring targeted policy-making including robust online age verification, monitoring of SNS marketing, and greater scrutiny of certain forms of marketing promotional practices. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Incidence of online health information search: a useful proxy for public health risk perception.

PubMed

Liang, Bo; Scammon, Debra L

2013-06-17

Internet users use search engines to look for information online, including health information. Researchers in medical informatics have found a high correlation of the occurrence of certain search queries and the incidence of certain diseases. Consumers' search for information about diseases is related to current health status with regard to a disease and to the social environments that shape the public's attitudes and behaviors. This study aimed to investigate the extent to which public health risk perception as demonstrated by online information searches related to a health risk can be explained by the incidence of the health risk and social components of a specific population's environment. Using an ecological perspective, we suggest that a population's general concern for a health risk is formed by the incidence of the risk and social (eg, media attention) factors related with the risk. We constructed a dataset that included state-level data from 32 states on the incidence of the flu; a number of social factors, such as media attention to the flu; private resources, such as education and health insurance coverage; public resources, such as hospital beds and primary physicians; and utilization of these resources, including inpatient days and outpatient visits. We then explored whether online information searches about the flu (seasonal and pandemic flu) can be predicted using these variables. We used factor analysis to construct indexes for sets of social factors (private resources, public resources). We then applied panel data multiple regression analysis to exploit both time-series and cross-sectional variation in the data over a 7-year period. Overall, the results provide evidence that the main effects of independent variables-the incidence of the flu (P<.001); social factors, including media attention (P<.001); private resources, including life quality (P<.001) and health lifestyles (P=.009); and public resources, such as hospital care utilization (P=.008
Incidence of Online Health Information Search: A Useful Proxy for Public Health Risk Perception

PubMed Central

Scammon, Debra L

2013-01-01

Background Internet users use search engines to look for information online, including health information. Researchers in medical informatics have found a high correlation of the occurrence of certain search queries and the incidence of certain diseases. Consumers’ search for information about diseases is related to current health status with regard to a disease and to the social environments that shape the public’s attitudes and behaviors. Objective This study aimed to investigate the extent to which public health risk perception as demonstrated by online information searches related to a health risk can be explained by the incidence of the health risk and social components of a specific population’s environment. Using an ecological perspective, we suggest that a population’s general concern for a health risk is formed by the incidence of the risk and social (eg, media attention) factors related with the risk. Methods We constructed a dataset that included state-level data from 32 states on the incidence of the flu; a number of social factors, such as media attention to the flu; private resources, such as education and health insurance coverage; public resources, such as hospital beds and primary physicians; and utilization of these resources, including inpatient days and outpatient visits. We then explored whether online information searches about the flu (seasonal and pandemic flu) can be predicted using these variables. We used factor analysis to construct indexes for sets of social factors (private resources, public resources). We then applied panel data multiple regression analysis to exploit both time-series and cross-sectional variation in the data over a 7-year period. Results Overall, the results provide evidence that the main effects of independent variables—the incidence of the flu (P<.001); social factors, including media attention (P<.001); private resources, including life quality (P<.001) and health lifestyles (P=.009); and public
Performance analysis of different database in new internet mapping system

NASA Astrophysics Data System (ADS)

Yao, Xing; Su, Wei; Gao, Shuai

2017-03-01

In the Mapping System of the New Internet, massive numbers of mapping entries between AID and RID need to be stored, added, updated, and deleted. To cope with large volumes of mapping-entry updates and query requests, the Mapping System must use a high-performance database. In this paper, we focus on the performance of three typical databases, Redis, SQLite, and MySQL, and the results show that a Mapping System based on different databases can adapt to different needs according to the actual situation.
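
As a rough illustration of the kind of measurement the abstract above describes, the sketch below times insert, lookup, update, and delete operations on AID-to-RID mapping entries using only SQLite from the Python standard library. The table layout, the key format, the entry count, and the function name `run_mapping_benchmark` are assumptions for illustration. The abstract does not describe the authors' schema or test harness, and Redis and MySQL are omitted here because they need external servers.

```python
import sqlite3
import time


def run_mapping_benchmark(n_entries: int = 10_000) -> dict:
    """Time basic operations on a hypothetical AID -> RID mapping table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mapping (aid TEXT PRIMARY KEY, rid TEXT NOT NULL)")
    rows = [(f"aid-{i}", f"rid-{i}") for i in range(n_entries)]
    timings = {}

    # Bulk insert inside one transaction.
    start = time.perf_counter()
    with conn:
        conn.executemany("INSERT INTO mapping (aid, rid) VALUES (?, ?)", rows)
    timings["insert_s"] = time.perf_counter() - start

    # Point lookups by primary key.
    start = time.perf_counter()
    hits = 0
    for aid, _ in rows:
        found = conn.execute("SELECT rid FROM mapping WHERE aid = ?", (aid,)).fetchone()
        if found is not None:
            hits += 1
    timings["query_s"] = time.perf_counter() - start

    # Update every entry to a new RID.
    updates = [(f"rid-new-{i}", f"aid-{i}") for i in range(n_entries)]
    start = time.perf_counter()
    with conn:
        conn.executemany("UPDATE mapping SET rid = ? WHERE aid = ?", updates)
    timings["update_s"] = time.perf_counter() - start

    # Delete all entries.
    start = time.perf_counter()
    with conn:
        conn.execute("DELETE FROM mapping")
    timings["delete_s"] = time.perf_counter() - start

    remaining = conn.execute("SELECT COUNT(*) FROM mapping").fetchone()[0]
    conn.close()
    return {"timings": timings, "hits": hits, "remaining": remaining}


result = run_mapping_benchmark(1_000)
print(result["timings"])
```

Timings from an in-memory database on one machine say little about a deployed system. A fair comparison across databases would need the same workload, hardware, and persistence settings for each.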
Monitoring Moving Queries inside a Safe Region

PubMed Central

Al-Khalidi, Haidar; Taniar, David; Alamri, Sultan

2014-01-01

With mobile moving range queries, there is a need to recalculate the relevant surrounding objects of interest whenever the query moves. Therefore, monitoring the moving query is very costly. The safe region is one method that has been proposed to minimise the communication and computation cost of continuously monitoring a moving range query. Inside the safe region the set of objects of interest to the query do not change; thus there is no need to update the query while it is inside its safe region. However, when the query leaves its safe region the mobile device has to reevaluate the query, necessitating communication with the server. Knowing when and where the mobile device will leave a safe region is widely known as a difficult problem. To solve this problem, we propose a novel method to monitor the position of the query over time using a linear function based on the direction of the query obtained by periodic monitoring of its position. Periodic monitoring ensures that the query is aware of its location all the time. This method reduces the costs associated with communications in client-server architecture. Computational results show that our method is successful in handling moving query patterns. PMID:24696652
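
The idea of predicting when a query will leave its safe region from periodic position samples can be sketched as follows. This is not the authors' method. It assumes a circular safe region and straight-line motion at constant velocity, neither of which the abstract specifies, and the names `estimate_velocity` and `time_to_exit` are hypothetical.

```python
import math
from typing import Optional, Tuple

Vec = Tuple[float, float]


def estimate_velocity(p_prev: Vec, p_curr: Vec, dt: float) -> Vec:
    """Estimate velocity from two position samples taken dt time units apart."""
    return ((p_curr[0] - p_prev[0]) / dt, (p_curr[1] - p_prev[1]) / dt)


def time_to_exit(p: Vec, v: Vec, center: Vec, radius: float) -> Optional[float]:
    """Time until a point moving linearly from p with velocity v leaves a circle.

    Solves |p + t*v - center|^2 = radius^2 for the non-negative root.
    Returns None if the point is stationary, and 0.0 if it is already outside.
    """
    dx, dy = p[0] - center[0], p[1] - center[1]
    a = v[0] ** 2 + v[1] ** 2
    if a == 0:
        return None  # a stationary query never leaves the region
    c = dx * dx + dy * dy - radius ** 2
    if c > 0:
        return 0.0  # already outside the safe region
    b = 2 * (dx * v[0] + dy * v[1])
    disc = b * b - 4 * a * c  # non-negative because c <= 0
    return (-b + math.sqrt(disc)) / (2 * a)


# Hypothetical samples: the query moved from (0, 0) to (3, 4) in one time unit.
velocity = estimate_velocity((0.0, 0.0), (3.0, 4.0), 1.0)
print(time_to_exit((3.0, 4.0), velocity, (0.0, 0.0), 10.0))
```

With these sample values the query is 5 units from the centre and moving directly outward at speed 5, so it reaches the boundary of the radius-10 region after one more time unit. A client could use such an estimate to decide when to contact the server next instead of reporting every position update.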
Lost in translation? A multilingual Query Builder improves the quality of PubMed queries: a randomised controlled trial.

PubMed

Schuers, Matthieu; Joulakian, Mher; Kerdelhué, Gaetan; Segas, Léa; Grosjean, Julien; Darmoni, Stéfan J; Griffon, Nicolas

2017-07-03

MEDLINE is the most widely used medical bibliographic database in the world. Most of its citations are in English and this can be an obstacle for some researchers to access the information the database contains. We created a multilingual query builder to facilitate access to the PubMed subset using a language other than English. The aim of our study was to assess the impact of this multilingual query builder on the quality of PubMed queries for non-native English speaking physicians and medical researchers. A randomised controlled study was conducted among French speaking general practice residents. We designed a multi-lingual query builder to facilitate information retrieval, based on available MeSH translations and providing users with both an interface and a controlled vocabulary in their own language. Participating residents were randomly allocated either the French or the English version of the query builder. They were asked to translate 12 short medical questions into MeSH queries. The main outcome was the quality of the query. Two librarians blind to the arm independently evaluated each query, using a modified published classification that differentiated eight types of errors. Twenty residents used the French version of the query builder and 22 used the English version. 492 queries were analysed. There were significantly more perfect queries in the French group vs. the English group (respectively 37.9% vs. 17.9%; p < 0.01). It took significantly more time for the members of the English group than the members of the French group to build each query, respectively 194 sec vs. 128 sec; p < 0.01. This multi-lingual query builder is an effective tool to improve the quality of PubMed queries in particular for researchers whose first language is not English.
