A rank-based Prediction Algorithm of Learning User's Intention
NASA Astrophysics Data System (ADS)
Shen, Jie; Gao, Ying; Chen, Cang; Gong, HaiPing
Internet search has become an important part in people's daily life. People can find many types of information to meet different needs through search engines on the Internet. There are two issues for the current search engines: first, the users should predetermine the types of information they want and then change to the appropriate types of search engine interfaces. Second, most search engines can support multiple kinds of search functions, each function has its own separate search interface. While users need different types of information, they must switch between different interfaces. In practice, most queries are corresponding to various types of information results. These queries can search the relevant results in various search engines, such as query "Palace" contains the websites about the introduction of the National Palace Museum, blog, Wikipedia, some pictures and video information. This paper presents a new aggregative algorithm for all kinds of search results. It can filter and sort the search results by learning three aspects about the query words, search results and search history logs to achieve the purpose of detecting user's intention. Experiments demonstrate that this rank-based method for multi-types of search results is effective. It can meet the user's search needs well, enhance user's satisfaction, provide an effective and rational model for optimizing search engines and improve user's search experience.
Noesis: Ontology based Scoped Search Engine and Resource Aggregator for Atmospheric Science
NASA Astrophysics Data System (ADS)
Ramachandran, R.; Movva, S.; Li, X.; Cherukuri, P.; Graves, S.
2006-12-01
The goal for search engines is to return results that are both accurate and complete. The search engines should find only what you really want and find everything you really want. Search engines (even meta search engines) lack semantics. The basis for search is simply based on string matching between the user's query term and the resource database and the semantics associated with the search string is not captured. For example, if an atmospheric scientist is searching for "pressure" related web resources, most search engines return inaccurate results such as web resources related to blood pressure. In this presentation Noesis, which is a meta-search engine and a resource aggregator that uses domain ontologies to provide scoped search capabilities will be described. Noesis uses domain ontologies to help the user scope the search query to ensure that the search results are both accurate and complete. The domain ontologies guide the user to refine their search query and thereby reduce the user's burden of experimenting with different search strings. Semantics are captured by refining the query terms to cover synonyms, specializations, generalizations and related concepts. Noesis also serves as a resource aggregator. It categorizes the search results from different online resources such as education materials, publications, datasets, web search engines that might be of interest to the user.
Search Engines: Gateway to a New ``Panopticon''?
NASA Astrophysics Data System (ADS)
Kosta, Eleni; Kalloniatis, Christos; Mitrou, Lilian; Kavakli, Evangelia
Nowadays, Internet users are depending on various search engines in order to be able to find requested information on the Web. Although most users feel that they are and remain anonymous when they place their search queries, reality proves otherwise. The increasing importance of search engines for the location of the desired information on the Internet usually leads to considerable inroads into the privacy of users. The scope of this paper is to study the main privacy issues with regard to search engines, such as the anonymisation of search logs and their retention period, and to examine the applicability of the European data protection legislation to non-EU search engine providers. Ixquick, a privacy-friendly meta search engine will be presented as an alternative to privacy intrusive existing practices of search engines.
Using Internet Search Engines to Obtain Medical Information: A Comparative Study
Wang, Liupu; Wang, Juexin; Wang, Michael; Li, Yong; Liang, Yanchun
2012-01-01
Background The Internet has become one of the most important means to obtain health and medical information. It is often the first step in checking for basic information about a disease and its treatment. The search results are often useful to general users. Various search engines such as Google, Yahoo!, Bing, and Ask.com can play an important role in obtaining medical information for both medical professionals and lay people. However, the usability and effectiveness of various search engines for medical information have not been comprehensively compared and evaluated. Objective To compare major Internet search engines in their usability of obtaining medical and health information. Methods We applied usability testing as a software engineering technique and a standard industry practice to compare the four major search engines (Google, Yahoo!, Bing, and Ask.com) in obtaining health and medical information. For this purpose, we searched the keyword breast cancer in Google, Yahoo!, Bing, and Ask.com and saved the results of the top 200 links from each search engine. We combined nonredundant links from the four search engines and gave them to volunteer users in an alphabetical order. The volunteer users evaluated the websites and scored each website from 0 to 10 (lowest to highest) based on the usefulness of the content relevant to breast cancer. A medical expert identified six well-known websites related to breast cancer in advance as standards. We also used five keywords associated with breast cancer defined in the latest release of Systematized Nomenclature of Medicine-Clinical Terms (SNOMED CT) and analyzed their occurrence in the websites. Results Each search engine provided rich information related to breast cancer in the search results. All six standard websites were among the top 30 in search results of all four search engines. Google had the best search validity (in terms of whether a website could be opened), followed by Bing, Ask.com, and Yahoo!. The search results highly overlapped between the search engines, and the overlap between any two search engines was about half or more. On the other hand, each search engine emphasized various types of content differently. In terms of user satisfaction analysis, volunteer users scored Bing the highest for its usefulness, followed by Yahoo!, Google, and Ask.com. Conclusions Google, Yahoo!, Bing, and Ask.com are by and large effective search engines for helping lay users get health and medical information. Nevertheless, the current ranking methods have some pitfalls and there is room for improvement to help users get more accurate and useful information. We suggest that search engine users explore multiple search engines to search different types of health information and medical knowledge for their own needs and get a professional consultation if necessary. PMID:22672889
Using Internet search engines to obtain medical information: a comparative study.
Wang, Liupu; Wang, Juexin; Wang, Michael; Li, Yong; Liang, Yanchun; Xu, Dong
2012-05-16
The Internet has become one of the most important means to obtain health and medical information. It is often the first step in checking for basic information about a disease and its treatment. The search results are often useful to general users. Various search engines such as Google, Yahoo!, Bing, and Ask.com can play an important role in obtaining medical information for both medical professionals and lay people. However, the usability and effectiveness of various search engines for medical information have not been comprehensively compared and evaluated. To compare major Internet search engines in their usability of obtaining medical and health information. We applied usability testing as a software engineering technique and a standard industry practice to compare the four major search engines (Google, Yahoo!, Bing, and Ask.com) in obtaining health and medical information. For this purpose, we searched the keyword breast cancer in Google, Yahoo!, Bing, and Ask.com and saved the results of the top 200 links from each search engine. We combined nonredundant links from the four search engines and gave them to volunteer users in an alphabetical order. The volunteer users evaluated the websites and scored each website from 0 to 10 (lowest to highest) based on the usefulness of the content relevant to breast cancer. A medical expert identified six well-known websites related to breast cancer in advance as standards. We also used five keywords associated with breast cancer defined in the latest release of Systematized Nomenclature of Medicine-Clinical Terms (SNOMED CT) and analyzed their occurrence in the websites. Each search engine provided rich information related to breast cancer in the search results. All six standard websites were among the top 30 in search results of all four search engines. Google had the best search validity (in terms of whether a website could be opened), followed by Bing, Ask.com, and Yahoo!. The search results highly overlapped between the search engines, and the overlap between any two search engines was about half or more. On the other hand, each search engine emphasized various types of content differently. In terms of user satisfaction analysis, volunteer users scored Bing the highest for its usefulness, followed by Yahoo!, Google, and Ask.com. Google, Yahoo!, Bing, and Ask.com are by and large effective search engines for helping lay users get health and medical information. Nevertheless, the current ranking methods have some pitfalls and there is room for improvement to help users get more accurate and useful information. We suggest that search engine users explore multiple search engines to search different types of health information and medical knowledge for their own needs and get a professional consultation if necessary.
Finding My Needle in the Haystack: Effective Personalized Re-ranking of Search Results in Prospector
NASA Astrophysics Data System (ADS)
König, Florian; van Velsen, Lex; Paramythis, Alexandros
This paper provides an overview of Prospector, a personalized Internet meta-search engine, which utilizes a combination of ontological information, ratings-based models of user interests, and complementary theme-oriented group models to recommend (through re-ranking) search results obtained from an underlying search engine. Re-ranking brings “closer to the top” those items that are of particular interest to a user or have high relevance to a given theme. A user-based, real-world evaluation has shown that the system is effective in promoting results of interest, but lags behind Google in user acceptance, possibly due to the absence of features popularized by said search engine. Overall, users would consider employing a personalized search engine to perform searches with terms that require disambiguation and / or contextualization.
Smart internet search engine through 6W
NASA Astrophysics Data System (ADS)
Goehler, Stephen; Cader, Masud; Szu, Harold
2006-04-01
Current Internet search engine technology is limited in its ability to display necessary relevant information to the user. Yahoo, Google and Microsoft use lookup tables or indexes which limits the ability of users to find their desired information. While these companies have improved their results over the years by enhancing their existing technology and algorithms with specialized heuristics such as PageRank, there is a need for a next generation smart search engine that can effectively interpret the relevance of user searches and provide the actual information requested. This paper explores whether a smarter Internet search engine can effectively fulfill a user's needs through the use of 6W representations.
MetaSEEk: a content-based metasearch engine for images
NASA Astrophysics Data System (ADS)
Beigi, Mandis; Benitez, Ana B.; Chang, Shih-Fu
1997-12-01
Search engines are the most powerful resources for finding information on the rapidly expanding World Wide Web (WWW). Finding the desired search engines and learning how to use them, however, can be very time consuming. The integration of such search tools enables the users to access information across the world in a transparent and efficient manner. These systems are called meta-search engines. The recent emergence of visual information retrieval (VIR) search engines on the web is leading to the same efficiency problem. This paper describes and evaluates MetaSEEk, a content-based meta-search engine used for finding images on the Web based on their visual information. MetaSEEk is designed to intelligently select and interface with multiple on-line image search engines by ranking their performance for different classes of user queries. User feedback is also integrated in the ranking refinement. We compare MetaSEEk with a base line version of meta-search engine, which does not use the past performance of the different search engines in recommending target search engines for future queries.
Re-ranking via User Feedback: Georgetown University at TREC 2015 DD Track
2015-11-20
Re-ranking via User Feedback: Georgetown University at TREC 2015 DD Track Jiyun Luo and Hui Yang Department of Computer Science, Georgetown...involved in a search process, the user and the search engine. In TREC DD , the user is modeled by a simulator, called “jig”. The jig and the search engine...simulating user is provided by TREC 2015 DD Track organizer, and is called “jig”. There are 118 search topics in total. For each search topic, a short
PIA: An Intuitive Protein Inference Engine with a Web-Based User Interface.
Uszkoreit, Julian; Maerkens, Alexandra; Perez-Riverol, Yasset; Meyer, Helmut E; Marcus, Katrin; Stephan, Christian; Kohlbacher, Oliver; Eisenacher, Martin
2015-07-02
Protein inference connects the peptide spectrum matches (PSMs) obtained from database search engines back to proteins, which are typically at the heart of most proteomics studies. Different search engines yield different PSMs and thus different protein lists. Analysis of results from one or multiple search engines is often hampered by different data exchange formats and lack of convenient and intuitive user interfaces. We present PIA, a flexible software suite for combining PSMs from different search engine runs and turning these into consistent results. PIA can be integrated into proteomics data analysis workflows in several ways. A user-friendly graphical user interface can be run either locally or (e.g., for larger core facilities) from a central server. For automated data processing, stand-alone tools are available. PIA implements several established protein inference algorithms and can combine results from different search engines seamlessly. On several benchmark data sets, we show that PIA can identify a larger number of proteins at the same protein FDR when compared to that using inference based on a single search engine. PIA supports the majority of established search engines and data in the mzIdentML standard format. It is implemented in Java and freely available at https://github.com/mpc-bioinformatics/pia.
Developing a distributed HTML5-based search engine for geospatial resource discovery
NASA Astrophysics Data System (ADS)
ZHOU, N.; XIA, J.; Nebert, D.; Yang, C.; Gui, Z.; Liu, K.
2013-12-01
With explosive growth of data, Geospatial Cyberinfrastructure(GCI) components are developed to manage geospatial resources, such as data discovery and data publishing. However, the efficiency of geospatial resources discovery is still challenging in that: (1) existing GCIs are usually developed for users of specific domains. Users may have to visit a number of GCIs to find appropriate resources; (2) The complexity of decentralized network environment usually results in slow response and pool user experience; (3) Users who use different browsers and devices may have very different user experiences because of the diversity of front-end platforms (e.g. Silverlight, Flash or HTML). To address these issues, we developed a distributed and HTML5-based search engine. Specifically, (1)the search engine adopts a brokering approach to retrieve geospatial metadata from various and distributed GCIs; (2) the asynchronous record retrieval mode enhances the search performance and user interactivity; (3) the search engine based on HTML5 is able to provide unified access capabilities for users with different devices (e.g. tablet and smartphone).
LoyalTracker: Visualizing Loyalty Dynamics in Search Engines.
Shi, Conglei; Wu, Yingcai; Liu, Shixia; Zhou, Hong; Qu, Huamin
2014-12-01
The huge amount of user log data collected by search engine providers creates new opportunities to understand user loyalty and defection behavior at an unprecedented scale. However, this also poses a great challenge to analyze the behavior and glean insights into the complex, large data. In this paper, we introduce LoyalTracker, a visual analytics system to track user loyalty and switching behavior towards multiple search engines from the vast amount of user log data. We propose a new interactive visualization technique (flow view) based on a flow metaphor, which conveys a proper visual summary of the dynamics of user loyalty of thousands of users over time. Two other visualization techniques, a density map and a word cloud, are integrated to enable analysts to gain further insights into the patterns identified by the flow view. Case studies and the interview with domain experts are conducted to demonstrate the usefulness of our technique in understanding user loyalty and switching behavior in search engines.
Start Your Engines: Surfing with Search Engines for Kids.
ERIC Educational Resources Information Center
Byerly, Greg; Brodie, Carolyn S.
1999-01-01
Suggests that to be an effective educator and user of the Web it is essential to know the basics about search engines. Presents tips for using search engines. Describes several search engines for children and young adults, as well as some general filtered search engines for children. (AEF)
Leroy, Gondy; Xu, Jennifer; Chung, Wingyan; Eggers, Shauna; Chen, Hsinchun
2007-01-01
Retrieving sufficient relevant information online is difficult for many people because they use too few keywords to search and search engines do not provide many support tools. To further complicate the search, users often ignore support tools when available. Our goal is to evaluate in a realistic setting when users use support tools and how they perceive these tools. We compared three medical search engines with support tools that require more or less effort from users to form a query and evaluate results. We carried out an end user study with 23 users who were asked to find information, i.e., subtopics and supporting abstracts, for a given theme. We used a balanced within-subjects design and report on the effectiveness, efficiency and usability of the support tools from the end user perspective. We found significant differences in efficiency but did not find significant differences in effectiveness between the three search engines. Dynamic user support tools requiring less effort led to higher efficiency. Fewer searches were needed and more documents were found per search when both query reformulation and result review tools dynamically adjust to the user query. The query reformulation tool that provided a long list of keywords, dynamically adjusted to the user query, was used most often and led to more subtopics. As hypothesized, the dynamic result review tools were used more often and led to more subtopics than static ones. These results were corroborated by the usability questionnaires, which showed that support tools that dynamically optimize output were preferred.
Tags Extarction from Spatial Documents in Search Engines
NASA Astrophysics Data System (ADS)
Borhaninejad, S.; Hakimpour, F.; Hamzei, E.
2015-12-01
Nowadays the selective access to information on the Web is provided by search engines, but in the cases which the data includes spatial information the search task becomes more complex and search engines require special capabilities. The purpose of this study is to extract the information which lies in spatial documents. To that end, we implement and evaluate information extraction from GML documents and a retrieval method in an integrated approach. Our proposed system consists of three components: crawler, database and user interface. In crawler component, GML documents are discovered and their text is parsed for information extraction; storage. The database component is responsible for indexing of information which is collected by crawlers. Finally the user interface component provides the interaction between system and user. We have implemented this system as a pilot system on an Application Server as a simulation of Web. Our system as a spatial search engine provided searching capability throughout the GML documents and thus an important step to improve the efficiency of search engines has been taken.
SearchGUI: An open-source graphical user interface for simultaneous OMSSA and X!Tandem searches.
Vaudel, Marc; Barsnes, Harald; Berven, Frode S; Sickmann, Albert; Martens, Lennart
2011-03-01
The identification of proteins by mass spectrometry is a standard technique in the field of proteomics, relying on search engines to perform the identifications of the acquired spectra. Here, we present a user-friendly, lightweight and open-source graphical user interface called SearchGUI (http://searchgui.googlecode.com), for configuring and running the freely available OMSSA (open mass spectrometry search algorithm) and X!Tandem search engines simultaneously. Freely available under the permissible Apache2 license, SearchGUI is supported on Windows, Linux and OSX. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Custom Search Engines: Tools & Tips
ERIC Educational Resources Information Center
Notess, Greg R.
2008-01-01
Few have the resources to build a Google or Yahoo! from scratch. Yet anyone can build a search engine based on a subset of the large search engines' databases. Use Google Custom Search Engine or Yahoo! Search Builder or any of the other similar programs to create a vertical search engine targeting sites of interest to users. The basic steps to…
Multitasking Web Searching and Implications for Design.
ERIC Educational Resources Information Center
Ozmutlu, Seda; Ozmutlu, H. C.; Spink, Amanda
2003-01-01
Findings from a study of users' multitasking searches on Web search engines include: multitasking searches are a noticeable user behavior; multitasking search sessions are longer than regular search sessions in terms of queries per session and duration; both Excite and AlltheWeb.com users search for about three topics per multitasking session and…
Finding and Exploring Health Information with a Slider-Based User Interface.
Pang, Patrick Cheong-Iao; Verspoor, Karin; Pearce, Jon; Chang, Shanton
2016-01-01
Despite the fact that search engines are the primary channel to access online health information, there are better ways to find and explore health information on the web. Search engines are prone to problems when they are used to find health information. For instance, users have difficulties in expressing health scenarios with appropriate search keywords, search results are not optimised for medical queries, and the search process does not account for users' literacy levels and reading preferences. In this paper, we describe our approach to addressing these problems by introducing a novel design using a slider-based user interface for discovering health information without the need for precise search keywords. The user evaluation suggests that the interface is easy to use and able to assist users in the process of discovering new information. This study demonstrates the potential value of adopting slider controls in the user interface of health websites for navigation and information discovery.
The Use of Web Search Engines in Information Science Research.
ERIC Educational Resources Information Center
Bar-Ilan, Judit
2004-01-01
Reviews the literature on the use of Web search engines in information science research, including: ways users interact with Web search engines; social aspects of searching; structure and dynamic nature of the Web; link analysis; other bibliometric applications; characterizing information on the Web; search engine evaluation and improvement; and…
Adjacency and Proximity Searching in the Science Citation Index and Google
2005-01-01
major database search engines , including commercial S&T database search engines (e.g., Science Citation Index (SCI), Engineering Compendex (EC...PubMed, OVID), Federal agency award database search engines (e.g., NSF, NIH, DOE, EPA, as accessed in Federal R&D Project Summaries), Web search Engines (e.g...searching. Some database search engines allow strict constrained co- occurrence searching as a user option (e.g., OVID, EC), while others do not (e.g., SCI
`Googling' Terrorists: Are Northern Irish Terrorists Visible on Internet Search Engines?
NASA Astrophysics Data System (ADS)
Reilly, P.
In this chapter, the analysis suggests that Northern Irish terrorists are not visible on Web search engines when net users employ conventional Internet search techniques. Editors of mass media organisations traditionally have had the ability to decide whether a terrorist atrocity is `newsworthy,' controlling the `oxygen' supply that sustains all forms of terrorism. This process, also known as `gatekeeping,' is often influenced by the norms of social responsibility, or alternatively, with regard to the interests of the advertisers and corporate sponsors that sustain mass media organisations. The analysis presented in this chapter suggests that Internet search engines can also be characterised as `gatekeepers,' albeit without the ability to shape the content of Websites before it reaches net users. Instead, Internet search engines give priority retrieval to certain Websites within their directory, pointing net users towards these Websites rather than others on the Internet. Net users are more likely to click on links to the more `visible' Websites on Internet search engine directories, these sites invariably being the highest `ranked' in response to a particular search query. A number of factors including the design of the Website and the number of links to external sites determine the `visibility' of a Website on Internet search engines. The study suggests that Northern Irish terrorists and their sympathisers are unlikely to achieve a greater degree of `visibility' online than they enjoy in the conventional mass media through the perpetration of atrocities. Although these groups may have a greater degree of freedom on the Internet to publicise their ideologies, they are still likely to be speaking to the converted or members of the press. Although it is easier to locate Northern Irish terrorist organisations on Internet search engines by linking in via ideology, ideological description searches, such as `Irish Republican' and `Ulster Loyalist,' are more likely to generate links pointing towards the sites of research institutes and independent media organisations than sites sympathetic to Northern Irish terrorist organisations. The chapter argues that Northern Irish terrorists are only visible on search engines if net users select the correct search terms.
Improving Web Search for Difficult Queries
ERIC Educational Resources Information Center
Wang, Xuanhui
2009-01-01
Search engines have now become essential tools in all aspects of our life. Although a variety of information needs can be served very successfully, there are still a lot of queries that search engines can not answer very effectively and these queries always make users feel frustrated. Since it is quite often that users encounter such "difficult…
The Gaze of the Perfect Search Engine: Google as an Infrastructure of Dataveillance
NASA Astrophysics Data System (ADS)
Zimmer, M.
Web search engines have emerged as a ubiquitous and vital tool for the successful navigation of the growing online informational sphere. The goal of the world's largest search engine, Google, is to "organize the world's information and make it universally accessible and useful" and to create the "perfect search engine" that provides only intuitive, personalized, and relevant results. While intended to enhance intellectual mobility in the online sphere, this chapter reveals that the quest for the perfect search engine requires the widespread monitoring and aggregation of a users' online personal and intellectual activities, threatening the values the perfect search engines were designed to sustain. It argues that these search-based infrastructures of dataveillance contribute to a rapidly emerging "soft cage" of everyday digital surveillance, where they, like other dataveillance technologies before them, contribute to the curtailing of individual freedom, affect users' sense of self, and present issues of deep discrimination and social justice.
Directing the public to evidence-based online content
Cooper, Crystale Purvis; Gelb, Cynthia A; Vaughn, Alexandra N; Smuland, Jenny; Hughes, Alexandra G; Hawkins, Nikki A
2015-01-01
To direct online users searching for gynecologic cancer information to accurate content, the Centers for Disease Control and Prevention’s (CDC) ‘Inside Knowledge: Get the Facts About Gynecologic Cancer’ campaign sponsored search engine advertisements in English and Spanish. From June 2012 to August 2013, advertisements appeared when US Google users entered search terms related to gynecologic cancer. Users who clicked on the advertisements were directed to relevant content on the CDC website. Compared with the 3 months before the initiative (March–May 2012), visits to the CDC web pages linked to the advertisements were 26 times higher after the initiative began (June–August 2012) (p<0.01), and 65 times higher when the search engine advertisements were supplemented with promotion on television and additional websites (September 2012–August 2013) (p<0.01). Search engine advertisements can direct users to evidence-based content at a highly teachable moment—when they are seeking relevant information. PMID:25053580
Impact of Internet Search Engines on OPAC Users: A Study of Punjabi University, Patiala (India)
ERIC Educational Resources Information Center
Kumar, Shiv
2012-01-01
Purpose: The aim of this paper is to study the impact of internet search engine usage with special reference to OPAC searches in the Punjabi University Library, Patiala, Punjab (India). Design/methodology/approach: The primary data were collected from 352 users comprising faculty, research scholars and postgraduate students of the university. A…
Search without Boundaries Using Simple APIs
Tong, Qi
2009-01-01
The U.S. Geological Survey (USGS) Library, where the author serves as the digital services librarian, is increasingly challenged to make it easier for users to find information from many heterogeneous information sources. Information is scattered throughout different software applications (i.e., library catalog, federated search engine, link resolver, and vendor websites), and each specializes in one thing. How could the library integrate the functionalities of one application with another and provide a single point of entry for users to search across? To improve the user experience, the library launched an effort to integrate the federated search engine into the library's intranet website. The result is a simple search box that leverages the federated search engine's built-in application programming interfaces (APIs). In this article, the author describes how this project demonstrated the power of APIs and their potential to be used by other enterprise search portals inside or outside of the library.
Cross-System Evaluation of Clinical Trial Search Engines
Jiang, Silis Y.; Weng, Chunhua
2014-01-01
Clinical trials are fundamental to the advancement of medicine but constantly face recruitment difficulties. Various clinical trial search engines have been designed to help health consumers identify trials for which they may be eligible. Unfortunately, knowledge of the usefulness and usability of their designs remains scarce. In this study, we used mixed methods, including time-motion analysis, think-aloud protocol, and survey, to evaluate five popular clinical trial search engines with 11 users. Differences in user preferences and time spent on each system were observed and correlated with user characteristics. In general, searching for applicable trials using these systems is a cognitively demanding task. Our results show that user perceptions of these systems are multifactorial. The survey indicated eTACTS being the generally preferred system, but this finding did not persist among all mixed methods. This study confirms the value of mixed-methods for a comprehensive system evaluation. Future system designers must be aware that different users groups expect different functionalities. PMID:25954590
Cross-system evaluation of clinical trial search engines.
Jiang, Silis Y; Weng, Chunhua
2014-01-01
Clinical trials are fundamental to the advancement of medicine but constantly face recruitment difficulties. Various clinical trial search engines have been designed to help health consumers identify trials for which they may be eligible. Unfortunately, knowledge of the usefulness and usability of their designs remains scarce. In this study, we used mixed methods, including time-motion analysis, think-aloud protocol, and survey, to evaluate five popular clinical trial search engines with 11 users. Differences in user preferences and time spent on each system were observed and correlated with user characteristics. In general, searching for applicable trials using these systems is a cognitively demanding task. Our results show that user perceptions of these systems are multifactorial. The survey indicated eTACTS being the generally preferred system, but this finding did not persist among all mixed methods. This study confirms the value of mixed-methods for a comprehensive system evaluation. Future system designers must be aware that different users groups expect different functionalities.
eTACTS: a method for dynamically filtering clinical trial search results.
Miotto, Riccardo; Jiang, Silis; Weng, Chunhua
2013-12-01
Information overload is a significant problem facing online clinical trial searchers. We present eTACTS, a novel interactive retrieval framework using common eligibility tags to dynamically filter clinical trial search results. eTACTS mines frequent eligibility tags from free-text clinical trial eligibility criteria and uses these tags for trial indexing. After an initial search, eTACTS presents to the user a tag cloud representing the current results. When the user selects a tag, eTACTS retains only those trials containing that tag in their eligibility criteria and generates a new cloud based on tag frequency and co-occurrences in the remaining trials. The user can then select a new tag or unselect a previous tag. The process iterates until a manageable number of trials is returned. We evaluated eTACTS in terms of filtering efficiency, diversity of the search results, and user eligibility to the filtered trials using both qualitative and quantitative methods. eTACTS (1) rapidly reduced search results from over a thousand trials to ten; (2) highlighted trials that are generally not top-ranked by conventional search engines; and (3) retrieved a greater number of suitable trials than existing search engines. eTACTS enables intuitive clinical trial searches by indexing eligibility criteria with effective tags. User evaluation was limited to one case study and a small group of evaluators due to the long duration of the experiment. Although a larger-scale evaluation could be conducted, this feasibility study demonstrated significant advantages of eTACTS over existing clinical trial search engines. A dynamic eligibility tag cloud can potentially enhance state-of-the-art clinical trial search engines by allowing intuitive and efficient filtering of the search result space. Copyright © 2013 The Authors. Published by Elsevier Inc. All rights reserved.
eTACTS: A Method for Dynamically Filtering Clinical Trial Search Results
Miotto, Riccardo; Jiang, Silis; Weng, Chunhua
2013-01-01
Objective Information overload is a significant problem facing online clinical trial searchers. We present eTACTS, a novel interactive retrieval framework using common eligibility tags to dynamically filter clinical trial search results. Materials and Methods eTACTS mines frequent eligibility tags from free-text clinical trial eligibility criteria and uses these tags for trial indexing. After an initial search, eTACTS presents to the user a tag cloud representing the current results. When the user selects a tag, eTACTS retains only those trials containing that tag in their eligibility criteria and generates a new cloud based on tag frequency and co-occurrences in the remaining trials. The user can then select a new tag or unselect a previous tag. The process iterates until a manageable number of trials is returned. We evaluated eTACTS in terms of filtering efficiency, diversity of the search results, and user eligibility to the filtered trials using both qualitative and quantitative methods. Results eTACTS (1) rapidly reduced search results from over a thousand trials to ten; (2) highlighted trials that are generally not top-ranked by conventional search engines; and (3) retrieved a greater number of suitable trials than existing search engines. Discussion eTACTS enables intuitive clinical trial searches by indexing eligibility criteria with effective tags. User evaluation was limited to one case study and a small group of evaluators due to the long duration of the experiment. Although a larger-scale evaluation could be conducted, this feasibility study demonstrated significant advantages of eTACTS over existing clinical trial search engines. Conclusion A dynamic eligibility tag cloud can potentially enhance state-of-the-art clinical trial search engines by allowing intuitive and efficient filtering of the search result space. PMID:23916863
Index Relativity and Patron Search Strategy.
ERIC Educational Resources Information Center
Allison, DeeAnn; Childers Scott
2002-01-01
Describes a study at the University of Nebraska-Lincoln that compared searches in two different keyword indexes with similar content where search results were dependent on search strategy quality, search engine execution, and content. Results showed search engine execution had an impact on the number of matches and that users ignored search help…
Multimedia Web Searching Trends.
ERIC Educational Resources Information Center
Ozmutlu, Seda; Spink, Amanda; Ozmutlu, H. Cenk
2002-01-01
Examines and compares multimedia Web searching by Excite and FAST search engine users in 2001. Highlights include audio and video queries; time spent on searches; terms per query; ranking of the most frequently used terms; and differences in Web search behaviors of U.S. and European Web users. (Author/LRW)
BioCarian: search engine for exploratory searches in heterogeneous biological databases.
Zaki, Nazar; Tennakoon, Chandana
2017-10-02
There are a large number of biological databases publicly available for scientists in the web. Also, there are many private databases generated in the course of research projects. These databases are in a wide variety of formats. Web standards have evolved in the recent times and semantic web technologies are now available to interconnect diverse and heterogeneous sources of data. Therefore, integration and querying of biological databases can be facilitated by techniques used in semantic web. Heterogeneous databases can be converted into Resource Description Format (RDF) and queried using SPARQL language. Searching for exact queries in these databases is trivial. However, exploratory searches need customized solutions, especially when multiple databases are involved. This process is cumbersome and time consuming for those without a sufficient background in computer science. In this context, a search engine facilitating exploratory searches of databases would be of great help to the scientific community. We present BioCarian, an efficient and user-friendly search engine for performing exploratory searches on biological databases. The search engine is an interface for SPARQL queries over RDF databases. We note that many of the databases can be converted to tabular form. We first convert the tabular databases to RDF. The search engine provides a graphical interface based on facets to explore the converted databases. The facet interface is more advanced than conventional facets. It allows complex queries to be constructed, and have additional features like ranking of facet values based on several criteria, visually indicating the relevance of a facet value and presenting the most important facet values when a large number of choices are available. For the advanced users, SPARQL queries can be run directly on the databases. Using this feature, users will be able to incorporate federated searches of SPARQL endpoints. We used the search engine to do an exploratory search on previously published viral integration data and were able to deduce the main conclusions of the original publication. BioCarian is accessible via http://www.biocarian.com . We have developed a search engine to explore RDF databases that can be used by both novice and advanced users.
Sexual information seeking on web search engines.
Spink, Amanda; Koricich, Andrew; Jansen, B J; Cole, Charles
2004-02-01
Sexual information seeking is an important element within human information behavior. Seeking sexually related information on the Internet takes many forms and channels, including chat rooms discussions, accessing Websites or searching Web search engines for sexual materials. The study of sexual Web queries provides insight into sexually-related information-seeking behavior, of value to Web users and providers alike. We qualitatively analyzed queries from logs of 1,025,910 Alta Vista and AlltheWeb.com Web user queries from 2001. We compared the differences in sexually-related Web searching between Alta Vista and AlltheWeb.com users. Differences were found in session duration, query outcomes, and search term choices. Implications of the findings for sexual information seeking are discussed.
Foraging patterns in online searches.
Wang, Xiangwen; Pleimling, Michel
2017-03-01
Nowadays online searches are undeniably the most common form of information gathering, as witnessed by billions of clicks generated each day on search engines. In this work we describe online searches as foraging processes that take place on the semi-infinite line. Using a variety of quantities like probability distributions and complementary cumulative distribution functions of step length and waiting time as well as mean square displacements and entropies, we analyze three different click-through logs that contain the detailed information of millions of queries submitted to search engines. Notable differences between the different logs reveal an increased efficiency of the search engines. In the language of foraging, the newer logs indicate that online searches overwhelmingly yield local searches (i.e., on one page of links provided by the search engines), whereas for the older logs the foraging processes are a combination of local searches and relocation phases that are power law distributed. Our investigation of click logs of search engines therefore highlights the presence of intermittent search processes (where phases of local explorations are separated by power law distributed relocation jumps) in online searches. It follows that good search engines enable the users to find the information they are looking for through a local exploration of a single page with search results, whereas for poor search engine users are often forced to do a broader exploration of different pages.
Foraging patterns in online searches
NASA Astrophysics Data System (ADS)
Wang, Xiangwen; Pleimling, Michel
2017-03-01
Nowadays online searches are undeniably the most common form of information gathering, as witnessed by billions of clicks generated each day on search engines. In this work we describe online searches as foraging processes that take place on the semi-infinite line. Using a variety of quantities like probability distributions and complementary cumulative distribution functions of step length and waiting time as well as mean square displacements and entropies, we analyze three different click-through logs that contain the detailed information of millions of queries submitted to search engines. Notable differences between the different logs reveal an increased efficiency of the search engines. In the language of foraging, the newer logs indicate that online searches overwhelmingly yield local searches (i.e., on one page of links provided by the search engines), whereas for the older logs the foraging processes are a combination of local searches and relocation phases that are power law distributed. Our investigation of click logs of search engines therefore highlights the presence of intermittent search processes (where phases of local explorations are separated by power law distributed relocation jumps) in online searches. It follows that good search engines enable the users to find the information they are looking for through a local exploration of a single page with search results, whereas for poor search engine users are often forced to do a broader exploration of different pages.
Health literacy and usability of clinical trial search engines.
Utami, Dina; Bickmore, Timothy W; Barry, Barbara; Paasche-Orlow, Michael K
2014-01-01
Several web-based search engines have been developed to assist individuals to find clinical trials for which they may be interested in volunteering. However, these search engines may be difficult for individuals with low health and computer literacy to navigate. The authors present findings from a usability evaluation of clinical trial search tools with 41 participants across the health and computer literacy spectrum. The study consisted of 3 parts: (a) a usability study of an existing web-based clinical trial search tool; (b) a usability study of a keyword-based clinical trial search tool; and (c) an exploratory study investigating users' information needs when deciding among 2 or more candidate clinical trials. From the first 2 studies, the authors found that users with low health literacy have difficulty forming queries using keywords and have significantly more difficulty using a standard web-based clinical trial search tool compared with users with adequate health literacy. From the third study, the authors identified the search factors most important to individuals searching for clinical trials and how these varied by health literacy level.
Document Clustering Approach for Meta Search Engine
NASA Astrophysics Data System (ADS)
Kumar, Naresh, Dr.
2017-08-01
The size of WWW is growing exponentially with ever change in technology. This results in huge amount of information with long list of URLs. Manually it is not possible to visit each page individually. So, if the page ranking algorithms are used properly then user search space can be restricted up to some pages of searched results. But available literatures show that no single search system can provide qualitative results from all the domains. This paper provides solution to this problem by introducing a new meta search engine that determine the relevancy of query corresponding to web page and cluster the results accordingly. The proposed approach reduces the user efforts, improves the quality of results and performance of the meta search engine.
Defining and Exposing Privacy Issues with Social Media
2012-06-11
Twitter, and Linked In[ I 0). VI. SEARCH ENGINES In addition to social networking sites, search engines pose new issues to privacy. As...networking, search engines , and storing personal information online in general have been accepted worldwide due to the benefits they provide. Social...networking provides even more communication in an information-demanding age, allowing users to interact across great distances. Search engines allow
Information Discovery and Retrieval Tools
2004-12-01
information. This session will focus on the various Internet search engines , directories, and how to improve the user experience through the use of...such techniques as metadata, meta- search engines , subject specific search tools, and other developing technologies.
Information Discovery and Retrieval Tools
2003-04-01
information. This session will focus on the various Internet search engines , directories, and how to improve the user experience through the use of...such techniques as metadata, meta- search engines , subject specific search tools, and other developing technologies.
The Impact of Subject Indexes on Semantic Indeterminacy in Enterprise Document Retrieval
ERIC Educational Resources Information Center
Schymik, Gregory
2012-01-01
Ample evidence exists to support the conclusion that enterprise search is failing its users. This failure is costing corporate America billions of dollars every year. Most enterprise search engines are built using web search engines as their foundations. These search engines are optimized for web use and are inadequate when used inside the…
The effective use of search engines on the Internet.
Younger, P
This article explains how nurses can get the most out of researching information on the internet using the search engine Google. It also explores some of the other types of search engines that are available. Internet users are shown how to find text, images and reports and search within sites. Copyright issues are also discussed.
Interactive Information Organization: Techniques and Evaluation
2001-05-01
information search and access. Locating interesting information on the World Wide Web is the main task of on-line search engines . Such engines accept a...likelihood of being relevant to the user’s request. The majority of today’s Web search engines follow this scenario. The ordering of documents in the
Searching for cancer information on the internet: analyzing natural language search queries.
Bader, Judith L; Theofanos, Mary Frances
2003-12-11
Searching for health information is one of the most-common tasks performed by Internet users. Many users begin searching on popular search engines rather than on prominent health information sites. We know that many visitors to our (National Cancer Institute) Web site, cancer.gov, arrive via links in search engine result. To learn more about the specific needs of our general-public users, we wanted to understand what lay users really wanted to know about cancer, how they phrased their questions, and how much detail they used. The National Cancer Institute partnered with AskJeeves, Inc to develop a methodology to capture, sample, and analyze 3 months of cancer-related queries on the Ask.com Web site, a prominent United States consumer search engine, which receives over 35 million queries per week. Using a benchmark set of 500 terms and word roots supplied by the National Cancer Institute, AskJeeves identified a test sample of cancer queries for 1 week in August 2001. From these 500 terms only 37 appeared >or= 5 times/day over the trial test week in 17208 queries. Using these 37 terms, 204165 instances of cancer queries were found in the Ask.com query logs for the actual test period of June-August 2001. Of these, 7500 individual user questions were randomly selected for detailed analysis and assigned to appropriate categories. The exact language of sample queries is presented. Considering multiples of the same questions, the sample of 7500 individual user queries represented 76077 queries (37% of the total 3-month pool). Overall 78.37% of sampled Cancer queries asked about 14 specific cancer types. Within each cancer type, queries were sorted into appropriate subcategories including at least the following: General Information, Symptoms, Diagnosis and Testing, Treatment, Statistics, Definition, and Cause/Risk/Link. The most-common specific cancer types mentioned in queries were Digestive/Gastrointestinal/Bowel (15.0%), Breast (11.7%), Skin (11.3%), and Genitourinary (10.5%). Additional subcategories of queries about specific cancer types varied, depending on user input. Queries that were not specific to a cancer type were also tracked and categorized. Natural-language searching affords users the opportunity to fully express their information needs and can aid users naïve to the content and vocabulary. The specific queries analyzed for this study reflect news and research studies reported during the study dates and would surely change with different study dates. Analyzing queries from search engines represents one way of knowing what kinds of content to provide to users of a given Web site. Users ask questions using whole sentences and keywords, often misspelling words. Providing the option for natural-language searching does not obviate the need for good information architecture, usability engineering, and user testing in order to optimize user experience.
Searching for Cancer Information on the Internet: Analyzing Natural Language Search Queries
Theofanos, Mary Frances
2003-01-01
Background Searching for health information is one of the most-common tasks performed by Internet users. Many users begin searching on popular search engines rather than on prominent health information sites. We know that many visitors to our (National Cancer Institute) Web site, cancer.gov, arrive via links in search engine result. Objective To learn more about the specific needs of our general-public users, we wanted to understand what lay users really wanted to know about cancer, how they phrased their questions, and how much detail they used. Methods The National Cancer Institute partnered with AskJeeves, Inc to develop a methodology to capture, sample, and analyze 3 months of cancer-related queries on the Ask.com Web site, a prominent United States consumer search engine, which receives over 35 million queries per week. Using a benchmark set of 500 terms and word roots supplied by the National Cancer Institute, AskJeeves identified a test sample of cancer queries for 1 week in August 2001. From these 500 terms only 37 appeared ≥ 5 times/day over the trial test week in 17208 queries. Using these 37 terms, 204165 instances of cancer queries were found in the Ask.com query logs for the actual test period of June-August 2001. Of these, 7500 individual user questions were randomly selected for detailed analysis and assigned to appropriate categories. The exact language of sample queries is presented. Results Considering multiples of the same questions, the sample of 7500 individual user queries represented 76077 queries (37% of the total 3-month pool). Overall 78.37% of sampled Cancer queries asked about 14 specific cancer types. Within each cancer type, queries were sorted into appropriate subcategories including at least the following: General Information, Symptoms, Diagnosis and Testing, Treatment, Statistics, Definition, and Cause/Risk/Link. The most-common specific cancer types mentioned in queries were Digestive/Gastrointestinal/Bowel (15.0%), Breast (11.7%), Skin (11.3%), and Genitourinary (10.5%). Additional subcategories of queries about specific cancer types varied, depending on user input. Queries that were not specific to a cancer type were also tracked and categorized. Conclusions Natural-language searching affords users the opportunity to fully express their information needs and can aid users naïve to the content and vocabulary. The specific queries analyzed for this study reflect news and research studies reported during the study dates and would surely change with different study dates. Analyzing queries from search engines represents one way of knowing what kinds of content to provide to users of a given Web site. Users ask questions using whole sentences and keywords, often misspelling words. Providing the option for natural-language searching does not obviate the need for good information architecture, usability engineering, and user testing in order to optimize user experience. PMID:14713659
Directing the public to evidence-based online content.
Cooper, Crystale Purvis; Gelb, Cynthia A; Vaughn, Alexandra N; Smuland, Jenny; Hughes, Alexandra G; Hawkins, Nikki A
2015-04-01
To direct online users searching for gynecologic cancer information to accurate content, the Centers for Disease Control and Prevention's (CDC) 'Inside Knowledge: Get the Facts About Gynecologic Cancer' campaign sponsored search engine advertisements in English and Spanish. From June 2012 to August 2013, advertisements appeared when US Google users entered search terms related to gynecologic cancer. Users who clicked on the advertisements were directed to relevant content on the CDC website. Compared with the 3 months before the initiative (March-May 2012), visits to the CDC web pages linked to the advertisements were 26 times higher after the initiative began (June-August 2012) (p<0.01), and 65 times higher when the search engine advertisements were supplemented with promotion on television and additional websites (September 2012-August 2013) (p<0.01). Search engine advertisements can direct users to evidence-based content at a highly teachable moment--when they are seeking relevant information. © The Author 2014. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Collaborative search in electronic health records.
Zheng, Kai; Mei, Qiaozhu; Hanauer, David A
2011-05-01
A full-text search engine can be a useful tool for augmenting the reuse value of unstructured narrative data stored in electronic health records (EHR). A prominent barrier to the effective utilization of such tools originates from users' lack of search expertise and/or medical-domain knowledge. To mitigate the issue, the authors experimented with a 'collaborative search' feature through a homegrown EHR search engine that allows users to preserve their search knowledge and share it with others. This feature was inspired by the success of many social information-foraging techniques used on the web that leverage users' collective wisdom to improve the quality and efficiency of information retrieval. The authors conducted an empirical evaluation study over a 4-year period. The user sample consisted of 451 academic researchers, medical practitioners, and hospital administrators. The data were analyzed using a social-network analysis to delineate the structure of the user collaboration networks that mediated the diffusion of knowledge of search. The users embraced the concept with considerable enthusiasm. About half of the EHR searches processed by the system (0.44 million) were based on stored search knowledge; 0.16 million utilized shared knowledge made available by other users. The social-network analysis results also suggest that the user-collaboration networks engendered by the collaborative search feature played an instrumental role in enabling the transfer of search knowledge across people and domains. Applying collaborative search, a social information-foraging technique popularly used on the web, may provide the potential to improve the quality and efficiency of information retrieval in healthcare.
Context-Aware Online Commercial Intention Detection
NASA Astrophysics Data System (ADS)
Hu, Derek Hao; Shen, Dou; Sun, Jian-Tao; Yang, Qiang; Chen, Zheng
With more and more commercial activities moving onto the Internet, people tend to purchase what they need through Internet or conduct some online research before the actual transactions happen. For many Web users, their online commercial activities start from submitting a search query to search engines. Just like the common Web search queries, the queries with commercial intention are usually very short. Recognizing the queries with commercial intention against the common queries will help search engines provide proper search results and advertisements, help Web users obtain the right information they desire and help the advertisers benefit from the potential transactions. However, the intentions behind a query vary a lot for users with different background and interest. The intentions can even be different for the same user, when the query is issued in different contexts. In this paper, we present a new algorithm framework based on skip-chain conditional random field (SCCRF) for automatically classifying Web queries according to context-based online commercial intention. We analyze our algorithm performance both theoretically and empirically. Extensive experiments on several real search engine log datasets show that our algorithm can improve more than 10% on F1 score than previous algorithms on commercial intention detection.
What Friends Are For: Collaborative Intelligence Analysis and Search
2014-06-01
14. SUBJECT TERMS Intelligence Community, information retrieval, recommender systems , search engines, social networks, user profiling, Lucene...improvements over existing search systems . The improvements are shown to be robust to high levels of human error and low similarity between users ...precision NOLH nearly orthogonal Latin hypercubes P@ precision at documents RS recommender systems TREC Text REtrieval Conference USM user
A fuzzy-match search engine for physician directories.
Rastegar-Mojarad, Majid; Kadolph, Christopher; Ye, Zhan; Wall, Daniel; Murali, Narayana; Lin, Simon
2014-11-04
A search engine to find physicians' information is a basic but crucial function of a health care provider's website. Inefficient search engines, which return no results or incorrect results, can lead to patient frustration and potential customer loss. A search engine that can handle misspellings and spelling variations of names is needed, as the United States (US) has culturally, racially, and ethnically diverse names. The Marshfield Clinic website provides a search engine for users to search for physicians' names. The current search engine provides an auto-completion function, but it requires an exact match. We observed that 26% of all searches yielded no results. The goal was to design a fuzzy-match algorithm to aid users in finding physicians easier and faster. Instead of an exact match search, we used a fuzzy algorithm to find similar matches for searched terms. In the algorithm, we solved three types of search engine failures: "Typographic", "Phonetic spelling variation", and "Nickname". To solve these mismatches, we used a customized Levenshtein distance calculation that incorporated Soundex coding and a lookup table of nicknames derived from US census data. Using the "Challenge Data Set of Marshfield Physician Names," we evaluated the accuracy of fuzzy-match engine-top ten (90%) and compared it with exact match (0%), Soundex (24%), Levenshtein distance (59%), and fuzzy-match engine-top one (71%). We designed, created a reference implementation, and evaluated a fuzzy-match search engine for physician directories. The open-source code is available at the codeplex website and a reference implementation is available for demonstration at the datamarsh website.
New Architectures for Presenting Search Results Based on Web Search Engines Users Experience
ERIC Educational Resources Information Center
Martinez, F. J.; Pastor, J. A.; Rodriguez, J. V.; Lopez, Rosana; Rodriguez, J. V., Jr.
2011-01-01
Introduction: The Internet is a dynamic environment which is continuously being updated. Search engines have been, currently are and in all probability will continue to be the most popular systems in this information cosmos. Method: In this work, special attention has been paid to the series of changes made to search engines up to this point,…
Health search engine with e-document analysis for reliable search results.
Gaudinat, Arnaud; Ruch, Patrick; Joubert, Michel; Uziel, Philippe; Strauss, Anne; Thonnet, Michèle; Baud, Robert; Spahni, Stéphane; Weber, Patrick; Bonal, Juan; Boyer, Celia; Fieschi, Marius; Geissbuhler, Antoine
2006-01-01
After a review of the existing practical solution available to the citizen to retrieve eHealth document, the paper describes an original specialized search engine WRAPIN. WRAPIN uses advanced cross lingual information retrieval technologies to check information quality by synthesizing medical concepts, conclusions and references contained in the health literature, to identify accurate, relevant sources. Thanks to MeSH terminology [1] (Medical Subject Headings from the U.S. National Library of Medicine) and advanced approaches such as conclusion extraction from structured document, reformulation of the query, WRAPIN offers to the user a privileged access to navigate through multilingual documents without language or medical prerequisites. The results of an evaluation conducted on the WRAPIN prototype show that results of the WRAPIN search engine are perceived as informative 65% (59% for a general-purpose search engine), reliable and trustworthy 72% (41% for the other engine) by users. But it leaves room for improvement such as the increase of database coverage, the explanation of the original functionalities and an audience adaptability. Thanks to evaluation outcomes, WRAPIN is now in exploitation on the HON web site (http://www.healthonnet.org), free of charge. Intended to the citizen it is a good alternative to general-purpose search engines when the user looks up trustworthy health and medical information or wants to check automatically a doubtful content of a Web page.
Generating Personalized Web Search Using Semantic Context
Xu, Zheng; Chen, Hai-Yan; Yu, Jie
2015-01-01
The “one size fits the all” criticism of search engines is that when queries are submitted, the same results are returned to different users. In order to solve this problem, personalized search is proposed, since it can provide different search results based upon the preferences of users. However, existing methods concentrate more on the long-term and independent user profile, and thus reduce the effectiveness of personalized search. In this paper, the method captures the user context to provide accurate preferences of users for effectively personalized search. First, the short-term query context is generated to identify related concepts of the query. Second, the user context is generated based on the click through data of users. Finally, a forgetting factor is introduced to merge the independent user context in a user session, which maintains the evolution of user preferences. Experimental results fully confirm that our approach can successfully represent user context according to individual user information needs. PMID:26000335
Collaborative search in electronic health records
Mei, Qiaozhu; Hanauer, David A
2011-01-01
Objective A full-text search engine can be a useful tool for augmenting the reuse value of unstructured narrative data stored in electronic health records (EHR). A prominent barrier to the effective utilization of such tools originates from users' lack of search expertise and/or medical-domain knowledge. To mitigate the issue, the authors experimented with a ‘collaborative search’ feature through a homegrown EHR search engine that allows users to preserve their search knowledge and share it with others. This feature was inspired by the success of many social information-foraging techniques used on the web that leverage users' collective wisdom to improve the quality and efficiency of information retrieval. Design The authors conducted an empirical evaluation study over a 4-year period. The user sample consisted of 451 academic researchers, medical practitioners, and hospital administrators. The data were analyzed using a social-network analysis to delineate the structure of the user collaboration networks that mediated the diffusion of knowledge of search. Results The users embraced the concept with considerable enthusiasm. About half of the EHR searches processed by the system (0.44 million) were based on stored search knowledge; 0.16 million utilized shared knowledge made available by other users. The social-network analysis results also suggest that the user-collaboration networks engendered by the collaborative search feature played an instrumental role in enabling the transfer of search knowledge across people and domains. Conclusion Applying collaborative search, a social information-foraging technique popularly used on the web, may provide the potential to improve the quality and efficiency of information retrieval in healthcare. PMID:21486887
MetaSpider: Meta-Searching and Categorization on the Web.
ERIC Educational Resources Information Center
Chen, Hsinchun; Fan, Haiyan; Chau, Michael; Zeng, Daniel
2001-01-01
Discusses the difficulty of locating relevant information on the Web and studies two approaches to addressing the low precision and poor presentation of search results: meta-search and document categorization. Introduces MetaSpider, a meta-search engine, and presents results of a user evaluation study that compared three search engines.…
Making Temporal Search More Central in Spatial Data Infrastructures
NASA Astrophysics Data System (ADS)
Corti, P.; Lewis, B.
2017-10-01
A temporally enabled Spatial Data Infrastructure (SDI) is a framework of geospatial data, metadata, users, and tools intended to provide an efficient and flexible way to use spatial information which includes the historical dimension. One of the key software components of an SDI is the catalogue service which is needed to discover, query, and manage the metadata. A search engine is a software system capable of supporting fast and reliable search, which may use any means necessary to get users to the resources they need quickly and efficiently. These techniques may include features such as full text search, natural language processing, weighted results, temporal search based on enrichment, visualization of patterns in distributions of results in time and space using temporal and spatial faceting, and many others. In this paper we will focus on the temporal aspects of search which include temporal enrichment using a time miner - a software engine able to search for date components within a larger block of text, the storage of time ranges in the search engine, handling historical dates, and the use of temporal histograms in the user interface to display the temporal distribution of search results.
Allam, Ahmed; Schulz, Peter Johannes; Nakamoto, Kent
2014-04-02
During the past 2 decades, the Internet has evolved to become a necessity in our daily lives. The selection and sorting algorithms of search engines exert tremendous influence over the global spread of information and other communication processes. This study is concerned with demonstrating the influence of selection and sorting/ranking criteria operating in search engines on users' knowledge, beliefs, and attitudes of websites about vaccination. In particular, it is to compare the effects of search engines that deliver websites emphasizing on the pro side of vaccination with those focusing on the con side and with normal Google as a control group. We conducted 2 online experiments using manipulated search engines. A pilot study was to verify the existence of dangerous health literacy in connection with searching and using health information on the Internet by exploring the effect of 2 manipulated search engines that yielded either pro or con vaccination sites only, with a group receiving normal Google as control. A pre-post test design was used; participants were American marketing students enrolled in a study-abroad program in Lugano, Switzerland. The second experiment manipulated the search engine by applying different ratios of con versus pro vaccination webpages displayed in the search results. Participants were recruited from Amazon's Mechanical Turk platform where it was published as a human intelligence task (HIT). Both experiments showed knowledge highest in the group offered only pro vaccination sites (Z=-2.088, P=.03; Kruskal-Wallis H test [H₅]=11.30, P=.04). They acknowledged the importance/benefits (Z=-2.326, P=.02; H5=11.34, P=.04) and effectiveness (Z=-2.230, P=.03) of vaccination more, whereas groups offered antivaccination sites only showed increased concern about effects (Z=-2.582, P=.01; H₅=16.88, P=.005) and harmful health outcomes (Z=-2.200, P=.02) of vaccination. Normal Google users perceived information quality to be positive despite a small effect on knowledge and a negative effect on their beliefs and attitudes toward vaccination and willingness to recommend the information (χ²₅=14.1, P=.01). More exposure to antivaccination websites lowered participants' knowledge (J=4783.5, z=-2.142, P=.03) increased their fear of side effects (J=6496, z=2.724, P=.006), and lowered their acknowledgment of benefits (J=4805, z=-2.067, P=.03). The selection and sorting/ranking criteria of search engines play a vital role in online health information seeking. Search engines delivering websites containing credible and evidence-based medical information impact positively Internet users seeking health information. Whereas sites retrieved by biased search engines create some opinion change in users. These effects are apparently independent of users' site credibility and evaluation judgments. Users are affected beneficially or detrimentally but are unaware, suggesting they are not consciously perceptive of indicators that steer them toward the credible sources or away from the dangerous ones. In this sense, the online health information seeker is flying blind.
Ethnography of Novices' First Use of Web Search Engines: Affective Control in Cognitive Processing.
ERIC Educational Resources Information Center
Nahl, Diane
1998-01-01
This study of 18 novice Internet users employed a structured self-report method to investigate affective and cognitive operations in the following phases of World Wide Web searching: presearch formulation, search statement formulation, search strategy, and evaluation of results. Users also rated their self-confidence as searchers and satisfaction…
BioSearch: a semantic search engine for Bio2RDF
Qiu, Honglei; Huang, Jiacheng
2017-01-01
Abstract Biomedical data are growing at an incredible pace and require substantial expertise to organize data in a manner that makes them easily findable, accessible, interoperable and reusable. Massive effort has been devoted to using Semantic Web standards and technologies to create a network of Linked Data for the life sciences, among others. However, while these data are accessible through programmatic means, effective user interfaces for non-experts to SPARQL endpoints are few and far between. Contributing to user frustrations is that data are not necessarily described using common vocabularies, thereby making it difficult to aggregate results, especially when distributed across multiple SPARQL endpoints. We propose BioSearch — a semantic search engine that uses ontologies to enhance federated query construction and organize search results. BioSearch also features a simplified query interface that allows users to optionally filter their keywords according to classes, properties and datasets. User evaluation demonstrated that BioSearch is more effective and usable than two state of the art search and browsing solutions. Database URL: http://ws.nju.edu.cn/biosearch/ PMID:29220451
A Method for Search Engine Selection using Thesaurus for Selective Meta-Search Engine
NASA Astrophysics Data System (ADS)
Goto, Shoji; Ozono, Tadachika; Shintani, Toramatsu
In this paper, we propose a new method for selecting search engines on WWW for selective meta-search engine. In selective meta-search engine, a method is needed that would enable selecting appropriate search engines for users' queries. Most existing methods use statistical data such as document frequency. These methods may select inappropriate search engines if a query contains polysemous words. In this paper, we describe an search engine selection method based on thesaurus. In our method, a thesaurus is constructed from documents in a search engine and is used as a source description of the search engine. The form of a particular thesaurus depends on the documents used for its construction. Our method enables search engine selection by considering relationship between terms and overcomes the problems caused by polysemous words. Further, our method does not have a centralized broker maintaining data, such as document frequency for all search engines. As a result, it is easy to add a new search engine, and meta-search engines become more scalable with our method compared to other existing methods.
Evidence-based Medicine Search: a customizable federated search engine.
Bracke, Paul J; Howse, David K; Keim, Samuel M
2008-04-01
This paper reports on the development of a tool by the Arizona Health Sciences Library (AHSL) for searching clinical evidence that can be customized for different user groups. The AHSL provides services to the University of Arizona's (UA's) health sciences programs and to the University Medical Center. Librarians at AHSL collaborated with UA College of Medicine faculty to create an innovative search engine, Evidence-based Medicine (EBM) Search, that provides users with a simple search interface to EBM resources and presents results organized according to an evidence pyramid. EBM Search was developed with a web-based configuration component that allows the tool to be customized for different specialties. Informal and anecdotal feedback from physicians indicates that EBM Search is a useful tool with potential in teaching evidence-based decision making. While formal evaluation is still being planned, a tool such as EBM Search, which can be configured for specific user populations, may help lower barriers to information resources in an academic health sciences center.
Evidence-based Medicine Search: a customizable federated search engine
Bracke, Paul J.; Howse, David K.; Keim, Samuel M.
2008-01-01
Purpose: This paper reports on the development of a tool by the Arizona Health Sciences Library (AHSL) for searching clinical evidence that can be customized for different user groups. Brief Description: The AHSL provides services to the University of Arizona's (UA's) health sciences programs and to the University Medical Center. Librarians at AHSL collaborated with UA College of Medicine faculty to create an innovative search engine, Evidence-based Medicine (EBM) Search, that provides users with a simple search interface to EBM resources and presents results organized according to an evidence pyramid. EBM Search was developed with a web-based configuration component that allows the tool to be customized for different specialties. Outcomes/Conclusion: Informal and anecdotal feedback from physicians indicates that EBM Search is a useful tool with potential in teaching evidence-based decision making. While formal evaluation is still being planned, a tool such as EBM Search, which can be configured for specific user populations, may help lower barriers to information resources in an academic health sciences center. PMID:18379665
Hanauer, David A; Wu, Danny T Y; Yang, Lei; Mei, Qiaozhu; Murkowski-Steffy, Katherine B; Vydiswaran, V G Vinod; Zheng, Kai
2017-03-01
The utility of biomedical information retrieval environments can be severely limited when users lack expertise in constructing effective search queries. To address this issue, we developed a computer-based query recommendation algorithm that suggests semantically interchangeable terms based on an initial user-entered query. In this study, we assessed the value of this approach, which has broad applicability in biomedical information retrieval, by demonstrating its application as part of a search engine that facilitates retrieval of information from electronic health records (EHRs). The query recommendation algorithm utilizes MetaMap to identify medical concepts from search queries and indexed EHR documents. Synonym variants from UMLS are used to expand the concepts along with a synonym set curated from historical EHR search logs. The empirical study involved 33 clinicians and staff who evaluated the system through a set of simulated EHR search tasks. User acceptance was assessed using the widely used technology acceptance model. The search engine's performance was rated consistently higher with the query recommendation feature turned on vs. off. The relevance of computer-recommended search terms was also rated high, and in most cases the participants had not thought of these terms on their own. The questions on perceived usefulness and perceived ease of use received overwhelmingly positive responses. A vast majority of the participants wanted the query recommendation feature to be available to assist in their day-to-day EHR search tasks. Challenges persist for users to construct effective search queries when retrieving information from biomedical documents including those from EHRs. This study demonstrates that semantically-based query recommendation is a viable solution to addressing this challenge. Published by Elsevier Inc.
IdentiPy: An Extensible Search Engine for Protein Identification in Shotgun Proteomics.
Levitsky, Lev I; Ivanov, Mark V; Lobas, Anna A; Bubis, Julia A; Tarasova, Irina A; Solovyeva, Elizaveta M; Pridatchenko, Marina L; Gorshkov, Mikhail V
2018-06-18
We present an open-source, extensible search engine for shotgun proteomics. Implemented in Python programming language, IdentiPy shows competitive processing speed and sensitivity compared with the state-of-the-art search engines. It is equipped with a user-friendly web interface, IdentiPy Server, enabling the use of a single server installation accessed from multiple workstations. Using a simplified version of X!Tandem scoring algorithm and its novel "autotune" feature, IdentiPy outperforms the popular alternatives on high-resolution data sets. Autotune adjusts the search parameters for the particular data set, resulting in improved search efficiency and simplifying the user experience. IdentiPy with the autotune feature shows higher sensitivity compared with the evaluated search engines. IdentiPy Server has built-in postprocessing and protein inference procedures and provides graphic visualization of the statistical properties of the data set and the search results. It is open-source and can be freely extended to use third-party scoring functions or processing algorithms and allows customization of the search workflow for specialized applications.
A Full-Text-Based Search Engine for Finding Highly Matched Documents Across Multiple Categories
NASA Technical Reports Server (NTRS)
Nguyen, Hung D.; Steele, Gynelle C.
2016-01-01
This report demonstrates the full-text-based search engine that works on any Web-based mobile application. The engine has the capability to search databases across multiple categories based on a user's queries and identify the most relevant or similar. The search results presented here were found using an Android (Google Co.) mobile device; however, it is also compatible with other mobile phones.
Predicting user click behaviour in search engine advertisements
NASA Astrophysics Data System (ADS)
Daryaie Zanjani, Mohammad; Khadivi, Shahram
2015-10-01
According to the specific requirements and interests of users, search engines select and display advertisements that match user needs and have higher probability of attracting users' attention based on their previous search history. New objects such as user, advertisement or query cause a deterioration of precision in targeted advertising due to their lack of history. This article surveys this challenge. In the case of new objects, we first extract similar observed objects to the new object and then we use their history as the history of new object. Similarity between objects is measured based on correlation, which is a relation between user and advertisement when the advertisement is displayed to the user. This method is used for all objects, so it has helped us to accurately select relevant advertisements for users' queries. In our proposed model, we assume that similar users behave in a similar manner. We find that users with few queries are similar to new users. We will show that correlation between users and advertisements' keywords is high. Thus, users who pay attention to advertisements' keywords, click similar advertisements. In addition, users who pay attention to specific brand names might have similar behaviours too.
E-Referencer: Transforming Boolean OPACs to Web Search Engines.
ERIC Educational Resources Information Center
Khoo, Christopher S. G.; Poo, Danny C. C.; Toh, Teck-Kang; Hong, Glenn
E-Referencer is an expert intermediary system for searching library online public access catalogs (OPACs) on the World Wide Web. It is implemented as a proxy server that mediates the interaction between the user and Boolean OPACs. It transforms a Boolean OPAC into a retrieval system with many of the search capabilities of Web search engines.…
NASA Astrophysics Data System (ADS)
Chandrashekar, Varsha; B, Prabadevi
2017-11-01
Providing services to user is the main functionality of every search engine. Recently services based on users’ current location has also been enabled with the help of GPS in every smartphone. But how safe are their searches and how trustworthy is the search engine. Why are users tracked even when they turn off the tracking. Where lies the solution. Unless there is a security system to prevent ad trackers from misusing user’ s location, any application which relies on user’ s location will be of no use. We know that location information is highly sensitive personal data. Knowing where a person was at a particular time, one can infer his/her personal activities, political views, health status, and launch unsolicited advertising, physical attacks or harassment. Therefore, mechanisms to preserve users' privacy and anonymity are mandatory in any application that involves users’ location. So there comes the need to hide the location of the users. This proposed application aims to implement some of the features required for preserving users’ privacy and also a secure user login so that services provided to users can be used by them without danger of their searches being misused.
DOE Office of Scientific and Technical Information (OSTI.GOV)
None Available
To make the web work better for science, OSTI has developed state-of-the-art technologies and services including a deep web search capability. The deep web includes content in searchable databases available to web users but not accessible by popular search engines, such as Google. This video provides an introduction to the deep web search engine.
A Search Engine Features Comparison.
ERIC Educational Resources Information Center
Vorndran, Gerald
Until recently, the World Wide Web (WWW) public access search engines have not included many of the advanced commands, options, and features commonly available with the for-profit online database user interfaces, such as DIALOG. This study evaluates the features and characteristics common to both types of search interfaces, examines the Web search…
Features: Real-Time Adaptive Feature and Document Learning for Web Search.
ERIC Educational Resources Information Center
Chen, Zhixiang; Meng, Xiannong; Fowler, Richard H.; Zhu, Binhai
2001-01-01
Describes Features, an intelligent Web search engine that is able to perform real-time adaptive feature (i.e., keyword) and document learning. Explains how Features learns from users' document relevance feedback and automatically extracts and suggests indexing keywords relevant to a search query, and learns from users' keyword relevance feedback…
Analyzing Medical Image Search Behavior: Semantics and Prediction of Query Results.
De-Arteaga, Maria; Eggel, Ivan; Kahn, Charles E; Müller, Henning
2015-10-01
Log files of information retrieval systems that record user behavior have been used to improve the outcomes of retrieval systems, understand user behavior, and predict events. In this article, a log file of the ARRS GoldMiner search engine containing 222,005 consecutive queries is analyzed. Time stamps are available for each query, as well as masked IP addresses, which enables to identify queries from the same person. This article describes the ways in which physicians (or Internet searchers interested in medical images) search and proposes potential improvements by suggesting query modifications. For example, many queries contain only few terms and therefore are not specific; others contain spelling mistakes or non-medical terms that likely lead to poor or empty results. One of the goals of this report is to predict the number of results a query will have since such a model allows search engines to automatically propose query modifications in order to avoid result lists that are empty or too large. This prediction is made based on characteristics of the query terms themselves. Prediction of empty results has an accuracy above 88%, and thus can be used to automatically modify the query to avoid empty result sets for a user. The semantic analysis and data of reformulations done by users in the past can aid the development of better search systems, particularly to improve results for novice users. Therefore, this paper gives important ideas to better understand how people search and how to use this knowledge to improve the performance of specialized medical search engines.
NASA Astrophysics Data System (ADS)
Jiang, Y.
2015-12-01
Oceanographic resource discovery is a critical step for developing ocean science applications. With the increasing number of resources available online, many Spatial Data Infrastructure (SDI) components (e.g. catalogues and portals) have been developed to help manage and discover oceanographic resources. However, efficient and accurate resource discovery is still a big challenge because of the lack of data relevancy information. In this article, we propose a search engine framework for mining and utilizing dataset relevancy from oceanographic dataset metadata, usage metrics, and user feedback. The objective is to improve discovery accuracy of oceanographic data and reduce time for scientist to discover, download and reformat data for their projects. Experiments and a search example show that the propose engine helps both scientists and general users search for more accurate results with enhanced performance and user experience through a user-friendly interface.
Seyfried, Lisa; Hanauer, David A; Nease, Donald; Albeiruti, Rashad; Kavanagh, Janet; Kales, Helen C
2009-12-01
Electronic medical records (EMRs) have become part of daily practice for many physicians. Attempts have been made to apply electronic search engine technology to speed EMR review. This was a prospective, observational study to compare the speed and clinical accuracy of a medical record search engine vs. manual review of the EMR. Three raters reviewed 49 cases in the EMR to screen for eligibility in a depression study using the electronic medical record search engine (EMERSE). One week later raters received a scrambled set of the same patients including 9 distractor cases, and used manual EMR review to determine eligibility. For both methods, accuracy was assessed for the original 49 cases by comparison with a gold standard rater. Use of EMERSE resulted in considerable time savings; chart reviews using EMERSE were significantly faster than traditional manual review (p=0.03). The percent agreement of raters with the gold standard (e.g. concurrent validity) using either EMERSE or manual review was not significantly different. Using a search engine optimized for finding clinical information in the free-text sections of the EMR can provide significant time savings while preserving clinical accuracy. The major power of this search engine is not from a more advanced and sophisticated search algorithm, but rather from a user interface designed explicitly to help users search the entire medical record in a way that protects health information.
Seyfried, Lisa; Hanauer, David; Nease, Donald; Albeiruti, Rashad; Kavanagh, Janet; Kales, Helen C.
2009-01-01
Purpose Electronic medical records (EMR) have become part of daily practice for many physicians. Attempts have been made to apply electronic search engine technology to speed EMR review. This was a prospective, observational study to compare the speed and accuracy of electronic search engine vs. manual review of the EMR. Methods Three raters reviewed 49 cases in the EMR to screen for eligibility in a depression study using the electronic search engine (EMERSE). One week later raters received a scrambled set of the same patients including 9 distractor cases, and used manual EMR review to determine eligibility. For both methods, accuracy was assessed for the original 49 cases by comparison with a gold standard rater. Results Use of EMERSE resulted in considerable time savings; chart reviews using EMERSE were significantly faster than traditional manual review (p=0.03). The percent agreement of raters with the gold standard (e.g. concurrent validity) using either EMERSE or manual review was not significantly different. Conclusions Using a search engine optimized for finding clinical information in the free-text sections of the EMR can provide significant time savings while preserving reliability. The major power of this search engine is not from a more advanced and sophisticated search algorithm, but rather from a user interface designed explicitly to help users search the entire medical record in a way that protects health information. PMID:19560962
Building a better search engine for earth science data
NASA Astrophysics Data System (ADS)
Armstrong, E. M.; Yang, C. P.; Moroni, D. F.; McGibbney, L. J.; Jiang, Y.; Huang, T.; Greguska, F. R., III; Li, Y.; Finch, C. J.
2017-12-01
Free text data searching of earth science datasets has been implemented with varying degrees of success and completeness across the spectrum of the 12 NASA earth sciences data centers. At the JPL Physical Oceanography Distributed Active Archive Center (PO.DAAC) the search engine has been developed around the Solr/Lucene platform. Others have chosen other popular enterprise search platforms like Elasticsearch. Regardless, the default implementations of these search engines leveraging factors such as dataset popularity, term frequency and inverse document term frequency do not fully meet the needs of precise relevancy and ranking of earth science search results. For the PO.DAAC, this shortcoming has been identified for several years by its external User Working Group that has assigned several recommendations to improve the relevancy and discoverability of datasets related to remotely sensed sea surface temperature, ocean wind, waves, salinity, height and gravity that comprise a total count of over 500 public availability datasets. Recently, the PO.DAAC has teamed with an effort led by George Mason University to improve the improve the search and relevancy ranking of oceanographic data via a simple search interface and powerful backend services called MUDROD (Mining and Utilizing Dataset Relevancy from Oceanographic Datasets to Improve Data Discovery) funded by the NASA AIST program. MUDROD has mined and utilized the combination of PO.DAAC earth science dataset metadata, usage metrics, and user feedback and search history to objectively extract relevance for improved data discovery and access. In addition to improved dataset relevance and ranking, the MUDROD search engine also returns recommendations to related datasets and related user queries. This presentation will report on use cases that drove the architecture and development, and the success metrics and improvements on search precision and recall that MUDROD has demonstrated over the existing PO.DAAC search interfaces.
From the Director: Surfing the Web for Health Information
... Reliable Results Most Internet users first visit a search engine — like Google or Yahoo! — when seeking health information. ... medical terms like "cancer" or "diabetes" into a search engine, the top-ten results will likely include authoritative ...
Information Retrieval for Education: Making Search Engines Language Aware
ERIC Educational Resources Information Center
Ott, Niels; Meurers, Detmar
2010-01-01
Search engines have been a major factor in making the web the successful and widely used information source it is today. Generally speaking, they make it possible to retrieve web pages on a topic specified by the keywords entered by the user. Yet web searching currently does not take into account which of the search results are comprehensible for…
Users' Perceptions of the Web As Revealed by Transaction Log Analysis.
ERIC Educational Resources Information Center
Moukdad, Haidar; Large, Andrew
2001-01-01
Describes the results of a transaction log analysis of a Web search engine, WebCrawler, to analyze user's queries for information retrieval. Results suggest most users do not employ advanced search features, and the linguistic structure often resembles a human-human communication model that is not always successful in human-computer communication.…
None Available
2018-02-06
To make the web work better for science, OSTI has developed state-of-the-art technologies and services including a deep web search capability. The deep web includes content in searchable databases available to web users but not accessible by popular search engines, such as Google. This video provides an introduction to the deep web search engine.
2006-12-01
speed of search engines improves the efficiency of such methods, effectiveness is not improved. The objective of this thesis is to construct and test...interest, users are assisted in finding a relevant set of key terms that will aid the search engines in narrowing, widening, or refocusing a Web search
Development and Evaluation of Thesauri-Based Bibliographic Biomedical Search Engine
ERIC Educational Resources Information Center
Alghoson, Abdullah
2017-01-01
Due to the large volume and exponential growth of biomedical documents (e.g., books, journal articles), it has become increasingly challenging for biomedical search engines to retrieve relevant documents based on users' search queries. Part of the challenge is the matching mechanism of free-text indexing that performs matching based on…
Modeling User Behavior and Attention in Search
ERIC Educational Resources Information Center
Huang, Jeff
2013-01-01
In Web search, query and click log data are easy to collect but they fail to capture user behaviors that do not lead to clicks. As search engines reach the limits inherent in click data and are hungry for more data in a competitive environment, mining cursor movements, hovering, and scrolling becomes important. This dissertation investigates how…
GGRNA: an ultrafast, transcript-oriented search engine for genes and transcripts
Naito, Yuki; Bono, Hidemasa
2012-01-01
GGRNA (http://GGRNA.dbcls.jp/) is a Google-like, ultrafast search engine for genes and transcripts. The web server accepts arbitrary words and phrases, such as gene names, IDs, gene descriptions, annotations of gene and even nucleotide/amino acid sequences through one simple search box, and quickly returns relevant RefSeq transcripts. A typical search takes just a few seconds, which dramatically enhances the usability of routine searching. In particular, GGRNA can search sequences as short as 10 nt or 4 amino acids, which cannot be handled easily by popular sequence analysis tools. Nucleotide sequences can be searched allowing up to three mismatches, or the query sequences may contain degenerate nucleotide codes (e.g. N, R, Y, S). Furthermore, Gene Ontology annotations, Enzyme Commission numbers and probe sequences of catalog microarrays are also incorporated into GGRNA, which may help users to conduct searches by various types of keywords. GGRNA web server will provide a simple and powerful interface for finding genes and transcripts for a wide range of users. All services at GGRNA are provided free of charge to all users. PMID:22641850
GGRNA: an ultrafast, transcript-oriented search engine for genes and transcripts.
Naito, Yuki; Bono, Hidemasa
2012-07-01
GGRNA (http://GGRNA.dbcls.jp/) is a Google-like, ultrafast search engine for genes and transcripts. The web server accepts arbitrary words and phrases, such as gene names, IDs, gene descriptions, annotations of gene and even nucleotide/amino acid sequences through one simple search box, and quickly returns relevant RefSeq transcripts. A typical search takes just a few seconds, which dramatically enhances the usability of routine searching. In particular, GGRNA can search sequences as short as 10 nt or 4 amino acids, which cannot be handled easily by popular sequence analysis tools. Nucleotide sequences can be searched allowing up to three mismatches, or the query sequences may contain degenerate nucleotide codes (e.g. N, R, Y, S). Furthermore, Gene Ontology annotations, Enzyme Commission numbers and probe sequences of catalog microarrays are also incorporated into GGRNA, which may help users to conduct searches by various types of keywords. GGRNA web server will provide a simple and powerful interface for finding genes and transcripts for a wide range of users. All services at GGRNA are provided free of charge to all users.
NASA Astrophysics Data System (ADS)
Piasecki, M.; Beran, B.
2007-12-01
Search engines have changed the way we see the Internet. The ability to find the information by just typing in keywords was a big contribution to the overall web experience. While the conventional search engine methodology worked well for textual documents, locating scientific data remains a problem since they are stored in databases not readily accessible by search engine bots. Considering different temporal, spatial and thematic coverage of different databases, especially for interdisciplinary research it is typically necessary to work with multiple data sources. These sources can be federal agencies which generally offer national coverage or regional sources which cover a smaller area with higher detail. However for a given geographic area of interest there often exists more than one database with relevant data. Thus being able to query multiple databases simultaneously is a desirable feature that would be tremendously useful for scientists. Development of such a search engine requires dealing with various heterogeneity issues. In scientific databases, systems often impose controlled vocabularies which ensure that they are generally homogeneous within themselves but are semantically heterogeneous when moving between different databases. This defines the boundaries of possible semantic related problems making it easier to solve than with the conventional search engines that deal with free text. We have developed a search engine that enables querying multiple data sources simultaneously and returns data in a standardized output despite the aforementioned heterogeneity issues between the underlying systems. This application relies mainly on metadata catalogs or indexing databases, ontologies and webservices with virtual globe and AJAX technologies for the graphical user interface. Users can trigger a search of dozens of different parameters over hundreds of thousands of stations from multiple agencies by providing a keyword, a spatial extent, i.e. a bounding box, and a temporal bracket. As part of this development we have also added an environment that allows users to do some of the semantic tagging, i.e. the linkage of a variable name (which can be anything they desire) to defined concepts in the ontology structure which in turn provides the backbone of the search engine.
Forsetlund, Louise; Kirkehei, Ingvild; Harboe, Ingrid; Odgaard-Jensen, Jan
2012-01-01
This study aims to compare two different search methods for determining the scope of a requested systematic review or health technology assessment. The first method (called the Direct Search Method) included performing direct searches in the Cochrane Database of Systematic Reviews (CDSR), Database of Abstracts of Reviews of Effects (DARE) and the Health Technology Assessments (HTA). Using the comparison method (called the NHS Search Engine) we performed searches by means of the search engine of the British National Health Service, NHS Evidence. We used an adapted cross-over design with a random allocation of fifty-five requests for systematic reviews. The main analyses were based on repeated measurements adjusted for the order in which the searches were conducted. The Direct Search Method generated on average fewer hits (48 percent [95 percent confidence interval {CI} 6 percent to 72 percent], had a higher precision (0.22 [95 percent CI, 0.13 to 0.30]) and more unique hits than when searching by means of the NHS Search Engine (50 percent [95 percent CI, 7 percent to 110 percent]). On the other hand, the Direct Search Method took longer (14.58 minutes [95 percent CI, 7.20 to 21.97]) and was perceived as somewhat less user-friendly than the NHS Search Engine (-0.60 [95 percent CI, -1.11 to -0.09]). Although the Direct Search Method had some drawbacks such as being more time-consuming and less user-friendly, it generated more unique hits than the NHS Search Engine, retrieved on average fewer references and fewer irrelevant results.
Heuristics for Relevancy Ranking of Earth Dataset Search Results
NASA Astrophysics Data System (ADS)
Lynnes, C.; Quinn, P.; Norton, J.
2016-12-01
As the Variety of Earth science datasets increases, science researchers find it more challenging to discover and select the datasets that best fit their needs. The most common way of search providers to address this problem is to rank the datasets returned for a query by their likely relevance to the user. Large web page search engines typically use text matching supplemented with reverse link counts, semantic annotations and user intent modeling. However, this produces uneven results when applied to dataset metadata records simply externalized as a web page. Fortunately, data and search provides have decades of experience in serving data user communities, allowing them to form heuristics that leverage the structure in the metadata together with knowledge about the user community. Some of these heuristics include specific ways of matching the user input to the essential measurements in the dataset and determining overlaps of time range and spatial areas. Heuristics based on the novelty of the datasets can prioritize later, better versions of data over similar predecessors. And knowledge of how different user types and communities use data can be brought to bear in cases where characteristics of the user (discipline, expertise) or their intent (applications, research) can be divined. The Earth Observing System Data and Information System has begun implementing some of these heuristics in the relevancy algorithm of its Common Metadata Repository search engine.
Heuristics for Relevancy Ranking of Earth Dataset Search Results
NASA Technical Reports Server (NTRS)
Lynnes, Christopher; Quinn, Patrick; Norton, James
2016-01-01
As the Variety of Earth science datasets increases, science researchers find it more challenging to discover and select the datasets that best fit their needs. The most common way of search providers to address this problem is to rank the datasets returned for a query by their likely relevance to the user. Large web page search engines typically use text matching supplemented with reverse link counts, semantic annotations and user intent modeling. However, this produces uneven results when applied to dataset metadata records simply externalized as a web page. Fortunately, data and search provides have decades of experience in serving data user communities, allowing them to form heuristics that leverage the structure in the metadata together with knowledge about the user community. Some of these heuristics include specific ways of matching the user input to the essential measurements in the dataset and determining overlaps of time range and spatial areas. Heuristics based on the novelty of the datasets can prioritize later, better versions of data over similar predecessors. And knowledge of how different user types and communities use data can be brought to bear in cases where characteristics of the user (discipline, expertise) or their intent (applications, research) can be divined. The Earth Observing System Data and Information System has begun implementing some of these heuristics in the relevancy algorithm of its Common Metadata Repository search engine.
Relevancy Ranking of Satellite Dataset Search Results
NASA Technical Reports Server (NTRS)
Lynnes, Christopher; Quinn, Patrick; Norton, James
2017-01-01
As the Variety of Earth science datasets increases, science researchers find it more challenging to discover and select the datasets that best fit their needs. The most common way of search providers to address this problem is to rank the datasets returned for a query by their likely relevance to the user. Large web page search engines typically use text matching supplemented with reverse link counts, semantic annotations and user intent modeling. However, this produces uneven results when applied to dataset metadata records simply externalized as a web page. Fortunately, data and search provides have decades of experience in serving data user communities, allowing them to form heuristics that leverage the structure in the metadata together with knowledge about the user community. Some of these heuristics include specific ways of matching the user input to the essential measurements in the dataset and determining overlaps of time range and spatial areas. Heuristics based on the novelty of the datasets can prioritize later, better versions of data over similar predecessors. And knowledge of how different user types and communities use data can be brought to bear in cases where characteristics of the user (discipline, expertise) or their intent (applications, research) can be divined. The Earth Observing System Data and Information System has begun implementing some of these heuristics in the relevancy algorithm of its Common Metadata Repository search engine.
NASA Technical Reports Server (NTRS)
Albornoz, Caleb Ronald
2012-01-01
Thousands of millions of documents are stored and updated daily in the World Wide Web. Most of the information is not efficiently organized to build knowledge from the stored data. Nowadays, search engines are mainly used by users who rely on their skills to look for the information needed. This paper presents different techniques search engine users can apply in Google Search to improve the relevancy of search results. According to the Pew Research Center, the average person spends eight hours a month searching for the right information. For instance, a company that employs 1000 employees wastes $2.5 million dollars on looking for nonexistent and/or not found information. The cost is very high because decisions are made based on the information that is readily available to use. Whenever the information necessary to formulate an argument is not available or found, poor decisions may be made and mistakes will be more likely to occur. Also, the survey indicates that only 56% of Google users feel confident with their current search skills. Moreover, just 76% of the information that is available on the Internet is accurate.
Intelligent web image retrieval system
NASA Astrophysics Data System (ADS)
Hong, Sungyong; Lee, Chungwoo; Nah, Yunmook
2001-07-01
Recently, the web sites such as e-business sites and shopping mall sites deal with lots of image information. To find a specific image from these image sources, we usually use web search engines or image database engines which rely on keyword only retrievals or color based retrievals with limited search capabilities. This paper presents an intelligent web image retrieval system. We propose the system architecture, the texture and color based image classification and indexing techniques, and representation schemes of user usage patterns. The query can be given by providing keywords, by selecting one or more sample texture patterns, by assigning color values within positional color blocks, or by combining some or all of these factors. The system keeps track of user's preferences by generating user query logs and automatically add more search information to subsequent user queries. To show the usefulness of the proposed system, some experimental results showing recall and precision are also explained.
Algorithms for database-dependent search of MS/MS data.
Matthiesen, Rune
2013-01-01
The frequent used bottom-up strategy for identification of proteins and their associated modifications generate nowadays typically thousands of MS/MS spectra that normally are matched automatically against a protein sequence database. Search engines that take as input MS/MS spectra and a protein sequence database are referred as database-dependent search engines. Many programs both commercial and freely available exist for database-dependent search of MS/MS spectra and most of the programs have excellent user documentation. The aim here is therefore to outline the algorithm strategy behind different search engines rather than providing software user manuals. The process of database-dependent search can be divided into search strategy, peptide scoring, protein scoring, and finally protein inference. Most efforts in the literature have been put in to comparing results from different software rather than discussing the underlining algorithms. Such practical comparisons can be cluttered by suboptimal implementation and the observed differences are frequently caused by software parameters settings which have not been set proper to allow even comparison. In other words an algorithmic idea can still be worth considering even if the software implementation has been demonstrated to be suboptimal. The aim in this chapter is therefore to split the algorithms for database-dependent searching of MS/MS data into the above steps so that the different algorithmic ideas become more transparent and comparable. Most search engines provide good implementations of the first three data analysis steps mentioned above, whereas the final step of protein inference are much less developed for most search engines and is in many cases performed by an external software. The final part of this chapter illustrates how protein inference is built into the VEMS search engine and discusses a stand-alone program SIR for protein inference that can import a Mascot search result.
Semantic interpretation of search engine resultant
NASA Astrophysics Data System (ADS)
Nasution, M. K. M.
2018-01-01
In semantic, logical language can be interpreted in various forms, but the certainty of meaning is included in the uncertainty, which directly always influences the role of technology. One results of this uncertainty applies to search engines as user interfaces with information spaces such as the Web. Therefore, the behaviour of search engine results should be interpreted with certainty through semantic formulation as interpretation. Behaviour formulation shows there are various interpretations that can be done semantically either temporary, inclusion, or repeat.
GeNemo: a search engine for web-based functional genomic data.
Zhang, Yongqing; Cao, Xiaoyi; Zhong, Sheng
2016-07-08
A set of new data types emerged from functional genomic assays, including ChIP-seq, DNase-seq, FAIRE-seq and others. The results are typically stored as genome-wide intensities (WIG/bigWig files) or functional genomic regions (peak/BED files). These data types present new challenges to big data science. Here, we present GeNemo, a web-based search engine for functional genomic data. GeNemo searches user-input data against online functional genomic datasets, including the entire collection of ENCODE and mouse ENCODE datasets. Unlike text-based search engines, GeNemo's searches are based on pattern matching of functional genomic regions. This distinguishes GeNemo from text or DNA sequence searches. The user can input any complete or partial functional genomic dataset, for example, a binding intensity file (bigWig) or a peak file. GeNemo reports any genomic regions, ranging from hundred bases to hundred thousand bases, from any of the online ENCODE datasets that share similar functional (binding, modification, accessibility) patterns. This is enabled by a Markov Chain Monte Carlo-based maximization process, executed on up to 24 parallel computing threads. By clicking on a search result, the user can visually compare her/his data with the found datasets and navigate the identified genomic regions. GeNemo is available at www.genemo.org. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
NASA Astrophysics Data System (ADS)
Accomazzi, Alberto; Kurtz, Michael J.; Henneken, Edwin; Grant, Carolyn S.; Thompson, Donna M.; Chyla, Roman; McDonald, Steven; Shaulis, Taylor J.; Blanco-Cuaresma, Sergi; Shapurian, Golnaz; Hostetler, Timothy W.; Templeton, Matthew R.; Lockhart, Kelly E.
2018-01-01
The ADS Team has been working on a new system architecture and user interface named “ADS Bumblebee” since 2015. The new system presents many advantages over the traditional ADS interface and search engine (“ADS Classic”). A new, state of the art search engine features a number of new capabilities such as full-text search, advanced citation queries, filtering of results and scalable analytics for any search results. Its services are built on a cloud computing platform which can be easily scaled to match user demand. The Bumblebee user interface is a rich javascript application which leverages the features of the search engine and integrates a number of additional visualizations such as co-author and co-citation networks which provide a hierarchical view of research groups and research topics, respectively. Displays of paper analytics provide views of the basic article metrics (citations, reads, and age). All visualizations are interactive and provide ways to further refine search results. This new search system, which has been in beta for the past three years, has now matured to the point that it provides feature and content parity with ADS Classic, and has become the recommended way to access ADS content and services. Following a successful transition to Bumblebee, the use of ADS Classic will be discouraged starting in 2018 and phased out in 2019. You can access our new interface at https://ui.adsabs.harvard.edu
Reconsidering the Rhizome: A Textual Analysis of Web Search Engines as Gatekeepers of the Internet
NASA Astrophysics Data System (ADS)
Hess, A.
Critical theorists have often drawn from Deleuze and Guattari's notion of the rhizome when discussing the potential of the Internet. While the Internet may structurally appear as a rhizome, its day-to-day usage by millions via search engines precludes experiencing the random interconnectedness and potential democratizing function. Through a textual analysis of four search engines, I argue that Web searching has grown hierarchies, or "trees," that organize data in tracts of knowledge and place users in marketing niches rather than assist in the development of new knowledge.
Optimizing Earth Data Search Ranking using Deep Learning and Real-time User Behaviour
NASA Astrophysics Data System (ADS)
Jiang, Y.; Yang, C. P.; Armstrong, E. M.; Huang, T.; Moroni, D. F.; McGibbney, L. J.; Greguska, F. R., III
2017-12-01
Finding Earth science data has been a challenging problem given both the quantity of data available and the heterogeneity of the data across a wide variety of domains. Current search engines in most geospatial data portals tend to induce end users to focus on one single data characteristic dimension (e.g., term frequency-inverse document frequency (TF-IDF) score, popularity, release date, etc.). This approach largely fails to take account of users' multidimensional preferences for geospatial data, and hence may likely result in a less than optimal user experience in discovering the most applicable dataset out of a vast range of available datasets. With users interacting with search engines, sufficient information is already hidden in the log files. Compared with explicit feedback data, information that can be derived/extracted from log files is virtually free and substantially more timely. In this dissertation, I propose an online deep learning framework that can quickly update the learning function based on real-time user clickstream data. The contributions of this framework include 1) a log processor that can ingest, process and create training data from web logs in a real-time manner; 2) a query understanding module to better interpret users' search intent using web log processing results and metadata; 3) a feature extractor that identifies ranking features representing users' multidimensional interests of geospatial data; and 4) a deep learning based ranking algorithm that can be trained incrementally using user behavior data. The search ranking results will be evaluated using precision at K and normalized discounted cumulative gain (NDCG).
Schulz, Peter Johannes; Nakamoto, Kent
2014-01-01
Background During the past 2 decades, the Internet has evolved to become a necessity in our daily lives. The selection and sorting algorithms of search engines exert tremendous influence over the global spread of information and other communication processes. Objective This study is concerned with demonstrating the influence of selection and sorting/ranking criteria operating in search engines on users’ knowledge, beliefs, and attitudes of websites about vaccination. In particular, it is to compare the effects of search engines that deliver websites emphasizing on the pro side of vaccination with those focusing on the con side and with normal Google as a control group. Method We conducted 2 online experiments using manipulated search engines. A pilot study was to verify the existence of dangerous health literacy in connection with searching and using health information on the Internet by exploring the effect of 2 manipulated search engines that yielded either pro or con vaccination sites only, with a group receiving normal Google as control. A pre-post test design was used; participants were American marketing students enrolled in a study-abroad program in Lugano, Switzerland. The second experiment manipulated the search engine by applying different ratios of con versus pro vaccination webpages displayed in the search results. Participants were recruited from Amazon’s Mechanical Turk platform where it was published as a human intelligence task (HIT). Results Both experiments showed knowledge highest in the group offered only pro vaccination sites (Z=–2.088, P=.03; Kruskal-Wallis H test [H5]=11.30, P=.04). They acknowledged the importance/benefits (Z=–2.326, P=.02; H5=11.34, P=.04) and effectiveness (Z=–2.230, P=.03) of vaccination more, whereas groups offered antivaccination sites only showed increased concern about effects (Z=–2.582, P=.01; H5=16.88, P=.005) and harmful health outcomes (Z=–2.200, P=.02) of vaccination. Normal Google users perceived information quality to be positive despite a small effect on knowledge and a negative effect on their beliefs and attitudes toward vaccination and willingness to recommend the information (χ2 5=14.1, P=.01). More exposure to antivaccination websites lowered participants’ knowledge (J=4783.5, z=−2.142, P=.03) increased their fear of side effects (J=6496, z=2.724, P=.006), and lowered their acknowledgment of benefits (J=4805, z=–2.067, P=.03). Conclusion The selection and sorting/ranking criteria of search engines play a vital role in online health information seeking. Search engines delivering websites containing credible and evidence-based medical information impact positively Internet users seeking health information. Whereas sites retrieved by biased search engines create some opinion change in users. These effects are apparently independent of users’ site credibility and evaluation judgments. Users are affected beneficially or detrimentally but are unaware, suggesting they are not consciously perceptive of indicators that steer them toward the credible sources or away from the dangerous ones. In this sense, the online health information seeker is flying blind. PMID:24694866
Web Search Studies: Multidisciplinary Perspectives on Web Search Engines
NASA Astrophysics Data System (ADS)
Zimmer, Michael
Perhaps the most significant tool of our internet age is the web search engine, providing a powerful interface for accessing the vast amount of information available on the world wide web and beyond. While still in its infancy compared to the knowledge tools that precede it - such as the dictionary or encyclopedia - the impact of web search engines on society and culture has already received considerable attention from a variety of academic disciplines and perspectives. This article aims to organize a meta-discipline of “web search studies,” centered around a nucleus of major research on web search engines from five key perspectives: technical foundations and evaluations; transaction log analyses; user studies; political, ethical, and cultural critiques; and legal and policy analyses.
EasyKSORD: A Platform of Keyword Search Over Relational Databases
NASA Astrophysics Data System (ADS)
Peng, Zhaohui; Li, Jing; Wang, Shan
Keyword Search Over Relational Databases (KSORD) enables casual users to use keyword queries (a set of keywords) to search relational databases just like searching the Web, without any knowledge of the database schema or any need of writing SQL queries. Based on our previous work, we design and implement a novel KSORD platform named EasyKSORD for users and system administrators to use and manage different KSORD systems in a novel and simple manner. EasyKSORD supports advanced queries, efficient data-graph-based search engines, multiform result presentations, and system logging and analysis. Through EasyKSORD, users can search relational databases easily and read search results conveniently, and system administrators can easily monitor and analyze the operations of KSORD and manage KSORD systems much better.
Using the Turning Research Into Practice (TRIP) database: how do clinicians really search?*
Meats, Emma; Brassey, Jon; Heneghan, Carl; Glasziou, Paul
2007-01-01
Objectives: Clinicians and patients are increasingly accessing information through Internet searches. This study aimed to examine clinicians' current search behavior when using the Turning Research Into Practice (TRIP) database to examine search engine use and the ways it might be improved. Methods: A Web log analysis was undertaken of the TRIP database—a meta-search engine covering 150 health resources including MEDLINE, The Cochrane Library, and a variety of guidelines. The connectors for terms used in searches were studied, and observations were made of 9 users' search behavior when working with the TRIP database. Results: Of 620,735 searches, most used a single term, and 12% (n = 75,947) used a Boolean operator: 11% (n = 69,006) used “AND” and 0.8% (n = 4,941) used “OR.” Of the elements of a well-structured clinical question (population, intervention, comparator, and outcome), the population was most commonly used, while fewer searches included the intervention. Comparator and outcome were rarely used. Participants in the observational study were interested in learning how to formulate better searches. Conclusions: Web log analysis showed most searches used a single term and no Boolean operators. Observational study revealed users were interested in conducting efficient searches but did not always know how. Therefore, either better training or better search interfaces are required to assist users and enable more effective searching. PMID:17443248
Web Searching: A Process-Oriented Experimental Study of Three Interactive Search Paradigms.
ERIC Educational Resources Information Center
Dennis, Simon; Bruza, Peter; McArthur, Robert
2002-01-01
Compares search effectiveness when using query-based Internet search via the Google search engine, directory-based search via Yahoo, and phrase-based query reformulation-assisted search via the Hyperindex browser by means of a controlled, user-based experimental study of undergraduates at the University of Queensland. Discusses cognitive load,…
An assessment of the visibility of MeSH-indexed medical web catalogs through search engines.
Zweigenbaum, P; Darmoni, S J; Grabar, N; Douyère, M; Benichou, J
2002-01-01
Manually indexed Internet health catalogs such as CliniWeb or CISMeF provide resources for retrieving high-quality health information. Users of these quality-controlled subject gateways are most often referred to them by general search engines such as Google, AltaVista, etc. This raises several questions, among which the following: what is the relative visibility of medical Internet catalogs through search engines? This study addresses this issue by measuring and comparing the visibility of six major, MeSH-indexed health catalogs through four different search engines (AltaVista, Google, Lycos, Northern Light) in two languages (English and French). Over half a million queries were sent to the search engines; for most of these search engines, according to our measures at the time the queries were sent, the most visible catalog for English MeSH terms was CliniWeb and the most visible one for French MeSH terms was CISMeF.
Evaluating Open-Source Full-Text Search Engines for Matching ICD-10 Codes.
Jurcău, Daniel-Alexandru; Stoicu-Tivadar, Vasile
2016-01-01
This research presents the results of evaluating multiple free, open-source engines on matching ICD-10 diagnostic codes via full-text searches. The study investigates what it takes to get an accurate match when searching for a specific diagnostic code. For each code the evaluation starts by extracting the words that make up its text and continues with building full-text search queries from the combinations of these words. The queries are then run against all the ICD-10 codes until a match indicates the code in question as a match with the highest relative score. This method identifies the minimum number of words that must be provided in order for the search engines choose the desired entry. The engines analyzed include a popular Java-based full-text search engine, a lightweight engine written in JavaScript which can even execute on the user's browser, and two popular open-source relational database management systems.
Multitasking Information Seeking and Searching Processes.
ERIC Educational Resources Information Center
Spink, Amanda; Ozmutlu, H. Cenk; Ozmutlu, Seda
2002-01-01
Presents findings from four studies of the prevalence of multitasking information seeking and searching by Web (via the Excite search engine), information retrieval system (mediated online database searching), and academic library users. Highlights include human information coordinating behavior (HICB); and implications for models of information…
Where people look for online health information.
LaValley, Susan A; Kiviniemi, Marc T; Gage-Bouchard, Elizabeth A
2017-06-01
To identify health-related websites Americans are using, demographic characteristics associated with certain website type and how website type shapes users' online information seeking experiences. Data from the Health Information National Trends Survey 4 Cycle 1 were used. User-identified websites were categorised into four types: government sponsored, commercially based, academically affiliated and search engines. Logistic regression analyses examined associations between users' sociodemographic characteristics and website type, and associations between website type and information search experience. Respondents reported using: commercial websites (71.8%), followed by a search engines (11.6%), academically affiliated sites (11.1%) and government-sponsored websites (5.5%). Older age was associated with the use of academic websites (OR 1.03, 95% CI 1.02, 1.04); younger age with commercial website use (OR 0.97, 95% CI 0.95, 0.98). Search engine use predicted increased levels of frustration, effort and concern over website information quality, while commercial website use predicted decreased levels of these same measures. Health information seekers experience varying levels of frustration, effort and concern related to their online searching. There is a need for continued efforts by librarians and health care professionals to train seekers of online health information to select websites using established guidelines and quality criteria. © 2016 Health Libraries Group.
ERIC Educational Resources Information Center
Zhang, Xiangmin; Anghelescu, Hermina G. B.; Yuan, Xiaojun
2005-01-01
Introduction: This study sought to answer three questions: 1) Would the level of domain knowledge significantly affect the user's search behaviour? 2) Would the level of domain knowledge significantly affect search effectiveness, and 3) What would be the relationship between search behaviour and search effectiveness? Method: Participants were…
Peeling the Onion: Okapi System Architecture and Software Design Issues.
ERIC Educational Resources Information Center
Jones, S.; And Others
1997-01-01
Discusses software design issues for Okapi, an information retrieval system that incorporates both search engine and user interface and supports weighted searching, relevance feedback, and query expansion. The basic search system, adjacency searching, and moving toward a distributed system are discussed. (Author/LRW)
Web information retrieval based on ontology
NASA Astrophysics Data System (ADS)
Zhang, Jian
2013-03-01
The purpose of the Information Retrieval (IR) is to find a set of documents that are relevant for a specific information need of a user. Traditional Information Retrieval model commonly used in commercial search engine is based on keyword indexing system and Boolean logic queries. One big drawback of traditional information retrieval is that they typically retrieve information without an explicitly defined domain of interest to the users so that a lot of no relevance information returns to users, which burden the user to pick up useful answer from these no relevance results. In order to tackle this issue, many semantic web information retrieval models have been proposed recently. The main advantage of Semantic Web is to enhance search mechanisms with the use of Ontology's mechanisms. In this paper, we present our approach to personalize web search engine based on ontology. In addition, key techniques are also discussed in our paper. Compared to previous research, our works concentrate on the semantic similarity and the whole process including query submission and information annotation.
Searching the Web: The Public and Their Queries.
ERIC Educational Resources Information Center
Spink, Amanda; Wolfram, Dietmar; Jansen, Major B. J.; Saracevic, Tefko
2001-01-01
Reports findings from a study of searching behavior by over 200,000 users of the Excite search engine. Analysis of over one million queries revealed most people use few search terms, few modified queries, view few Web pages, and rarely use advanced search features. Concludes that Web searching by the public differs significantly from searching of…
An Improved Forensic Science Information Search.
Teitelbaum, J
2015-01-01
Although thousands of search engines and databases are available online, finding answers to specific forensic science questions can be a challenge even to experienced Internet users. Because there is no central repository for forensic science information, and because of the sheer number of disciplines under the forensic science umbrella, forensic scientists are often unable to locate material that is relevant to their needs. The author contends that using six publicly accessible search engines and databases can produce high-quality search results. The six resources are Google, PubMed, Google Scholar, Google Books, WorldCat, and the National Criminal Justice Reference Service. Carefully selected keywords and keyword combinations, designating a keyword phrase so that the search engine will search on the phrase and not individual keywords, and prompting search engines to retrieve PDF files are among the techniques discussed. Copyright © 2015 Central Police University.
Liverpool's Discovery: A University Library Applies a New Search Tool to Improve the User Experience
ERIC Educational Resources Information Center
Kenney, Brian
2011-01-01
This article features the University of Liverpool's arts and humanities library, which applies a new search tool to improve the user experience. In nearly every way imaginable, the Sydney Jones Library and the Harold Cohen Library--the university's two libraries that serve science, engineering, and medical students--support the lives of their…
The Internet as a Source of Academic Research Information: Findings of Two Pilot Studies.
ERIC Educational Resources Information Center
Kibirige, Harry M.; DePalo, Lisa
2000-01-01
Discussion of information available on the Internet focuses on two pilot studies that investigated how academic users perceive search engines and subject-oriented databases as sources of topical information. Highlights include information seeking behavior of academic users; undergraduate users; graduate users; faculty; and implications for…
Squizzato, Silvano; Park, Young Mi; Buso, Nicola; Gur, Tamer; Cowley, Andrew; Li, Weizhong; Uludag, Mahmut; Pundir, Sangya; Cham, Jennifer A; McWilliam, Hamish; Lopez, Rodrigo
2015-07-01
The European Bioinformatics Institute (EMBL-EBI-https://www.ebi.ac.uk) provides free and unrestricted access to data across all major areas of biology and biomedicine. Searching and extracting knowledge across these domains requires a fast and scalable solution that addresses the requirements of domain experts as well as casual users. We present the EBI Search engine, referred to here as 'EBI Search', an easy-to-use fast text search and indexing system with powerful data navigation and retrieval capabilities. API integration provides access to analytical tools, allowing users to further investigate the results of their search. The interconnectivity that exists between data resources at EMBL-EBI provides easy, quick and precise navigation and a better understanding of the relationship between different data types including sequences, genes, gene products, proteins, protein domains, protein families, enzymes and macromolecular structures, together with relevant life science literature. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
THE ROLE OF SEARCHING SERVICES IN AN ACQUISITIONS PROGRAM.
ERIC Educational Resources Information Center
LUECK, ANTOINETTE L.; AND OTHERS
A USER PRESENTS HIS POINT OF VIEW ON LITERATURE SEARCHING THROUGH THE MAJOR SEARCHING SERVICES IN THE OVERALL PROGRAM OF ACQUISITIONS FOR THE ENGINEERING STAFF OF THE AIR FORCE AERO PROPULSION LABORATORY. THESE MAJOR SEARCHING SERVICES INCLUDE THE DEFENSE DOCUMENTATION CENTER (DDC), THE NATIONAL AERONAUTICS AND SPACE ADMINISTRATION (NASA), THE…
ERIC Educational Resources Information Center
Rochkind, Jonathan
2007-01-01
The ability to search and receive results in more than one database through a single interface--or metasearch--is something many users want. Google Scholar--the search engine of specifically scholarly content--and library metasearch products like Ex Libris's MetaLib, Serials Solution's Central Search, WebFeat, and products based on MuseGlobal used…
ERIC Educational Resources Information Center
Griffin, Teresa; Cohen, Deb
2012-01-01
The ubiquity and familiarity of the world wide web means that students regularly turn to it as a source of information. In doing so, they "are said to rely heavily on simple search engines, such as Google to find what they want." Researchers have also investigated how students use search engines, concluding that "the young web users tended to…
Web Search Engines: Key To Locating Information for All Users or Only the Cognoscenti?
ERIC Educational Resources Information Center
Tomaiuolo, Nicholas G.; Packer, Joan G.
This paper describes a study that attempted to ascertain the degree of success that undergraduates and graduate students, with varying levels of experience using the World Wide Web and Web search engines, and without librarian instruction or intervention, had in locating relevant material on specific topics furnished by the investigators. Because…
Combinatorial Fusion Analysis for Meta Search Information Retrieval
NASA Astrophysics Data System (ADS)
Hsu, D. Frank; Taksa, Isak
Leading commercial search engines are built as single event systems. In response to a particular search query, the search engine returns a single list of ranked search results. To find more relevant results the user must frequently try several other search engines. A meta search engine was developed to enhance the process of multi-engine querying. The meta search engine queries several engines at the same time and fuses individual engine results into a single search results list. The fusion of multiple search results has been shown (mostly experimentally) to be highly effective. However, the question of why and how the fusion should be done still remains largely unanswered. In this chapter, we utilize the combinatorial fusion analysis proposed by Hsu et al. to analyze combination and fusion of multiple sources of information. A rank/score function is used in the design and analysis of our framework. The framework provides a better understanding of the fusion phenomenon in information retrieval. For example, to improve the performance of the combined multiple scoring systems, it is necessary that each of the individual scoring systems has relatively high performance and the individual scoring systems are diverse. Additionally, we illustrate various applications of the framework using two examples from the information retrieval domain.
2007-05-17
metadata formats, metadata repositories, enterprise portals and federated search engines that make data visible, available, and usable to users...and provides the metadata formats, metadata repositories, enterprise portals and federated search engines that make data visible, available, and...develop an enterprise- wide data sharing plan, establishment of mission area governance processes for CIOs, DISA development of federated search specifications
Web search queries can predict stock market volumes.
Bordino, Ilaria; Battiston, Stefano; Caldarelli, Guido; Cristelli, Matthieu; Ukkonen, Antti; Weber, Ingmar
2012-01-01
We live in a computerized and networked society where many of our actions leave a digital trace and affect other people's actions. This has lead to the emergence of a new data-driven research field: mathematical methods of computer science, statistical physics and sociometry provide insights on a wide range of disciplines ranging from social science to human mobility. A recent important discovery is that search engine traffic (i.e., the number of requests submitted by users to search engines on the www) can be used to track and, in some cases, to anticipate the dynamics of social phenomena. Successful examples include unemployment levels, car and home sales, and epidemics spreading. Few recent works applied this approach to stock prices and market sentiment. However, it remains unclear if trends in financial markets can be anticipated by the collective wisdom of on-line users on the web. Here we show that daily trading volumes of stocks traded in NASDAQ-100 are correlated with daily volumes of queries related to the same stocks. In particular, query volumes anticipate in many cases peaks of trading by one day or more. Our analysis is carried out on a unique dataset of queries, submitted to an important web search engine, which enable us to investigate also the user behavior. We show that the query volume dynamics emerges from the collective but seemingly uncoordinated activity of many users. These findings contribute to the debate on the identification of early warnings of financial systemic risk, based on the activity of users of the www.
Web Search Queries Can Predict Stock Market Volumes
Bordino, Ilaria; Battiston, Stefano; Caldarelli, Guido; Cristelli, Matthieu; Ukkonen, Antti; Weber, Ingmar
2012-01-01
We live in a computerized and networked society where many of our actions leave a digital trace and affect other people’s actions. This has lead to the emergence of a new data-driven research field: mathematical methods of computer science, statistical physics and sociometry provide insights on a wide range of disciplines ranging from social science to human mobility. A recent important discovery is that search engine traffic (i.e., the number of requests submitted by users to search engines on the www) can be used to track and, in some cases, to anticipate the dynamics of social phenomena. Successful examples include unemployment levels, car and home sales, and epidemics spreading. Few recent works applied this approach to stock prices and market sentiment. However, it remains unclear if trends in financial markets can be anticipated by the collective wisdom of on-line users on the web. Here we show that daily trading volumes of stocks traded in NASDAQ-100 are correlated with daily volumes of queries related to the same stocks. In particular, query volumes anticipate in many cases peaks of trading by one day or more. Our analysis is carried out on a unique dataset of queries, submitted to an important web search engine, which enable us to investigate also the user behavior. We show that the query volume dynamics emerges from the collective but seemingly uncoordinated activity of many users. These findings contribute to the debate on the identification of early warnings of financial systemic risk, based on the activity of users of the www. PMID:22829871
An assessment of the visibility of MeSH-indexed medical web catalogs through search engines.
Zweigenbaum, P.; Darmoni, S. J.; Grabar, N.; Douyère, M.; Benichou, J.
2002-01-01
Manually indexed Internet health catalogs such as CliniWeb or CISMeF provide resources for retrieving high-quality health information. Users of these quality-controlled subject gateways are most often referred to them by general search engines such as Google, AltaVista, etc. This raises several questions, among which the following: what is the relative visibility of medical Internet catalogs through search engines? This study addresses this issue by measuring and comparing the visibility of six major, MeSH-indexed health catalogs through four different search engines (AltaVista, Google, Lycos, Northern Light) in two languages (English and French). Over half a million queries were sent to the search engines; for most of these search engines, according to our measures at the time the queries were sent, the most visible catalog for English MeSH terms was CliniWeb and the most visible one for French MeSH terms was CISMeF. PMID:12463965
A two-level cache for distributed information retrieval in search engines.
Zhang, Weizhe; He, Hui; Ye, Jianwei
2013-01-01
To improve the performance of distributed information retrieval in search engines, we propose a two-level cache structure based on the queries of the users' logs. We extract the highest rank queries of users from the static cache, in which the queries are the most popular. We adopt the dynamic cache as an auxiliary to optimize the distribution of the cache data. We propose a distribution strategy of the cache data. The experiments prove that the hit rate, the efficiency, and the time consumption of the two-level cache have advantages compared with other structures of cache.
A Two-Level Cache for Distributed Information Retrieval in Search Engines
Zhang, Weizhe; He, Hui; Ye, Jianwei
2013-01-01
To improve the performance of distributed information retrieval in search engines, we propose a two-level cache structure based on the queries of the users' logs. We extract the highest rank queries of users from the static cache, in which the queries are the most popular. We adopt the dynamic cache as an auxiliary to optimize the distribution of the cache data. We propose a distribution strategy of the cache data. The experiments prove that the hit rate, the efficiency, and the time consumption of the two-level cache have advantages compared with other structures of cache. PMID:24363621
A Question of Interface Design: How Do Online Service GUIs Measure Up?
ERIC Educational Resources Information Center
Head, Alison J.
1997-01-01
Describes recent improvements in graphical user interfaces (GUIs) offered by online services. Highlights include design considerations, including computer engineering capabilities and users' abilities; fundamental GUI design principles; user empowerment; visual communication and interaction; and an evaluation of online search interfaces. (LRW)
NASA Astrophysics Data System (ADS)
Ponomarev, Vasily
SPLDESS development with the elements of a multimedia illustration of traditional hypertext search results by Internet search engine provides research of information propagation innovative effect during the public access information-recruiting networks of information kiosks formation at the experimental stage with the mirrors at the constantly updating portal for Internet users. Author of this publication put the emphasis on a condition of pertinent search engine results of the total answer by the user inquiries, that provide the politically correct and not usurping socially-network data mining effect at urgent monitoring. Development of the access by devices of the new communication types with the newest technologies of data transmission, multimedia and an information exchange from the first innovation line usage support portal is presented also (including the device of social-psycho-linguistic determination according the author's conception).
Abbott, Kevin C; Oliver, David K; Boal, Thomas R; Gadiyak, Grigorii; Boocks, Carl; Yuan, Christina M; Welch, Paul G; Poropatich, Ronald K
2002-04-01
Studies of the use of the World Wide Web to obtain medical knowledge have largely focused on patients. In particular, neither the international use of academic nephrology World Wide Web sites (websites) as primary information sources nor the use of search engines (and search strategies) to obtain medical information have been described. Visits ("hits") to the Walter Reed Army Medical Center (WRAMC) Nephrology Service website from April 30, 2000, to March 14, 2001, were analyzed for the location of originating source using Webtrends, and search engines (Google, Lycos, etc.) were analyzed manually for search strategies used. From April 30, 2000 to March 14, 2001, the WRAMC Nephrology Service website received 1,007,103 hits and 12,175 visits. These visits were from 33 different countries, and the most frequent regions were Western Europe, Asia, Australia, the Middle East, Pacific Islands, and South America. The most frequent organization using the site was the military Internet system, followed by America Online and automated search programs of online search engines, most commonly Google. The online lecture series was the most frequently visited section of the website. Search strategies used in search engines were extremely technical. The use of "robots" by standard Internet search engines to locate websites, which may be blocked by mandatory registration, has allowed users worldwide to access the WRAMC Nephrology Service website to answer very technical questions. This suggests that it is being used as an alternative to other primary sources of medical information and that the use of mandatory registration may hinder users from finding valuable sites. With current Internet technology, even a single service can become a worldwide information resource without sacrificing its primary customers.
Yu, Hong; Kaufman, David
2007-01-01
The Internet is having a profound impact on physicians' medical decision making. One recent survey of 277 physicians showed that 72% of physicians regularly used the Internet to research medical information and 51% admitted that information from web sites influenced their clinical decisions. This paper describes the first cognitive evaluation of four state-of-the-art Internet search engines: Google (i.e., Google and Scholar.Google), MedQA, Onelook, and PubMed for answering definitional questions (i.e., questions with the format of "What is X?") posed by physicians. Onelook is a portal for online definitions, and MedQA is a question answering system that automatically generates short texts to answer specific biomedical questions. Our evaluation criteria include quality of answer, ease of use, time spent, and number of actions taken. Our results show that MedQA outperforms Onelook and PubMed in most of the criteria, and that MedQA surpasses Google in time spent and number of actions, two important efficiency criteria. Our results show that Google is the best system for quality of answer and ease of use. We conclude that Google is an effective search engine for medical definitions, and that MedQA exceeds the other search engines in that it provides users direct answers to their questions; while the users of the other search engines have to visit several sites before finding all of the pertinent information.
Design and Empirical Evaluation of Search Software for Legal Professionals on the WWW.
ERIC Educational Resources Information Center
Dempsey, Bert J.; Vreeland, Robert C.; Sumner, Robert G., Jr.; Yang, Kiduk
2000-01-01
Discussion of effective search aids for legal researchers on the World Wide Web focuses on the design and evaluation of two software systems developed to explore models for browsing and searching across a user-selected set of Web sites. Describes crawler-enhanced search engines, filters, distributed full-text searching, and natural language…
Skip to content HOME NEWS USERS OpenFrescoExpress OpenFresco Examples & Tools Feedback staff and research students learning about hybrid simulation and starting to use this experimental the Pacific Earthquake Engineering Research Center (PEER) and others. Search Search for: Search Menu
Cognitive and Task Influences on Web Searching Behavior.
ERIC Educational Resources Information Center
Kim, Kyung-Sun; Allen, Bryce
2002-01-01
Describes results from two independent investigations of college students that were conducted to study the impact of differences in users' cognition and search tasks on Web search activities and outcomes. Topics include cognitive style; problem-solving; and implications for the design and use of the Web and Web search engines. (Author/LRW)
Aesthetics, Usefulness and Performance in User--Search-Engine Interaction
ERIC Educational Resources Information Center
Katz, Adi
2010-01-01
Issues of visual appeal have become an integral part of designing interactive systems. Interface aesthetics may form users' attitudes towards computer applications and information technology. Aesthetics can affect user satisfaction, and influence their willingness to buy or adopt a system. This study follows previous studies that found that users…
2009-06-01
search engines are not up to this task, as they have been optimized to catalog information quickly and efficiently for user ease of access while promoting retail commerce at the same time. This thesis presents a performance analysis of a new search engine algorithm designed to help find IED education networks using the Nutch open-source search engine architecture. It reveals which web pages are more important via references from other web pages regardless of domain. In addition, this thesis discusses potential evaluation and monitoring techniques to be used in conjunction
Web Spam, Social Propaganda and the Evolution of Search Engine Rankings
NASA Astrophysics Data System (ADS)
Metaxas, Panagiotis Takis
Search Engines have greatly influenced the way we experience the web. Since the early days of the web, users have been relying on them to get informed and make decisions. When the web was relatively small, web directories were built and maintained using human experts to screen and categorize pages according to their characteristics. By the mid 1990's, however, it was apparent that the human expert model of categorizing web pages does not scale. The first search engines appeared and they have been evolving ever since, taking over the role that web directories used to play.
CALIL.JP, a new web service that provides one-stop searching of Japan-wide libraries' collections
NASA Astrophysics Data System (ADS)
Yoshimoto, Ryuuji
Calil.JP is a new free online service that enables federated searching, marshalling and integration of Web-OPAC data on the collections of libraries from around Japan. It offers the search results through user-friendly interface. Developed with a concept of accelerating discovery of fun-to-read books and motivating users to head for libraries, Calil was initially designed mainly for public library users. It now extends to cover university libraries and special libraries. This article presents the Calil's basic capabilities, concept, progress made thus far, and plan for further development as viewed from an engineering development manager.
New generation of the multimedia search engines
NASA Astrophysics Data System (ADS)
Mijes Cruz, Mario Humberto; Soto Aldaco, Andrea; Maldonado Cano, Luis Alejandro; López Rodríguez, Mario; Rodríguez Vázqueza, Manuel Antonio; Amaya Reyes, Laura Mariel; Cano Martínez, Elizabeth; Pérez Rosas, Osvaldo Gerardo; Rodríguez Espejo, Luis; Flores Secundino, Jesús Abimelek; Rivera Martínez, José Luis; García Vázquez, Mireya Saraí; Zamudio Fuentes, Luis Miguel; Sánchez Valenzuela, Juan Carlos; Montoya Obeso, Abraham; Ramírez Acosta, Alejandro Álvaro
2016-09-01
Current search engines are based upon search methods that involve the combination of words (text-based search); which has been efficient until now. However, the Internet's growing demand indicates that there's more diversity on it with each passing day. Text-based searches are becoming limited, as most of the information on the Internet can be found in different types of content denominated multimedia content (images, audio files, video files). Indeed, what needs to be improved in current search engines is: search content, and precision; as well as an accurate display of expected search results by the user. Any search can be more precise if it uses more text parameters, but it doesn't help improve the content or speed of the search itself. One solution is to improve them through the characterization of the content for the search in multimedia files. In this article, an analysis of the new generation multimedia search engines is presented, focusing the needs according to new technologies. Multimedia content has become a central part of the flow of information in our daily life. This reflects the necessity of having multimedia search engines, as well as knowing the real tasks that it must comply. Through this analysis, it is shown that there are not many search engines that can perform content searches. The area of research of multimedia search engines of new generation is a multidisciplinary area that's in constant growth, generating tools that satisfy the different needs of new generation systems.
NASA Astrophysics Data System (ADS)
Li, Y.; Jiang, Y.; Yang, C. P.; Armstrong, E. M.; Huang, T.; Moroni, D. F.; McGibbney, L. J.
2016-12-01
Big oceanographic data have been produced, archived and made available online, but finding the right data for scientific research and application development is still a significant challenge. A long-standing problem in data discovery is how to find the interrelationships between keywords and data, as well as the intrarelationships of the two individually. Most previous research attempted to solve this problem by building domain-specific ontology either manually or through automatic machine learning techniques. The former is costly, labor intensive and hard to keep up-to-date, while the latter is prone to noise and may be difficult for human to understand. Large-scale user behavior data modelling represents a largely untapped, unique, and valuable source for discovering semantic relationships among domain-specific vocabulary. In this article, we propose a search engine framework for mining and utilizing dataset relevancy from oceanographic dataset metadata, user behaviors, and existing ontology. The objective is to improve discovery accuracy of oceanographic data and reduce time for scientist to discover, download and reformat data for their projects. Experiments and a search example show that the proposed search engine helps both scientists and general users search with better ranking results, recommendation, and ontology navigation.
A Method for Efficient Searching at Online Shopping
NASA Astrophysics Data System (ADS)
Sanjo, Tomomi; Nagata, Moiro
In recent years, online shopping has been popularized. However, the users can not find efficiently their items at on-line markets. This paper proposes an engine to find items easily at the online market. This engine has the following facilities. First, it presents information in a fixed format. Second, the user can find items by selected keywords. Third, it presents only necessary information by using his/her history. Finally, it has a customize function for each user. Moreover, the system asks the users to down load a page of recommended items. We show the effectives of our proposal with some experiments.
RAPTOR: An Enterprise Knowledge Discovery Engine
DOE Office of Scientific and Technical Information (OSTI.GOV)
2010-11-11
SharePoint search capability is commonly criticized by users due to the limited functionality provided. This software takes a world class search capability (Piranha) and integrates it with an Enterprise Level Collaboration Application installed on most major government and commercial sites.
Development of Health Information Search Engine Based on Metadata and Ontology
Song, Tae-Min; Jin, Dal-Lae
2014-01-01
Objectives The aim of the study was to develop a metadata and ontology-based health information search engine ensuring semantic interoperability to collect and provide health information using different application programs. Methods Health information metadata ontology was developed using a distributed semantic Web content publishing model based on vocabularies used to index the contents generated by the information producers as well as those used to search the contents by the users. Vocabulary for health information ontology was mapped to the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT), and a list of about 1,500 terms was proposed. The metadata schema used in this study was developed by adding an element describing the target audience to the Dublin Core Metadata Element Set. Results A metadata schema and an ontology ensuring interoperability of health information available on the internet were developed. The metadata and ontology-based health information search engine developed in this study produced a better search result compared to existing search engines. Conclusions Health information search engine based on metadata and ontology will provide reliable health information to both information producer and information consumers. PMID:24872907
Development of health information search engine based on metadata and ontology.
Song, Tae-Min; Park, Hyeoun-Ae; Jin, Dal-Lae
2014-04-01
The aim of the study was to develop a metadata and ontology-based health information search engine ensuring semantic interoperability to collect and provide health information using different application programs. Health information metadata ontology was developed using a distributed semantic Web content publishing model based on vocabularies used to index the contents generated by the information producers as well as those used to search the contents by the users. Vocabulary for health information ontology was mapped to the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT), and a list of about 1,500 terms was proposed. The metadata schema used in this study was developed by adding an element describing the target audience to the Dublin Core Metadata Element Set. A metadata schema and an ontology ensuring interoperability of health information available on the internet were developed. The metadata and ontology-based health information search engine developed in this study produced a better search result compared to existing search engines. Health information search engine based on metadata and ontology will provide reliable health information to both information producer and information consumers.
User Selection of Purchased Information Services. Interim Technical Report (June 1975-January 1976).
ERIC Educational Resources Information Center
Hall, Homer J.
Interviews conducted in the first phase of a project to develop a method for user selection of purchased scientific and technical information services identified a number of relationship among different populations of users. Research scientists, engineers, and patent attorneys want convenient access to original data identified in the search.…
FOAMSearch.net: A custom search engine for emergency medicine and critical care.
Raine, Todd; Thoma, Brent; Chan, Teresa M; Lin, Michelle
2015-08-01
The number of online resources read by and pertinent to clinicians has increased dramatically. However, most healthcare professionals still use mainstream search engines as their primary port of entry to the resources on the Internet. These search engines use algorithms that do not make it easy to find clinician-oriented resources. FOAMSearch, a custom search engine (CSE), was developed to find relevant, high-quality online resources for emergency medicine and critical care (EMCC) clinicians. Using Google™ algorithms, it searches a vetted list of >300 blogs, podcasts, wikis, knowledge translation tools, clinical decision support tools and medical journals. Utilisation has increased progressively to >3000 users/month since its launch in 2011. Further study of the role of CSEs to find medical resources is needed, and it might be possible to develop similar CSEs for other areas of medicine. © 2015 Australasian College for Emergency Medicine and Australasian Society for Emergency Medicine.
Use of controlled vocabularies to improve biomedical information retrieval tasks.
Pasche, Emilie; Gobeill, Julien; Vishnyakova, Dina; Ruch, Patrick; Lovis, Christian
2013-01-01
The high heterogeneity of biomedical vocabulary is a major obstacle for information retrieval in large biomedical collections. Therefore, using biomedical controlled vocabularies is crucial for managing these contents. We investigate the impact of query expansion based on controlled vocabularies to improve the effectiveness of two search engines. Our strategy relies on the enrichment of users' queries with additional terms, directly derived from such vocabularies applied to infectious diseases and chemical patents. We observed that query expansion based on pathogen names resulted in improvements of the top-precision of our first search engine, while the normalization of diseases degraded the top-precision. The expansion of chemical entities, which was performed on the second search engine, positively affected the mean average precision. We have shown that query expansion of some types of biomedical entities has a great potential to improve search effectiveness; therefore a fine-tuning of query expansion strategies could help improving the performances of search engines.
'Sciencenet'--towards a global search and share engine for all scientific knowledge.
Lütjohann, Dominic S; Shah, Asmi H; Christen, Michael P; Richter, Florian; Knese, Karsten; Liebel, Urban
2011-06-15
Modern biological experiments create vast amounts of data which are geographically distributed. These datasets consist of petabytes of raw data and billions of documents. Yet to the best of our knowledge, a search engine technology that searches and cross-links all different data types in life sciences does not exist. We have developed a prototype distributed scientific search engine technology, 'Sciencenet', which facilitates rapid searching over this large data space. By 'bringing the search engine to the data', we do not require server farms. This platform also allows users to contribute to the search index and publish their large-scale data to support e-Science. Furthermore, a community-driven method guarantees that only scientific content is crawled and presented. Our peer-to-peer approach is sufficiently scalable for the science web without performance or capacity tradeoff. The free to use search portal web page and the downloadable client are accessible at: http://sciencenet.kit.edu. The web portal for index administration is implemented in ASP.NET, the 'AskMe' experiment publisher is written in Python 2.7, and the backend 'YaCy' search engine is based on Java 1.6.
Planetary Data Systems (PDS) Imaging Node Atlas II
NASA Technical Reports Server (NTRS)
Stanboli, Alice; McAuley, James M.
2013-01-01
The Planetary Image Atlas (PIA) is a Rich Internet Application (RIA) that serves planetary imaging data to the science community and the general public. PIA also utilizes the USGS Unified Planetary Coordinate system (UPC) and the on-Mars map server. The Atlas was designed to provide the ability to search and filter through greater than 8 million planetary image files. This software is a three-tier Web application that contains a search engine backend (MySQL, JAVA), Web service interface (SOAP) between server and client, and a GWT Google Maps API client front end. This application allows for the search, retrieval, and download of planetary images and associated meta-data from the following missions: 2001 Mars Odyssey, Cassini, Galileo, LCROSS, Lunar Reconnaissance Orbiter, Mars Exploration Rover, Mars Express, Magellan, Mars Global Surveyor, Mars Pathfinder, Mars Reconnaissance Orbiter, MESSENGER, Phoe nix, Viking Lander, Viking Orbiter, and Voyager. The Atlas utilizes the UPC to translate mission-specific coordinate systems into a unified coordinate system, allowing the end user to query across missions of similar targets. If desired, the end user can also use a mission-specific view of the Atlas. The mission-specific views rely on the same code base. This application is a major improvement over the initial version of the Planetary Image Atlas. It is a multi-mission search engine. This tool includes both basic and advanced search capabilities, providing a product search tool to interrogate the collection of planetary images. This tool lets the end user query information about each image, and ignores the data that the user has no interest in. Users can reduce the number of images to look at by defining an area of interest with latitude and longitude ranges.
Kushniruk, Andre W; Kan, Min-Yem; McKeown, Kathleen; Klavans, Judith; Jordan, Desmond; LaFlamme, Mark; Patel, Vimia L
2002-01-01
This paper describes the comparative evaluation of an experimental automated text summarization system, Centrifuser and three conventional search engines - Google, Yahoo and About.com. Centrifuser provides information to patients and families relevant to their questions about specific health conditions. It then produces a multidocument summary of articles retrieved by a standard search engine, tailored to the user's question. Subjects, consisting of friends or family of hospitalized patients, were asked to "think aloud" as they interacted with the four systems. The evaluation involved audio- and video recording of subject interactions with the interfaces in situ at a hospital. Results of the evaluation show that subjects found Centrifuser's summarization capability useful and easy to understand. In comparing Centrifuser to the three search engines, subjects' ratings varied; however, specific interface features were deemed useful across interfaces. We conclude with a discussion of the implications for engineering Web-based retrieval systems.
Complex dynamics of our economic life on different scales: insights from search engine query data.
Preis, Tobias; Reith, Daniel; Stanley, H Eugene
2010-12-28
Search engine query data deliver insight into the behaviour of individuals who are the smallest possible scale of our economic life. Individuals are submitting several hundred million search engine queries around the world each day. We study weekly search volume data for various search terms from 2004 to 2010 that are offered by the search engine Google for scientific use, providing information about our economic life on an aggregated collective level. We ask the question whether there is a link between search volume data and financial market fluctuations on a weekly time scale. Both collective 'swarm intelligence' of Internet users and the group of financial market participants can be regarded as a complex system of many interacting subunits that react quickly to external changes. We find clear evidence that weekly transaction volumes of S&P 500 companies are correlated with weekly search volume of corresponding company names. Furthermore, we apply a recently introduced method for quantifying complex correlations in time series with which we find a clear tendency that search volume time series and transaction volume time series show recurring patterns.
GEMINI: a computationally-efficient search engine for large gene expression datasets.
DeFreitas, Timothy; Saddiki, Hachem; Flaherty, Patrick
2016-02-24
Low-cost DNA sequencing allows organizations to accumulate massive amounts of genomic data and use that data to answer a diverse range of research questions. Presently, users must search for relevant genomic data using a keyword, accession number of meta-data tag. However, in this search paradigm the form of the query - a text-based string - is mismatched with the form of the target - a genomic profile. To improve access to massive genomic data resources, we have developed a fast search engine, GEMINI, that uses a genomic profile as a query to search for similar genomic profiles. GEMINI implements a nearest-neighbor search algorithm using a vantage-point tree to store a database of n profiles and in certain circumstances achieves an [Formula: see text] expected query time in the limit. We tested GEMINI on breast and ovarian cancer gene expression data from The Cancer Genome Atlas project and show that it achieves a query time that scales as the logarithm of the number of records in practice on genomic data. In a database with 10(5) samples, GEMINI identifies the nearest neighbor in 0.05 sec compared to a brute force search time of 0.6 sec. GEMINI is a fast search engine that uses a query genomic profile to search for similar profiles in a very large genomic database. It enables users to identify similar profiles independent of sample label, data origin or other meta-data information.
Developing A Web-based User Interface for Semantic Information Retrieval
NASA Technical Reports Server (NTRS)
Berrios, Daniel C.; Keller, Richard M.
2003-01-01
While there are now a number of languages and frameworks that enable computer-based systems to search stored data semantically, the optimal design for effective user interfaces for such systems is still uncle ar. Such interfaces should mask unnecessary query detail from users, yet still allow them to build queries of arbitrary complexity without significant restrictions. We developed a user interface supporting s emantic query generation for Semanticorganizer, a tool used by scient ists and engineers at NASA to construct networks of knowledge and dat a. Through this interface users can select node types, node attribute s and node links to build ad-hoc semantic queries for searching the S emanticOrganizer network.
NASA Technical Reports Server (NTRS)
Lynnes, Chris
2014-01-01
Three current search engines are queried for ozone data at the GES DISC. The results range from sub-optimal to counter-intuitive. We propose a method to fix dataset search by implementing a robust relevancy ranking scheme. The relevancy ranking scheme is based on several heuristics culled from more than 20 years of helping users select datasets.
Improving Concept-Based Web Image Retrieval by Mixing Semantically Similar Greek Queries
ERIC Educational Resources Information Center
Lazarinis, Fotis
2008-01-01
Purpose: Image searching is a common activity for web users. Search engines offer image retrieval services based on textual queries. Previous studies have shown that web searching is more demanding when the search is not in English and does not use a Latin-based language. The aim of this paper is to explore the behaviour of the major search…
Institutional Repositories in the UK: What Can the Google User Find There?
ERIC Educational Resources Information Center
Markland, Margaret
2006-01-01
This study investigates the efficiency of the Google search engine at retrieving items from 26 UK Institutional Repositories, covering a wide range of subject areas. One item is chosen from each repository and four searches are carried out: two keyword searches and two full title searches, each using both Google and then Google Scholar. A further…
Usability Evaluation of NLP-PIER: A Clinical Document Search Engine for Researchers.
Hultman, Gretchen; McEwan, Reed; Pakhomov, Serguei; Lindemann, Elizabeth; Skube, Steven; Melton, Genevieve B
2017-01-01
NLP-PIER (Natural Language Processing - Patient Information Extraction for Research) is a self-service platform with a search engine for clinical researchers to perform natural language processing (NLP) queries using clinical notes. We conducted user-centered testing of NLP-PIER's usability to inform future design decisions. Quantitative and qualitative data were analyzed. Our findings will be used to improve the usability of NLP-PIER.
Dermatological image search engines on the Internet: do they work?
Cutrone, M; Grimalt, R
2007-02-01
Atlases on CD-ROM first substituted the use of paediatric dermatology atlases printed on paper. This permitted a faster search and a practical comparison of differential diagnoses. The third step in the evolution of clinical atlases was the onset of the online atlas. Many doctors now use the Internet image search engines to obtain clinical images directly. The aim of this study was to test the reliability of the image search engines compared to the online atlases. We tested seven Internet image search engines with three paediatric dermatology diseases. In general, the service offered by the search engines is good, and continues to be free of charge. The coincidence between what we searched for and what we found was generally excellent, and contained no advertisements. Most Internet search engines provided similar results but some were more user friendly than others. It is not necessary to repeat the same research with Picsearch, Lycos and MSN, as the response would be the same; there is a possibility that they might share software. Image search engines are a useful, free and precise method to obtain paediatric dermatology images for teaching purposes. There is still the matter of copyright to be resolved. What are the legal uses of these 'free' images? How do we define 'teaching purposes'? New watermark methods and encrypted electronic signatures might solve these problems and answer these questions.
A Real-Time All-Atom Structural Search Engine for Proteins
Gonzalez, Gabriel; Hannigan, Brett; DeGrado, William F.
2014-01-01
Protein designers use a wide variety of software tools for de novo design, yet their repertoire still lacks a fast and interactive all-atom search engine. To solve this, we have built the Suns program: a real-time, atomic search engine integrated into the PyMOL molecular visualization system. Users build atomic-level structural search queries within PyMOL and receive a stream of search results aligned to their query within a few seconds. This instant feedback cycle enables a new “designability”-inspired approach to protein design where the designer searches for and interactively incorporates native-like fragments from proven protein structures. We demonstrate the use of Suns to interactively build protein motifs, tertiary interactions, and to identify scaffolds compatible with hot-spot residues. The official web site and installer are located at http://www.degradolab.org/suns/ and the source code is hosted at https://github.com/godotgildor/Suns (PyMOL plugin, BSD license), https://github.com/Gabriel439/suns-cmd (command line client, BSD license), and https://github.com/Gabriel439/suns-search (search engine server, GPLv2 license). PMID:25079944
A real-time all-atom structural search engine for proteins.
Gonzalez, Gabriel; Hannigan, Brett; DeGrado, William F
2014-07-01
Protein designers use a wide variety of software tools for de novo design, yet their repertoire still lacks a fast and interactive all-atom search engine. To solve this, we have built the Suns program: a real-time, atomic search engine integrated into the PyMOL molecular visualization system. Users build atomic-level structural search queries within PyMOL and receive a stream of search results aligned to their query within a few seconds. This instant feedback cycle enables a new "designability"-inspired approach to protein design where the designer searches for and interactively incorporates native-like fragments from proven protein structures. We demonstrate the use of Suns to interactively build protein motifs, tertiary interactions, and to identify scaffolds compatible with hot-spot residues. The official web site and installer are located at http://www.degradolab.org/suns/ and the source code is hosted at https://github.com/godotgildor/Suns (PyMOL plugin, BSD license), https://github.com/Gabriel439/suns-cmd (command line client, BSD license), and https://github.com/Gabriel439/suns-search (search engine server, GPLv2 license).
An ontology-based search engine for protein-protein interactions
2010-01-01
Background Keyword matching or ID matching is the most common searching method in a large database of protein-protein interactions. They are purely syntactic methods, and retrieve the records in the database that contain a keyword or ID specified in a query. Such syntactic search methods often retrieve too few search results or no results despite many potential matches present in the database. Results We have developed a new method for representing protein-protein interactions and the Gene Ontology (GO) using modified Gödel numbers. This representation is hidden from users but enables a search engine using the representation to efficiently search protein-protein interactions in a biologically meaningful way. Given a query protein with optional search conditions expressed in one or more GO terms, the search engine finds all the interaction partners of the query protein by unique prime factorization of the modified Gödel numbers representing the query protein and the search conditions. Conclusion Representing the biological relations of proteins and their GO annotations by modified Gödel numbers makes a search engine efficiently find all protein-protein interactions by prime factorization of the numbers. Keyword matching or ID matching search methods often miss the interactions involving a protein that has no explicit annotations matching the search condition, but our search engine retrieves such interactions as well if they satisfy the search condition with a more specific term in the ontology. PMID:20122195
An ontology-based search engine for protein-protein interactions.
Park, Byungkyu; Han, Kyungsook
2010-01-18
Keyword matching or ID matching is the most common searching method in a large database of protein-protein interactions. They are purely syntactic methods, and retrieve the records in the database that contain a keyword or ID specified in a query. Such syntactic search methods often retrieve too few search results or no results despite many potential matches present in the database. We have developed a new method for representing protein-protein interactions and the Gene Ontology (GO) using modified Gödel numbers. This representation is hidden from users but enables a search engine using the representation to efficiently search protein-protein interactions in a biologically meaningful way. Given a query protein with optional search conditions expressed in one or more GO terms, the search engine finds all the interaction partners of the query protein by unique prime factorization of the modified Gödel numbers representing the query protein and the search conditions. Representing the biological relations of proteins and their GO annotations by modified Gödel numbers makes a search engine efficiently find all protein-protein interactions by prime factorization of the numbers. Keyword matching or ID matching search methods often miss the interactions involving a protein that has no explicit annotations matching the search condition, but our search engine retrieves such interactions as well if they satisfy the search condition with a more specific term in the ontology.
A user-friendly tool for medical-related patent retrieval.
Pasche, Emilie; Gobeill, Julien; Teodoro, Douglas; Gaudinat, Arnaud; Vishnyakova, Dina; Lovis, Christian; Ruch, Patrick
2012-01-01
Health-related information retrieval is complicated by the variety of nomenclatures available to name entities, since different communities of users will use different ways to name a same entity. We present in this report the development and evaluation of a user-friendly interactive Web application aiming at facilitating health-related patent search. Our tool, called TWINC, relies on a search engine tuned during several patent retrieval competitions, enhanced with intelligent interaction modules, such as chemical query, normalization and expansion. While the functionality of related article search showed promising performances, the ad hoc search results in fairly contrasted results. Nonetheless, TWINC performed well during the PatOlympics competition and was appreciated by intellectual property experts. This result should be balanced by the limited evaluation sample. We can also assume that it can be customized to be applied in corporate search environments to process domain and company-specific vocabularies, including non-English literature and patents reports.
Sauer, Ursula G; Wächter, Thomas; Hareng, Lars; Wareing, Britta; Langsch, Angelika; Zschunke, Matthias; Alvers, Michael R; Landsiedel, Robert
2014-06-01
The knowledge-based search engine Go3R, www.Go3R.org, has been developed to assist scientists from industry and regulatory authorities in collecting comprehensive toxicological information with a special focus on identifying available alternatives to animal testing. The semantic search paradigm of Go3R makes use of expert knowledge on 3Rs methods and regulatory toxicology, laid down in the ontology, a network of concepts, terms, and synonyms, to recognize the contents of documents. Search results are automatically sorted into a dynamic table of contents presented alongside the list of documents retrieved. This table of contents allows the user to quickly filter the set of documents by topics of interest. Documents containing hazard information are automatically assigned to a user interface following the endpoint-specific IUCLID5 categorization scheme required, e.g. for REACH registration dossiers. For this purpose, complex endpoint-specific search queries were compiled and integrated into the search engine (based upon a gold standard of 310 references that had been assigned manually to the different endpoint categories). Go3R sorts 87% of the references concordantly into the respective IUCLID5 categories. Currently, Go3R searches in the 22 million documents available in the PubMed and TOXNET databases. However, it can be customized to search in other databases including in-house databanks. Copyright © 2013 Elsevier Ltd. All rights reserved.
Hanauer, David A; Mei, Qiaozhu; Law, James; Khanna, Ritu; Zheng, Kai
2015-06-01
This paper describes the University of Michigan's nine-year experience in developing and using a full-text search engine designed to facilitate information retrieval (IR) from narrative documents stored in electronic health records (EHRs). The system, called the Electronic Medical Record Search Engine (EMERSE), functions similar to Google but is equipped with special functionalities for handling challenges unique to retrieving information from medical text. Key features that distinguish EMERSE from general-purpose search engines are discussed, with an emphasis on functions crucial to (1) improving medical IR performance and (2) assuring search quality and results consistency regardless of users' medical background, stage of training, or level of technical expertise. Since its initial deployment, EMERSE has been enthusiastically embraced by clinicians, administrators, and clinical and translational researchers. To date, the system has been used in supporting more than 750 research projects yielding 80 peer-reviewed publications. In several evaluation studies, EMERSE demonstrated very high levels of sensitivity and specificity in addition to greatly improved chart review efficiency. Increased availability of electronic data in healthcare does not automatically warrant increased availability of information. The success of EMERSE at our institution illustrates that free-text EHR search engines can be a valuable tool to help practitioners and researchers retrieve information from EHRs more effectively and efficiently, enabling critical tasks such as patient case synthesis and research data abstraction. EMERSE, available free of charge for academic use, represents a state-of-the-art medical IR tool with proven effectiveness and user acceptance. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
Electronic Collection Management and Electronic Information Services
2004-12-01
federated search tools are still being perfected with much debate surrounding their use. Encouragingly, as the federated search tools have evolved...institutional repositories to be included in a federated search process, libraries would have to harvest the metadata from the repositories and then make...providers in Library High Tech News. At this time, federated search engines serve some user groups better than others. Undergraduate students are well
Design implications for task-specific search utilities for retrieval and re-engineering of code
NASA Astrophysics Data System (ADS)
Iqbal, Rahat; Grzywaczewski, Adam; Halloran, John; Doctor, Faiyaz; Iqbal, Kashif
2017-05-01
The importance of information retrieval systems is unquestionable in the modern society and both individuals as well as enterprises recognise the benefits of being able to find information effectively. Current code-focused information retrieval systems such as Google Code Search, Codeplex or Koders produce results based on specific keywords. However, these systems do not take into account developers' context such as development language, technology framework, goal of the project, project complexity and developer's domain expertise. They also impose additional cognitive burden on users in switching between different interfaces and clicking through to find the relevant code. Hence, they are not used by software developers. In this paper, we discuss how software engineers interact with information and general-purpose information retrieval systems (e.g. Google, Yahoo!) and investigate to what extent domain-specific search and recommendation utilities can be developed in order to support their work-related activities. In order to investigate this, we conducted a user study and found that software engineers followed many identifiable and repeatable work tasks and behaviours. These behaviours can be used to develop implicit relevance feedback-based systems based on the observed retention actions. Moreover, we discuss the implications for the development of task-specific search and collaborative recommendation utilities embedded with the Google standard search engine and Microsoft IntelliSense for retrieval and re-engineering of code. Based on implicit relevance feedback, we have implemented a prototype of the proposed collaborative recommendation system, which was evaluated in a controlled environment simulating the real-world situation of professional software engineers. The evaluation has achieved promising initial results on the precision and recall performance of the system.
Make Mine a Metasearcher, Please!
ERIC Educational Resources Information Center
Repman, Judi; Carlson, Randal D.
2000-01-01
Describes metasearch tools and explains their value in helping library media centers improve students' Web searches. Discusses Boolean queries and the emphasis on speed at the expense of comprehensiveness; and compares four metasearch tools, including the number of search engines consulted, user control, and databases included. (LRW)
Dao, Tien Tuan; Hoang, Tuan Nha; Ta, Xuan Hien; Tho, Marie Christine Ho Ba
2013-02-01
Human musculoskeletal system resources of the human body are valuable for the learning and medical purposes. Internet-based information from conventional search engines such as Google or Yahoo cannot response to the need of useful, accurate, reliable and good-quality human musculoskeletal resources related to medical processes, pathological knowledge and practical expertise. In this present work, an advanced knowledge-based personalized search engine was developed. Our search engine was based on a client-server multi-layer multi-agent architecture and the principle of semantic web services to acquire dynamically accurate and reliable HMSR information by a semantic processing and visualization approach. A security-enhanced mechanism was applied to protect the medical information. A multi-agent crawler was implemented to develop a content-based database of HMSR information. A new semantic-based PageRank score with related mathematical formulas were also defined and implemented. As the results, semantic web service descriptions were presented in OWL, WSDL and OWL-S formats. Operational scenarios with related web-based interfaces for personal computers and mobile devices were presented and analyzed. Functional comparison between our knowledge-based search engine, a conventional search engine and a semantic search engine showed the originality and the robustness of our knowledge-based personalized search engine. In fact, our knowledge-based personalized search engine allows different users such as orthopedic patient and experts or healthcare system managers or medical students to access remotely into useful, accurate, reliable and good-quality HMSR information for their learning and medical purposes. Copyright © 2012 Elsevier Inc. All rights reserved.
Patterns of Information-Seeking for Cancer on the Internet: An Analysis of Real World Data
Ofran, Yishai; Paltiel, Ora; Pelleg, Dan; Rowe, Jacob M.; Yom-Tov, Elad
2012-01-01
Although traditionally the primary information sources for cancer patients have been the treating medical team, patients and their relatives increasingly turn to the Internet, though this source may be misleading and confusing. We assess Internet searching patterns to understand the information needs of cancer patients and their acquaintances, as well as to discern their underlying psychological states. We screened 232,681 anonymous users who initiated cancer-specific queries on the Yahoo Web search engine over three months, and selected for study users with high levels of interest in this topic. Searches were partitioned by expected survival for the disease being searched. We compared the search patterns of anonymous users and their contacts. Users seeking information on aggressive malignancies exhibited shorter search periods, focusing on disease- and treatment-related information. Users seeking knowledge regarding more indolent tumors searched for longer periods, alternated between different subjects, and demonstrated a high interest in topics such as support groups. Acquaintances searched for longer periods than the proband user when seeking information on aggressive (compared to indolent) cancers. Information needs can be modeled as transitioning between five discrete states, each with a unique signature representing the type of information of interest to the user. Thus, early phases of information-seeking for cancer follow a specific dynamic pattern. Areas of interest are disease dependent and vary between probands and their contacts. These patterns can be used by physicians and medical Web site authors to tailor information to the needs of patients and family members. PMID:23029317
JSC Search System Usability Case Study
NASA Technical Reports Server (NTRS)
Meza, David; Berndt, Sarah
2014-01-01
The advanced nature of "search" has facilitated the movement from keyword match to the delivery of every conceivable information topic from career, commerce, entertainment, learning... the list is infinite. At NASA Johnson Space Center (JSC ) the Search interface is an important means of knowledge transfer. By indexing multiple sources between directorates and organizations, the system's potential is culture changing in that through search, knowledge of the unique accomplishments in engineering and science can be seamlessly passed between generations. This paper reports the findings of an initial survey, the first of a four part study to help determine user sentiment on the intranet, or local (JSC) enterprise search environment as well as the larger NASA enterprise. The survey is a means through which end users provide direction on the development and transfer of knowledge by way of the search experience. The ideal is to identify what is working and what needs to be improved from the users' vantage point by documenting: (1) Where users are satisfied/dissatisfied (2) Perceived value of interface components (3) Gaps which cause any disappointment in search experience. The near term goal is it to inform JSC search in order to improve users' ability to utilize existing services and infrastructure to perform tasks with a shortened life cycle. Continuing steps include an agency based focus with modified questions to accomplish a similar purpose
Search without Boundaries Using Simple APIs
ERIC Educational Resources Information Center
Tong, Qi (Helen)
2009-01-01
The U.S. Geological Survey (USGS) Library, where the author serves as the digital services librarian, is increasingly challenged to make it easier for users to find information from many heterogeneous information sources. Information is scattered throughout different software applications (i.e., library catalog, federated search engine, link…
Engine Data Interpretation System (EDIS), phase 2
NASA Technical Reports Server (NTRS)
Cost, Thomas L.; Hofmann, Martin O.
1991-01-01
A prototype of an expert system was developed which applies qualitative constraint-based reasoning to the task of post-test analysis of data resulting from a rocket engine firing. Data anomalies are detected and corresponding faults are diagnosed. Engine behavior is reconstructed using measured data and knowledge about engine behavior. Knowledge about common faults guides but does not restrict the search for the best explanation in terms of hypothesized faults. The system contains domain knowledge about the behavior of common rocket engine components and was configured for use with the Space Shuttle Main Engine (SSME). A graphical user interface allows an expert user to intimately interact with the system during diagnosis. The system was applied to data taken during actual SSME tests where data anomalies were observed.
Sentiment of Search: KM and IT for User Expectations
NASA Technical Reports Server (NTRS)
Berndt, Sarah Ann; Meza, David
2014-01-01
User perceived value is the number one indicator of a successful implementation of KM and IT collaborations. The system known as "Search" requires more strategy and workflow that a mere data dump or ungoverned infrastructure can provide. Monitoring of user sentiment can be a driver for providing objective measures of success and justifying changes to the user interface. The dynamic nature of information technology makes traditional usability metrics difficult to identify, yet easy to argue against. There is little disagreement, however, on the criticality of adapting to user needs and expectations. The Systems Usability Scale (SUS), developed by John Brook in 1986 has become an industry standard for usability engineering. The first phase of a modified SUS, polls the sentiment of representative users of the JSC Search system. This information can be used to correlate user determined value with types of information sought and how the system is (or is not) meeting expectations. Sentiment analysis by way of the SUS assists an organization in identification and prioritization of the KM and IT variables impacting user perceived value. A secondary, user group focused analysis is the topic of additional work that demonstrates the impact of specific changes dictated by user sentiment.
Automated Data Tagging in the HLA
NASA Astrophysics Data System (ADS)
Gaffney, N. I.; Miller, W. W.
2008-08-01
One of the more powerful and popular forms of data organization implemented in most popular information sharing web applications is data tagging. With a rich user base from which to gather and digest tags, many interesting and often unanticipated yet very useful associations are revealed. With regard to an existing information, the astronomical community has a rich pool of existing digitally stored and searchable data than any of the currently popular web community, such as You Tube or My Space, had when they started. In initial experiments with the search engine for the Hubble Legacy Archive, we have created a simple yet powerful scheme by which the information from a footprint service, the NED and SIMBAD catalog services, and the ADS abstracts and keywords can be used to initially tag data with standard keywords. By then ingesting this into a public ally available information search engine, such as Apache Lucene, one can create a simple and powerful data tag search engine and association system. By then augmenting this with user provided keys and usage pattern analysis, one can produce a powerful modern data mining system for any astronomical data warehouse.
Mapping Self-Guided Learners' Searches for Video Tutorials on YouTube
ERIC Educational Resources Information Center
Garrett, Nathan
2016-01-01
While YouTube has a wealth of educational videos, how self-guided learners use these resources has not been fully described. An analysis of search engine queries for help with the use of Microsoft Excel shows that few users search for specific features or functions but instead use very general terms. Because the same videos are returned in…
Estimating search engine index size variability: a 9-year longitudinal study.
van den Bosch, Antal; Bogers, Toine; de Kunder, Maurice
One of the determining factors of the quality of Web search engines is the size of their index. In addition to its influence on search result quality, the size of the indexed Web can also tell us something about which parts of the WWW are directly accessible to the everyday user. We propose a novel method of estimating the size of a Web search engine's index by extrapolating from document frequencies of words observed in a large static corpus of Web pages. In addition, we provide a unique longitudinal perspective on the size of Google and Bing's indices over a nine-year period, from March 2006 until January 2015. We find that index size estimates of these two search engines tend to vary dramatically over time, with Google generally possessing a larger index than Bing. This result raises doubts about the reliability of previous one-off estimates of the size of the indexed Web. We find that much, if not all of this variability can be explained by changes in the indexing and ranking infrastructure of Google and Bing. This casts further doubt on whether Web search engines can be used reliably for cross-sectional webometric studies.
Huesch, Marco D; Galstyan, Aram; Ong, Michael K; Doctor, Jason N
2016-06-01
To pilot public health interventions at women potentially interested in maternity care via campaigns on social media (Twitter), social networks (Facebook), and online search engines (Google Search). Primary data from Twitter, Facebook, and Google Search on users of these platforms in Los Angeles between March and July 2014. Observational study measuring the responses of targeted users of Twitter, Facebook, and Google Search exposed to our sponsored messages soliciting them to start an engagement process by clicking through to a study website containing information on maternity care quality information for the Los Angeles market. Campaigns reached a little more than 140,000 consumers each day across the three platforms, with a little more than 400 engagements each day. Facebook and Google search had broader reach, better engagement rates, and lower costs than Twitter. Costs to reach 1,000 targeted users were approximately in the same range as less well-targeted radio and TV advertisements, while initial engagements-a user clicking through an advertisement-cost less than $1 each. Our results suggest that commercially available online advertising platforms in wide use by other industries could play a role in targeted public health interventions. © Health Research and Educational Trust.
Where to search top-K biomedical ontologies?
Oliveira, Daniela; Butt, Anila Sahar; Haller, Armin; Rebholz-Schuhmann, Dietrich; Sahay, Ratnesh
2018-03-20
Searching for precise terms and terminological definitions in the biomedical data space is problematic, as researchers find overlapping, closely related and even equivalent concepts in a single or multiple ontologies. Search engines that retrieve ontological resources often suggest an extensive list of search results for a given input term, which leads to the tedious task of selecting the best-fit ontological resource (class or property) for the input term and reduces user confidence in the retrieval engines. A systematic evaluation of these search engines is necessary to understand their strengths and weaknesses in different search requirements. We have implemented seven comparable Information Retrieval ranking algorithms to search through ontologies and compared them against four search engines for ontologies. Free-text queries have been performed, the outcomes have been judged by experts and the ranking algorithms and search engines have been evaluated against the expert-based ground truth (GT). In addition, we propose a probabilistic GT that is developed automatically to provide deeper insights and confidence to the expert-based GT as well as evaluating a broader range of search queries. The main outcome of this work is the identification of key search factors for biomedical ontologies together with search requirements and a set of recommendations that will help biomedical experts and ontology engineers to select the best-suited retrieval mechanism in their search scenarios. We expect that this evaluation will allow researchers and practitioners to apply the current search techniques more reliably and that it will help them to select the right solution for their daily work. The source code (of seven ranking algorithms), ground truths and experimental results are available at https://github.com/danielapoliveira/bioont-search-benchmark.
ReSEARCH: A Requirements Search Engine: Progress Report 2
2008-09-01
and provides a convenient user interface for the search process. Ideally, the web application would be based on Tomcat, a free Java Servlet and JSP...Implementation issues Lucene Java is an Open Source project, available under the Apache License, which provides an accessible API for the development of...from the Apache Lucene website (Lucene- java Wiki , 2008). A search application developed with Lucene consists of the same two major com- ponents
Shilov, Ignat V; Seymour, Sean L; Patel, Alpesh A; Loboda, Alex; Tang, Wilfred H; Keating, Sean P; Hunter, Christie L; Nuwaysir, Lydia M; Schaeffer, Daniel A
2007-09-01
The Paragon Algorithm, a novel database search engine for the identification of peptides from tandem mass spectrometry data, is presented. Sequence Temperature Values are computed using a sequence tag algorithm, allowing the degree of implication by an MS/MS spectrum of each region of a database to be determined on a continuum. Counter to conventional approaches, features such as modifications, substitutions, and cleavage events are modeled with probabilities rather than by discrete user-controlled settings to consider or not consider a feature. The use of feature probabilities in conjunction with Sequence Temperature Values allows for a very large increase in the effective search space with only a very small increase in the actual number of hypotheses that must be scored. The algorithm has a new kind of user interface that removes the user expertise requirement, presenting control settings in the language of the laboratory that are translated to optimal algorithmic settings. To validate this new algorithm, a comparison with Mascot is presented for a series of analogous searches to explore the relative impact of increasing search space probed with Mascot by relaxing the tryptic digestion conformance requirements from trypsin to semitrypsin to no enzyme and with the Paragon Algorithm using its Rapid mode and Thorough mode with and without tryptic specificity. Although they performed similarly for small search space, dramatic differences were observed in large search space. With the Paragon Algorithm, hundreds of biological and artifact modifications, all possible substitutions, and all levels of conformance to the expected digestion pattern can be searched in a single search step, yet the typical cost in search time is only 2-5 times that of conventional small search space. Despite this large increase in effective search space, there is no drastic loss of discrimination that typically accompanies the exploration of large search space.
‘Sciencenet’—towards a global search and share engine for all scientific knowledge
Lütjohann, Dominic S.; Shah, Asmi H.; Christen, Michael P.; Richter, Florian; Knese, Karsten; Liebel, Urban
2011-01-01
Summary: Modern biological experiments create vast amounts of data which are geographically distributed. These datasets consist of petabytes of raw data and billions of documents. Yet to the best of our knowledge, a search engine technology that searches and cross-links all different data types in life sciences does not exist. We have developed a prototype distributed scientific search engine technology, ‘Sciencenet’, which facilitates rapid searching over this large data space. By ‘bringing the search engine to the data’, we do not require server farms. This platform also allows users to contribute to the search index and publish their large-scale data to support e-Science. Furthermore, a community-driven method guarantees that only scientific content is crawled and presented. Our peer-to-peer approach is sufficiently scalable for the science web without performance or capacity tradeoff. Availability and Implementation: The free to use search portal web page and the downloadable client are accessible at: http://sciencenet.kit.edu. The web portal for index administration is implemented in ASP.NET, the ‘AskMe’ experiment publisher is written in Python 2.7, and the backend ‘YaCy’ search engine is based on Java 1.6. Contact: urban.liebel@kit.edu Supplementary Material: Detailed instructions and descriptions can be found on the project homepage: http://sciencenet.kit.edu. PMID:21493657
FindZebra: a search engine for rare diseases.
Dragusin, Radu; Petcu, Paula; Lioma, Christina; Larsen, Birger; Jørgensen, Henrik L; Cox, Ingemar J; Hansen, Lars Kai; Ingwersen, Peter; Winther, Ole
2013-06-01
The web has become a primary information resource about illnesses and treatments for both medical and non-medical users. Standard web search is by far the most common interface to this information. It is therefore of interest to find out how well web search engines work for diagnostic queries and what factors contribute to successes and failures. Among diseases, rare (or orphan) diseases represent an especially challenging and thus interesting class to diagnose as each is rare, diverse in symptoms and usually has scattered resources associated with it. We design an evaluation approach for web search engines for rare disease diagnosis which includes 56 real life diagnostic cases, performance measures, information resources and guidelines for customising Google Search to this task. In addition, we introduce FindZebra, a specialized (vertical) rare disease search engine. FindZebra is powered by open source search technology and uses curated freely available online medical information. FindZebra outperforms Google Search in both default set-up and customised to the resources used by FindZebra. We extend FindZebra with specialized functionalities exploiting medical ontological information and UMLS medical concepts to demonstrate different ways of displaying the retrieved results to medical experts. Our results indicate that a specialized search engine can improve the diagnostic quality without compromising the ease of use of the currently widely popular standard web search. The proposed evaluation approach can be valuable for future development and benchmarking. The FindZebra search engine is available at http://www.findzebra.com/. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Collection of Medical Original Data with Search Engine for Decision Support.
Orthuber, Wolfgang
2016-01-01
Medicine is becoming more and more complex and humans can capture total medical knowledge only partially. For specific access a high resolution search engine is demonstrated, which allows besides conventional text search also search of precise quantitative data of medical findings, therapies and results. Users can define metric spaces ("Domain Spaces", DSs) with all searchable quantitative data ("Domain Vectors", DSs). An implementation of the search engine is online in http://numericsearch.com. In future medicine the doctor could make first a rough diagnosis and check which fine diagnostics (quantitative data) colleagues had collected in such a case. Then the doctor decides about fine diagnostics and results are sent (half automatically) to the search engine which filters a group of patients which best fits to these data. In this specific group variable therapies can be checked with associated therapeutic results, like in an individual scientific study for the current patient. The statistical (anonymous) results could be used for specific decision support. Reversely the therapeutic decision (in the best case with later results) could be used to enhance the collection of precise pseudonymous medical original data which is used for better and better statistical (anonymous) search results.
Cole, Curtis L; Kanter, Andrew S; Cummens, Michael; Vostinar, Sean; Naeymi-Rad, Frank
2004-01-01
To design and implement a real world application using a terminology server to assist patients and physicians who use common language search terms to find specialist physicians with a particular clinical expertise. Terminology servers have been developed to help users encoding of information using complicated structured vocabulary during data entry tasks, such as recording clinical information. We describe a methodology using Personal Health Terminology trade mark and a SNOMED CT-based hierarchical concept server. Construction of a pilot mediated-search engine to assist users who use vernacular speech in querying data which is more technical than vernacular. This approach, which combines theoretical and practical requirements, provides a useful example of concept-based searching for physician referrals.
Development of a Google-based search engine for data mining radiology reports.
Erinjeri, Joseph P; Picus, Daniel; Prior, Fred W; Rubin, David A; Koppel, Paul
2009-08-01
The aim of this study is to develop a secure, Google-based data-mining tool for radiology reports using free and open source technologies and to explore its use within an academic radiology department. A Health Insurance Portability and Accountability Act (HIPAA)-compliant data repository, search engine and user interface were created to facilitate treatment, operations, and reviews preparatory to research. The Institutional Review Board waived review of the project, and informed consent was not required. Comprising 7.9 GB of disk space, 2.9 million text reports were downloaded from our radiology information system to a fileserver. Extensible markup language (XML) representations of the reports were indexed using Google Desktop Enterprise search engine software. A hypertext markup language (HTML) form allowed users to submit queries to Google Desktop, and Google's XML response was interpreted by a practical extraction and report language (PERL) script, presenting ranked results in a web browser window. The query, reason for search, results, and documents visited were logged to maintain HIPAA compliance. Indexing averaged approximately 25,000 reports per hour. Keyword search of a common term like "pneumothorax" yielded the first ten most relevant results of 705,550 total results in 1.36 s. Keyword search of a rare term like "hemangioendothelioma" yielded the first ten most relevant results of 167 total results in 0.23 s; retrieval of all 167 results took 0.26 s. Data mining tools for radiology reports will improve the productivity of academic radiologists in clinical, educational, research, and administrative tasks. By leveraging existing knowledge of Google's interface, radiologists can quickly perform useful searches.
Johnson, Amy K; Mikati, Tarek; Mehta, Supriya D
2016-11-09
US surveillance of sexually transmitted diseases (STDs) is often delayed and incomplete which creates missed opportunities to identify and respond to trends in disease. Internet search engine data has the potential to be an efficient, economical and representative enhancement to the established surveillance system. Google Trends allows the download of de-identified search engine data, which has been used to demonstrate the positive and statistically significant association between STD-related search terms and STD rates. In this study, search engine user content was identified by surveying specific exposure groups of individuals (STD clinic patients and university students) aged 18-35. Participants were asked to list the terms they use to search for STD-related information. Google Correlate was used to validate search term content. On average STD clinic participant queries were longer compared to student queries. STD clinic participants were more likely to report using search terms that were related to symptomatology such as describing symptoms of STDs, while students were more likely to report searching for general information. These differences in search terms by subpopulation have implications for STD surveillance in populations at most risk for disease acquisition.
Improving PHENIX search with Solr, Nutch and Drupal.
NASA Astrophysics Data System (ADS)
Morrison, Dave; Sourikova, Irina
2012-12-01
During its 20 years of R&D, construction and operation the PHENIX experiment at the Relativistic Heavy Ion Collider (RHIC) has accumulated large amounts of proprietary collaboration data that is hosted on many servers around the world and is not open for commercial search engines for indexing and searching. The legacy search infrastructure did not scale well with the fast growing PHENIX document base and produced results inadequate in both precision and recall. After considering the possible alternatives that would provide an aggregated, fast, full text search of a variety of data sources and file formats we decided to use Nutch [1] as a web crawler and Solr [2] as a search engine. To present XML-based Solr search results in a user-friendly format we use Drupal [3] as a web interface to Solr. We describe the experience of building a federated search for a heterogeneous collection of 10 million PHENIX documents with Nutch, Solr and Drupal.
ERIC Educational Resources Information Center
Raitt, David I., Ed.; Jeapes, Ben, Ed.
This proceedings volume contains 68 papers. Subjects addressed include: access to information; the future of information managers/librarians; intelligent agents; changing roles of library users; disintermediation; Internet review sites; World Wide Web (WWW) search engines; Java; online searching; future of online education; integrated information…
Differences and Similarities in Information Seeking: Children and Adults as Web Users.
ERIC Educational Resources Information Center
Bilal, Dania; Kirby, Joe
2002-01-01
Analyzed and compared the success and information seeking behaviors of seventh grade science students and graduate students in using the Yahooligans! Web search engine. Discusses cognitive, affective, and physical behaviors during a fact-finding task, including searching, browsing, and time to complete the task; navigational styles; and focus on…
2009-10-02
October. Jansen, B. J., Zhang, M., and Zhang, Y. (2007) Brand Awareness and the Evaluation of Search Results, 16th International World Wide Web...2007) The Effect of Brand Awareness on the Evaluation of Search Engine Results, Conference on Human Factors in Computing Systems (SIGCHI), Work-in
Sagace: A web-based search engine for biomedical databases in Japan
2012-01-01
Background In the big data era, biomedical research continues to generate a large amount of data, and the generated information is often stored in a database and made publicly available. Although combining data from multiple databases should accelerate further studies, the current number of life sciences databases is too large to grasp features and contents of each database. Findings We have developed Sagace, a web-based search engine that enables users to retrieve information from a range of biological databases (such as gene expression profiles and proteomics data) and biological resource banks (such as mouse models of disease and cell lines). With Sagace, users can search more than 300 databases in Japan. Sagace offers features tailored to biomedical research, including manually tuned ranking, a faceted navigation to refine search results, and rich snippets constructed with retrieved metadata for each database entry. Conclusions Sagace will be valuable for experts who are involved in biomedical research and drug development in both academia and industry. Sagace is freely available at http://sagace.nibio.go.jp/en/. PMID:23110816
Is Internet search better than structured instruction for web-based health education?
Finkelstein, Joseph; Bedra, McKenzie
2013-01-01
Internet provides access to vast amounts of comprehensive information regarding any health-related subject. Patients increasingly use this information for health education using a search engine to identify education materials. An alternative approach of health education via Internet is based on utilizing a verified web site which provides structured interactive education guided by adult learning theories. Comparison of these two approaches in older patients was not performed systematically. The aim of this study was to compare the efficacy of a web-based computer-assisted education (CO-ED) system versus searching the Internet for learning about hypertension. Sixty hypertensive older adults (age 45+) were randomized into control or intervention groups. The control patients spent 30 to 40 minutes searching the Internet using a search engine for information about hypertension. The intervention patients spent 30 to 40 minutes using the CO-ED system, which provided computer-assisted instruction about major hypertension topics. Analysis of pre- and post- knowledge scores indicated a significant improvement among CO-ED users (14.6%) as opposed to Internet users (2%). Additionally, patients using the CO-ED program rated their learning experience more positively than those using the Internet.
Comparing image search behaviour in the ARRS GoldMiner search engine and a clinical PACS/RIS.
De-Arteaga, Maria; Eggel, Ivan; Do, Bao; Rubin, Daniel; Kahn, Charles E; Müller, Henning
2015-08-01
Information search has changed the way we manage knowledge and the ubiquity of information access has made search a frequent activity, whether via Internet search engines or increasingly via mobile devices. Medical information search is in this respect no different and much research has been devoted to analyzing the way in which physicians aim to access information. Medical image search is a much smaller domain but has gained much attention as it has different characteristics than search for text documents. While web search log files have been analysed many times to better understand user behaviour, the log files of hospital internal systems for search in a PACS/RIS (Picture Archival and Communication System, Radiology Information System) have rarely been analysed. Such a comparison between a hospital PACS/RIS search and a web system for searching images of the biomedical literature is the goal of this paper. Objectives are to identify similarities and differences in search behaviour of the two systems, which could then be used to optimize existing systems and build new search engines. Log files of the ARRS GoldMiner medical image search engine (freely accessible on the Internet) containing 222,005 queries, and log files of Stanford's internal PACS/RIS search called radTF containing 18,068 queries were analysed. Each query was preprocessed and all query terms were mapped to the RadLex (Radiology Lexicon) terminology, a comprehensive lexicon of radiology terms created and maintained by the Radiological Society of North America, so the semantic content in the queries and the links between terms could be analysed, and synonyms for the same concept could be detected. RadLex was mainly created for the use in radiology reports, to aid structured reporting and the preparation of educational material (Lanlotz, 2006) [1]. In standard medical vocabularies such as MeSH (Medical Subject Headings) and UMLS (Unified Medical Language System) specific terms of radiology are often underrepresented, therefore RadLex was considered to be the best option for this task. The results show a surprising similarity between the usage behaviour in the two systems, but several subtle differences can also be noted. The average number of terms per query is 2.21 for GoldMiner and 2.07 for radTF, the used axes of RadLex (anatomy, pathology, findings, …) have almost the same distribution with clinical findings being the most frequent and the anatomical entity the second; also, combinations of RadLex axes are extremely similar between the two systems. Differences include a longer length of the sessions in radTF than in GoldMiner (3.4 and 1.9 queries per session on average). Several frequent search terms overlap but some strong differences exist in the details. In radTF the term "normal" is frequent, whereas in GoldMiner it is not. This makes intuitive sense, as in the literature normal cases are rarely described whereas in clinical work the comparison with normal cases is often a first step. The general similarity in many points is likely due to the fact that users of the two systems are influenced by their daily behaviour in using standard web search engines and follow this behaviour in their professional search. This means that many results and insights gained from standard web search can likely be transferred to more specialized search systems. Still, specialized log files can be used to find out more on reformulations and detailed strategies of users to find the right content. Copyright © 2015 Elsevier Inc. All rights reserved.
Mercury: Reusable software application for Metadata Management, Data Discovery and Access
NASA Astrophysics Data System (ADS)
Devarakonda, Ranjeet; Palanisamy, Giri; Green, James; Wilson, Bruce E.
2009-12-01
Mercury is a federated metadata harvesting, data discovery and access tool based on both open source packages and custom developed software. It was originally developed for NASA, and the Mercury development consortium now includes funding from NASA, USGS, and DOE. Mercury is itself a reusable toolset for metadata, with current use in 12 different projects. Mercury also supports the reuse of metadata by enabling searching across a range of metadata specification and standards including XML, Z39.50, FGDC, Dublin-Core, Darwin-Core, EML, and ISO-19115. Mercury provides a single portal to information contained in distributed data management systems. It collects metadata and key data from contributing project servers distributed around the world and builds a centralized index. The Mercury search interfaces then allow the users to perform simple, fielded, spatial and temporal searches across these metadata sources. One of the major goals of the recent redesign of Mercury was to improve the software reusability across the projects which currently fund the continuing development of Mercury. These projects span a range of land, atmosphere, and ocean ecological communities and have a number of common needs for metadata searches, but they also have a number of needs specific to one or a few projects To balance these common and project-specific needs, Mercury’s architecture includes three major reusable components; a harvester engine, an indexing system and a user interface component. The harvester engine is responsible for harvesting metadata records from various distributed servers around the USA and around the world. The harvester software was packaged in such a way that all the Mercury projects will use the same harvester scripts but each project will be driven by a set of configuration files. The harvested files are then passed to the Indexing system, where each of the fields in these structured metadata records are indexed properly, so that the query engine can perform simple, keyword, spatial and temporal searches across these metadata sources. The search user interface software has two API categories; a common core API which is used by all the Mercury user interfaces for querying the index and a customized API for project specific user interfaces. For our work in producing a reusable, portable, robust, feature-rich application, Mercury received a 2008 NASA Earth Science Data Systems Software Reuse Working Group Peer-Recognition Software Reuse Award. The new Mercury system is based on a Service Oriented Architecture and effectively reuses components for various services such as Thesaurus Service, Gazetteer Web Service and UDDI Directory Services. The software also provides various search services including: RSS, Geo-RSS, OpenSearch, Web Services and Portlets, integrated shopping cart to order datasets from various data centers (ORNL DAAC, NSIDC) and integrated visualization tools. Other features include: Filtering and dynamic sorting of search results, book-markable search results, save, retrieve, and modify search criteria.
A study of medical and health queries to web search engines.
Spink, Amanda; Yang, Yin; Jansen, Jim; Nykanen, Pirrko; Lorence, Daniel P; Ozmutlu, Seda; Ozmutlu, H Cenk
2004-03-01
This paper reports findings from an analysis of medical or health queries to different web search engines. We report results: (i). comparing samples of 10000 web queries taken randomly from 1.2 million query logs from the AlltheWeb.com and Excite.com commercial web search engines in 2001 for medical or health queries, (ii). comparing the 2001 findings from Excite and AlltheWeb.com users with results from a previous analysis of medical and health related queries from the Excite Web search engine for 1997 and 1999, and (iii). medical or health advice-seeking queries beginning with the word 'should'. Findings suggest: (i). a small percentage of web queries are medical or health related, (ii). the top five categories of medical or health queries were: general health, weight issues, reproductive health and puberty, pregnancy/obstetrics, and human relationships, and (iii). over time, the medical and health queries may have declined as a proportion of all web queries, as the use of specialized medical/health websites and e-commerce-related queries has increased. Findings provide insights into medical and health-related web querying and suggests some implications for the use of the general web search engines when seeking medical/health information.
BOSS: context-enhanced search for biomedical objects
2012-01-01
Background There exist many academic search solutions and most of them can be put on either ends of spectrum: general-purpose search and domain-specific "deep" search systems. The general-purpose search systems, such as PubMed, offer flexible query interface, but churn out a list of matching documents that users have to go through the results in order to find the answers to their queries. On the other hand, the "deep" search systems, such as PPI Finder and iHOP, return the precompiled results in a structured way. Their results, however, are often found only within some predefined contexts. In order to alleviate these problems, we introduce a new search engine, BOSS, Biomedical Object Search System. Methods Unlike the conventional search systems, BOSS indexes segments, rather than documents. A segment refers to a Maximal Coherent Semantic Unit (MCSU) such as phrase, clause or sentence that is semantically coherent in the given context (e.g., biomedical objects or their relations). For a user query, BOSS finds all matching segments, identifies the objects appearing in those segments, and aggregates the segments for each object. Finally, it returns the ranked list of the objects along with their matching segments. Results The working prototype of BOSS is available at http://boss.korea.ac.kr. The current version of BOSS has indexed abstracts of more than 20 million articles published during last 16 years from 1996 to 2011 across all science disciplines. Conclusion BOSS fills the gap between either ends of the spectrum by allowing users to pose context-free queries and by returning a structured set of results. Furthermore, BOSS exhibits the characteristic of good scalability, just as with conventional document search engines, because it is designed to use a standard document-indexing model with minimal modifications. Considering the features, BOSS notches up the technological level of traditional solutions for search on biomedical information. PMID:22595092
Through the Google Goggles: Sociopolitical Bias in Search Engine Design
NASA Astrophysics Data System (ADS)
Diaz, A.
Search engines like Google are essential to navigating the Web's endless supply of news, political information, and citizen discourse. The mechanisms and conditions under which search results are selected should therefore be of considerable interest to media scholars, political theorists, and citizens alike. In this chapter, I adopt a "deliberative" ideal for search engines and examine whether Google exhibits the "same old" media biases of mainstreaming, hypercommercialism, and industry consolidation. In the end, serious objections to Google are raised: Google may favor popularity over richness; it provides advertising that competes directly with "editorial" content; it so overwhelmingly dominates the industry that users seldom get a second opinion, and this is unlikely to change. Ultimately, however, the results of this analysis may speak less about Google than about contradictions in the deliberative ideal and the so-called "inherently democratic" nature of the Web.
Borras-Morell, Jose-Enrique; Martinez-Millana, Antonio; Karlsen, Randi
2017-01-01
Health consumers are increasingly using the Internet to search for health information. The existence of overloaded, inaccurate, obsolete, or simply incorrect health information available on the Internet is a serious obstacle for finding relevant and good-quality data that actually helps patients. Search engines of multimedia Internet platforms are thought to help users to find relevant information according to their search. But, is the information recovered by those search engines from quality sources? Is the health information uploaded from reliable sources, such as hospitals and health organizations, easily available to patients? The availability of videos is directly related to the ranking position in YouTube search. The higher the ranking of the information is, the more accessible it is. The aim of this study is to analyze the ranking evolution of diabetes health videos on YouTube in order to discover how videos from reliable channels, such as hospitals and health organizations, are evolving in the ranking. The analysis was done by tracking the ranking of 2372 videos on a daily basis during a 30-day period using 20 diabetes-related queries. Our conclusions are that the current YouTube algorithm favors the presence of reliable videos in upper rank positions in diabetes-related searches. PMID:28243314
Fernandez-Llatas, Carlos; Traver, Vicente; Borras-Morell, Jose-Enrique; Martinez-Millana, Antonio; Karlsen, Randi
2017-01-01
Health consumers are increasingly using the Internet to search for health information. The existence of overloaded, inaccurate, obsolete, or simply incorrect health information available on the Internet is a serious obstacle for finding relevant and good-quality data that actually helps patients. Search engines of multimedia Internet platforms are thought to help users to find relevant information according to their search. But, is the information recovered by those search engines from quality sources? Is the health information uploaded from reliable sources, such as hospitals and health organizations, easily available to patients? The availability of videos is directly related to the ranking position in YouTube search. The higher the ranking of the information is, the more accessible it is. The aim of this study is to analyze the ranking evolution of diabetes health videos on YouTube in order to discover how videos from reliable channels, such as hospitals and health organizations, are evolving in the ranking. The analysis was done by tracking the ranking of 2372 videos on a daily basis during a 30-day period using 20 diabetes-related queries. Our conclusions are that the current YouTube algorithm favors the presence of reliable videos in upper rank positions in diabetes-related searches.
Search strategies on the Internet: general and specific.
Bottrill, Krys
2004-06-01
Some of the most up-to-date information on scientific activity is to be found on the Internet; for example, on the websites of academic and other research institutions and in databases of currently funded research studies provided on the websites of funding bodies. Such information can be valuable in suggesting new approaches and techniques that could be applicable in a Three Rs context. However, the Internet is a chaotic medium, not subject to the meticulous classification and organisation of classical information resources. At the same time, Internet search engines do not match the sophistication of search systems used by database hosts. Also, although some offer relatively advanced features, user awareness of these tends to be low. Furthermore, much of the information on the Internet is not accessible to conventional search engines, giving rise to the concept of the "Invisible Web". General strategies and techniques for Internet searching are presented, together with a comparative survey of selected search engines. The question of how the Invisible Web can be accessed is discussed, as well as how to keep up-to-date with Internet content and improve searching skills.
FPS-RAM: Fast Prefix Search RAM-Based Hardware for Forwarding Engine
NASA Astrophysics Data System (ADS)
Zaitsu, Kazuya; Yamamoto, Koji; Kuroda, Yasuto; Inoue, Kazunari; Ata, Shingo; Oka, Ikuo
Ternary content addressable memory (TCAM) is becoming very popular for designing high-throughput forwarding engines on routers. However, TCAM has potential problems in terms of hardware and power costs, which limits its ability to deploy large amounts of capacity in IP routers. In this paper, we propose new hardware architecture for fast forwarding engines, called fast prefix search RAM-based hardware (FPS-RAM). We designed FPS-RAM hardware with the intent of maintaining the same search performance and physical user interface as TCAM because our objective is to replace the TCAM in the market. Our RAM-based hardware architecture is completely different from that of TCAM and has dramatically reduced the costs and power consumption to 62% and 52%, respectively. We implemented FPS-RAM on an FPGA to examine its lookup operation.
An architecture for diversity-aware search for medical web content.
Denecke, K
2012-01-01
The Web provides a huge source of information, also on medical and health-related issues. In particular the content of medical social media data can be diverse due to the background of an author, the source or the topic. Diversity in this context means that a document covers different aspects of a topic or a topic is described in different ways. In this paper, we introduce an approach that allows to consider the diverse aspects of a search query when providing retrieval results to a user. We introduce a system architecture for a diversity-aware search engine that allows retrieving medical information from the web. The diversity of retrieval results is assessed by calculating diversity measures that rely upon semantic information derived from a mapping to concepts of a medical terminology. Considering these measures, the result set is diversified by ranking more diverse texts higher. The methods and system architecture are implemented in a retrieval engine for medical web content. The diversity measures reflect the diversity of aspects considered in a text and its type of information content. They are used for result presentation, filtering and ranking. In a user evaluation we assess the user satisfaction with an ordering of retrieval results that considers the diversity measures. It is shown through the evaluation that diversity-aware retrieval considering diversity measures in ranking could increase the user satisfaction with retrieval results.
Jácome, Alberto G; Fdez-Riverola, Florentino; Lourenço, Anália
2016-07-01
Text mining and semantic analysis approaches can be applied to the construction of biomedical domain-specific search engines and provide an attractive alternative to create personalized and enhanced search experiences. Therefore, this work introduces the new open-source BIOMedical Search Engine Framework for the fast and lightweight development of domain-specific search engines. The rationale behind this framework is to incorporate core features typically available in search engine frameworks with flexible and extensible technologies to retrieve biomedical documents, annotate meaningful domain concepts, and develop highly customized Web search interfaces. The BIOMedical Search Engine Framework integrates taggers for major biomedical concepts, such as diseases, drugs, genes, proteins, compounds and organisms, and enables the use of domain-specific controlled vocabulary. Technologies from the Typesafe Reactive Platform, the AngularJS JavaScript framework and the Bootstrap HTML/CSS framework support the customization of the domain-oriented search application. Moreover, the RESTful API of the BIOMedical Search Engine Framework allows the integration of the search engine into existing systems or a complete web interface personalization. The construction of the Smart Drug Search is described as proof-of-concept of the BIOMedical Search Engine Framework. This public search engine catalogs scientific literature about antimicrobial resistance, microbial virulence and topics alike. The keyword-based queries of the users are transformed into concepts and search results are presented and ranked accordingly. The semantic graph view portraits all the concepts found in the results, and the researcher may look into the relevance of different concepts, the strength of direct relations, and non-trivial, indirect relations. The number of occurrences of the concept shows its importance to the query, and the frequency of concept co-occurrence is indicative of biological relations meaningful to that particular scope of research. Conversely, indirect concept associations, i.e. concepts related by other intermediary concepts, can be useful to integrate information from different studies and look into non-trivial relations. The BIOMedical Search Engine Framework supports the development of domain-specific search engines. The key strengths of the framework are modularity and extensibilityin terms of software design, the use of open-source consolidated Web technologies, and the ability to integrate any number of biomedical text mining tools and information resources. Currently, the Smart Drug Search keeps over 1,186,000 documents, containing more than 11,854,000 annotations for 77,200 different concepts. The Smart Drug Search is publicly accessible at http://sing.ei.uvigo.es/sds/. The BIOMedical Search Engine Framework is freely available for non-commercial use at https://github.com/agjacome/biomsef. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Measurement of tag confidence in user generated contents retrieval
NASA Astrophysics Data System (ADS)
Lee, Sihyoung; Min, Hyun-Seok; Lee, Young Bok; Ro, Yong Man
2009-01-01
As online image sharing services are becoming popular, the importance of correctly annotated tags is being emphasized for precise search and retrieval. Tags created by user along with user-generated contents (UGC) are often ambiguous due to the fact that some tags are highly subjective and visually unrelated to the image. They cause unwanted results to users when image search engines rely on tags. In this paper, we propose a method of measuring tag confidence so that one can differentiate confidence tags from noisy tags. The proposed tag confidence is measured from visual semantics of the image. To verify the usefulness of the proposed method, experiments were performed with UGC database from social network sites. Experimental results showed that the image retrieval performance with confidence tags was increased.
Technical development of PubMed interact: an improved interface for MEDLINE/PubMed searches.
Muin, Michael; Fontelo, Paul
2006-11-03
The project aims to create an alternative search interface for MEDLINE/PubMed that may provide assistance to the novice user and added convenience to the advanced user. An earlier version of the project was the 'Slider Interface for MEDLINE/PubMed searches' (SLIM) which provided JavaScript slider bars to control search parameters. In this new version, recent developments in Web-based technologies were implemented. These changes may prove to be even more valuable in enhancing user interactivity through client-side manipulation and management of results. PubMed Interact is a Web-based MEDLINE/PubMed search application built with HTML, JavaScript and PHP. It is implemented on a Windows Server 2003 with Apache 2.0.52, PHP 4.4.1 and MySQL 4.1.18. PHP scripts provide the backend engine that connects with E-Utilities and parses XML files. JavaScript manages client-side functionalities and converts Web pages into interactive platforms using dynamic HTML (DHTML), Document Object Model (DOM) tree manipulation and Ajax methods. With PubMed Interact, users can limit searches with JavaScript slider bars, preview result counts, delete citations from the list, display and add related articles and create relevance lists. Many interactive features occur at client-side, which allow instant feedback without reloading or refreshing the page resulting in a more efficient user experience. PubMed Interact is a highly interactive Web-based search application for MEDLINE/PubMed that explores recent trends in Web technologies like DOM tree manipulation and Ajax. It may become a valuable technical development for online medical search applications.
ERIC Educational Resources Information Center
Nemeth, Erik
2010-01-01
Discovery of academic literature through Web search engines challenges the traditional role of specialized research databases. Creation of literature outside academic presses and peer-reviewed publications expands the content for scholarly research within a particular field. The resulting body of literature raises the question of whether scholars…
DRUMS: a human disease related unique gene mutation search engine.
Li, Zuofeng; Liu, Xingnan; Wen, Jingran; Xu, Ye; Zhao, Xin; Li, Xuan; Liu, Lei; Zhang, Xiaoyan
2011-10-01
With the completion of the human genome project and the development of new methods for gene variant detection, the integration of mutation data and its phenotypic consequences has become more important than ever. Among all available resources, locus-specific databases (LSDBs) curate one or more specific genes' mutation data along with high-quality phenotypes. Although some genotype-phenotype data from LSDB have been integrated into central databases little effort has been made to integrate all these data by a search engine approach. In this work, we have developed disease related unique gene mutation search engine (DRUMS), a search engine for human disease related unique gene mutation as a convenient tool for biologists or physicians to retrieve gene variant and related phenotype information. Gene variant and phenotype information were stored in a gene-centred relational database. Moreover, the relationships between mutations and diseases were indexed by the uniform resource identifier from LSDB, or another central database. By querying DRUMS, users can access the most popular mutation databases under one interface. DRUMS could be treated as a domain specific search engine. By using web crawling, indexing, and searching technologies, it provides a competitively efficient interface for searching and retrieving mutation data and their relationships to diseases. The present system is freely accessible at http://www.scbit.org/glif/new/drums/index.html. © 2011 Wiley-Liss, Inc.
DOE Office of Scientific and Technical Information (OSTI.GOV)
- PNNL, Harold Trease
2012-10-10
ASSA is a software application that processes binary data into summarized index tables that can be used to organize features contained within the data. ASSA's index tables can also be used to search for user specified features. ASSA is designed to organize and search for patterns in unstructured binary data streams or archives, such as video, images, audio, and network traffic. ASSA is basically a very general search engine used to search for any pattern in any binary data stream. It has uses in video analytics, image analysis, audio analysis, searching hard-drives, monitoring network traffic, etc.
Engineering With Nature Geographic Project Mapping Tool (EWN ProMap)
2015-07-01
EWN ProMap database provides numerous case studies for infrastructure projects such as breakwaters, river engineering dikes, and seawalls that have...the EWN Project Mapping Tool (EWN ProMap) is to assist users in their search for case study information that can be valuable for developing EWN ideas...Essential elements of EWN include: (1) using science and engineering to produce operational efficiencies supporting sustainable delivery of
Electronic Biomedical Literature Search for Budding Researcher
Thakre, Subhash B.; Thakre S, Sushama S.; Thakre, Amol D.
2013-01-01
Search for specific and well defined literature related to subject of interest is the foremost step in research. When we are familiar with topic or subject then we can frame appropriate research question. Appropriate research question is the basis for study objectives and hypothesis. The Internet provides a quick access to an overabundance of the medical literature, in the form of primary, secondary and tertiary literature. It is accessible through journals, databases, dictionaries, textbooks, indexes, and e-journals, thereby allowing access to more varied, individualised, and systematic educational opportunities. Web search engine is a tool designed to search for information on the World Wide Web, which may be in the form of web pages, images, information, and other types of files. Search engines for internet-based search of medical literature include Google, Google scholar, Scirus, Yahoo search engine, etc., and databases include MEDLINE, PubMed, MEDLARS, etc. Several web-libraries (National library Medicine, Cochrane, Web of Science, Medical matrix, Emory libraries) have been developed as meta-sites, providing useful links to health resources globally. A researcher must keep in mind the strengths and limitations of a particular search engine/database while searching for a particular type of data. Knowledge about types of literature, levels of evidence, and detail about features of search engine as available, user interface, ease of access, reputable content, and period of time covered allow their optimal use and maximal utility in the field of medicine. Literature search is a dynamic and interactive process; there is no one way to conduct a search and there are many variables involved. It is suggested that a systematic search of literature that uses available electronic resource effectively, is more likely to produce quality research. PMID:24179937
Electronic biomedical literature search for budding researcher.
Thakre, Subhash B; Thakre S, Sushama S; Thakre, Amol D
2013-09-01
Search for specific and well defined literature related to subject of interest is the foremost step in research. When we are familiar with topic or subject then we can frame appropriate research question. Appropriate research question is the basis for study objectives and hypothesis. The Internet provides a quick access to an overabundance of the medical literature, in the form of primary, secondary and tertiary literature. It is accessible through journals, databases, dictionaries, textbooks, indexes, and e-journals, thereby allowing access to more varied, individualised, and systematic educational opportunities. Web search engine is a tool designed to search for information on the World Wide Web, which may be in the form of web pages, images, information, and other types of files. Search engines for internet-based search of medical literature include Google, Google scholar, Scirus, Yahoo search engine, etc., and databases include MEDLINE, PubMed, MEDLARS, etc. Several web-libraries (National library Medicine, Cochrane, Web of Science, Medical matrix, Emory libraries) have been developed as meta-sites, providing useful links to health resources globally. A researcher must keep in mind the strengths and limitations of a particular search engine/database while searching for a particular type of data. Knowledge about types of literature, levels of evidence, and detail about features of search engine as available, user interface, ease of access, reputable content, and period of time covered allow their optimal use and maximal utility in the field of medicine. Literature search is a dynamic and interactive process; there is no one way to conduct a search and there are many variables involved. It is suggested that a systematic search of literature that uses available electronic resource effectively, is more likely to produce quality research.
Development and tuning of an original search engine for patent libraries in medicinal chemistry.
Pasche, Emilie; Gobeill, Julien; Kreim, Olivier; Oezdemir-Zaech, Fatma; Vachon, Therese; Lovis, Christian; Ruch, Patrick
2014-01-01
The large increase in the size of patent collections has led to the need of efficient search strategies. But the development of advanced text-mining applications dedicated to patents of the biomedical field remains rare, in particular to address the needs of the pharmaceutical & biotech industry, which intensively uses patent libraries for competitive intelligence and drug development. We describe here the development of an advanced retrieval engine to search information in patent collections in the field of medicinal chemistry. We investigate and combine different strategies and evaluate their respective impact on the performance of the search engine applied to various search tasks, which covers the putatively most frequent search behaviours of intellectual property officers in medical chemistry: 1) a prior art search task; 2) a technical survey task; and 3) a variant of the technical survey task, sometimes called known-item search task, where a single patent is targeted. The optimal tuning of our engine resulted in a top-precision of 6.76% for the prior art search task, 23.28% for the technical survey task and 46.02% for the variant of the technical survey task. We observed that co-citation boosting was an appropriate strategy to improve prior art search tasks, while IPC classification of queries was improving retrieval effectiveness for technical survey tasks. Surprisingly, the use of the full body of the patent was always detrimental for search effectiveness. It was also observed that normalizing biomedical entities using curated dictionaries had simply no impact on the search tasks we evaluate. The search engine was finally implemented as a web-application within Novartis Pharma. The application is briefly described in the report. We have presented the development of a search engine dedicated to patent search, based on state of the art methods applied to patent corpora. We have shown that a proper tuning of the system to adapt to the various search tasks clearly increases the effectiveness of the system. We conclude that different search tasks demand different information retrieval engines' settings in order to yield optimal end-user retrieval.
Development and tuning of an original search engine for patent libraries in medicinal chemistry
2014-01-01
Background The large increase in the size of patent collections has led to the need of efficient search strategies. But the development of advanced text-mining applications dedicated to patents of the biomedical field remains rare, in particular to address the needs of the pharmaceutical & biotech industry, which intensively uses patent libraries for competitive intelligence and drug development. Methods We describe here the development of an advanced retrieval engine to search information in patent collections in the field of medicinal chemistry. We investigate and combine different strategies and evaluate their respective impact on the performance of the search engine applied to various search tasks, which covers the putatively most frequent search behaviours of intellectual property officers in medical chemistry: 1) a prior art search task; 2) a technical survey task; and 3) a variant of the technical survey task, sometimes called known-item search task, where a single patent is targeted. Results The optimal tuning of our engine resulted in a top-precision of 6.76% for the prior art search task, 23.28% for the technical survey task and 46.02% for the variant of the technical survey task. We observed that co-citation boosting was an appropriate strategy to improve prior art search tasks, while IPC classification of queries was improving retrieval effectiveness for technical survey tasks. Surprisingly, the use of the full body of the patent was always detrimental for search effectiveness. It was also observed that normalizing biomedical entities using curated dictionaries had simply no impact on the search tasks we evaluate. The search engine was finally implemented as a web-application within Novartis Pharma. The application is briefly described in the report. Conclusions We have presented the development of a search engine dedicated to patent search, based on state of the art methods applied to patent corpora. We have shown that a proper tuning of the system to adapt to the various search tasks clearly increases the effectiveness of the system. We conclude that different search tasks demand different information retrieval engines' settings in order to yield optimal end-user retrieval. PMID:24564220
Seeking health information on the web: positive hypothesis testing.
Kayhan, Varol Onur
2013-04-01
The goal of this study is to investigate positive hypothesis testing among consumers of health information when they search the Web. After demonstrating the extent of positive hypothesis testing using Experiment 1, we conduct Experiment 2 to test the effectiveness of two debiasing techniques. A total of 60 undergraduate students searched a tightly controlled online database developed by the authors to test the validity of a hypothesis. The database had four abstracts that confirmed the hypothesis and three abstracts that disconfirmed it. Findings of Experiment 1 showed that majority of participants (85%) exhibited positive hypothesis testing. In Experiment 2, we found that the recommendation technique was not effective in reducing positive hypothesis testing since none of the participants assigned to this server could retrieve disconfirming evidence. Experiment 2 also showed that the incorporation technique successfully reduced positive hypothesis testing since 75% of the participants could retrieve disconfirming evidence. Positive hypothesis testing on the Web is an understudied topic. More studies are needed to validate the effectiveness of the debiasing techniques discussed in this study and develop new techniques. Search engine developers should consider developing new options for users so that both confirming and disconfirming evidence can be presented in search results as users test hypotheses using search engines. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
System for Performing Single Query Searches of Heterogeneous and Dispersed Databases
NASA Technical Reports Server (NTRS)
Maluf, David A. (Inventor); Okimura, Takeshi (Inventor); Gurram, Mohana M. (Inventor); Tran, Vu Hoang (Inventor); Knight, Christopher D. (Inventor); Trinh, Anh Ngoc (Inventor)
2017-01-01
The present invention is a distributed computer system of heterogeneous databases joined in an information grid and configured with an Application Programming Interface hardware which includes a search engine component for performing user-structured queries on multiple heterogeneous databases in real time. This invention reduces overhead associated with the impedance mismatch that commonly occurs in heterogeneous database queries.
A collection of open source applications for mass spectrometry data mining.
Gallardo, Óscar; Ovelleiro, David; Gay, Marina; Carrascal, Montserrat; Abian, Joaquin
2014-10-01
We present several bioinformatics applications for the identification and quantification of phosphoproteome components by MS. These applications include a front-end graphical user interface that combines several Thermo RAW formats to MASCOT™ Generic Format extractors (EasierMgf), two graphical user interfaces for search engines OMSSA and SEQUEST (OmssaGui and SequestGui), and three applications, one for the management of databases in FASTA format (FastaTools), another for the integration of search results from up to three search engines (Integrator), and another one for the visualization of mass spectra and their corresponding database search results (JsonVisor). These applications were developed to solve some of the common problems found in proteomic and phosphoproteomic data analysis and were integrated in the workflow for data processing and feeding on our LymPHOS database. Applications were designed modularly and can be used standalone. These tools are written in Perl and Python programming languages and are supported on Windows platforms. They are all released under an Open Source Software license and can be freely downloaded from our software repository hosted at GoogleCode. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Biron, P; Metzger, M H; Pezet, C; Sebban, C; Barthuet, E; Durand, T
2014-01-01
A full-text search tool was introduced into the daily practice of Léon Bérard Center (France), a health care facility devoted to treatment of cancer. This tool was integrated into the hospital information system by the IT department having been granted full autonomy to improve the system. To describe the development and various uses of a tool for full-text search of computerized patient records. The technology is based on Solr, an open-source search engine. It is a web-based application that processes HTTP requests and returns HTTP responses. A data processing pipeline that retrieves data from different repositories, normalizes, cleans and publishes it to Solr, was integrated in the information system of the Leon Bérard center. The IT department developed also user interfaces to allow users to access the search engine within the computerized medical record of the patient. From January to May 2013, 500 queries were launched per month by an average of 140 different users. Several usages of the tool were described, as follows: medical management of patients, medical research, and improving the traceability of medical care in medical records. The sensitivity of the tool for detecting the medical records of patients diagnosed with both breast cancer and diabetes was 83.0%, and its positive predictive value was 48.7% (gold standard: manual screening by a clinical research assistant). The project demonstrates that the introduction of full-text-search tools allowed practitioners to use unstructured medical information for various purposes.
Technical development of PubMed Interact: an improved interface for MEDLINE/PubMed searches
Muin, Michael; Fontelo, Paul
2006-01-01
Background The project aims to create an alternative search interface for MEDLINE/PubMed that may provide assistance to the novice user and added convenience to the advanced user. An earlier version of the project was the 'Slider Interface for MEDLINE/PubMed searches' (SLIM) which provided JavaScript slider bars to control search parameters. In this new version, recent developments in Web-based technologies were implemented. These changes may prove to be even more valuable in enhancing user interactivity through client-side manipulation and management of results. Results PubMed Interact is a Web-based MEDLINE/PubMed search application built with HTML, JavaScript and PHP. It is implemented on a Windows Server 2003 with Apache 2.0.52, PHP 4.4.1 and MySQL 4.1.18. PHP scripts provide the backend engine that connects with E-Utilities and parses XML files. JavaScript manages client-side functionalities and converts Web pages into interactive platforms using dynamic HTML (DHTML), Document Object Model (DOM) tree manipulation and Ajax methods. With PubMed Interact, users can limit searches with JavaScript slider bars, preview result counts, delete citations from the list, display and add related articles and create relevance lists. Many interactive features occur at client-side, which allow instant feedback without reloading or refreshing the page resulting in a more efficient user experience. Conclusion PubMed Interact is a highly interactive Web-based search application for MEDLINE/PubMed that explores recent trends in Web technologies like DOM tree manipulation and Ajax. It may become a valuable technical development for online medical search applications. PMID:17083729
Jadhav, Ashutosh; Sheth, Amit; Pathak, Jyotishman
2014-01-01
Since the early 2000’s, Internet usage for health information searching has increased significantly. Studying search queries can help us to understand users “information need” and how do they formulate search queries (“expression of information need”). Although cardiovascular diseases (CVD) affect a large percentage of the population, few studies have investigated how and what users search for CVD. We address this knowledge gap in the community by analyzing a large corpus of 10 million CVD related search queries from MayoClinic.com. Using UMLS MetaMap and UMLS semantic types/concepts, we developed a rule-based approach to categorize the queries into 14 health categories. We analyzed structural properties, types (keyword-based/Wh-questions/Yes-No questions) and linguistic structure of the queries. Our results show that the most searched health categories are ‘Diseases/Conditions’, ‘Vital-Sings’, ‘Symptoms’ and ‘Living-with’. CVD queries are longer and are predominantly keyword-based. This study extends our knowledge about online health information searching and provides useful insights for Web search engines and health websites. PMID:25954380
Google Search Queries About Neurosurgical Topics: Are They a Suitable Guide for Neurosurgeons?
Lawson McLean, Anna C; Lawson McLean, Aaron; Kalff, Rolf; Walter, Jan
2016-06-01
Google is the most popular search engine, with about 100 billion searches per month. Google Trends is an integrated tool that allows users to obtain Google's search popularity statistics from the last decade. Our aim was to evaluate whether Google Trends is a useful tool to assess the public's interest in specific neurosurgical topics. We evaluated Google Trends statistics for the neurosurgical search topic areas "hydrocephalus," "spinal stenosis," "concussion," "vestibular schwannoma," and "cerebral arteriovenous malformation." We compared these with bibliometric data from PubMed and epidemiologic data from the German Federal Monitoring Agency. In addition, we assessed Google users' search behavior for the search terms "glioblastoma" and "meningioma." Over the last 10 years, there has been an increasing interest in the topic "concussion" from Internet users in general and scientists. "Spinal stenosis," "concussion," and "vestibular schwannoma" are topics that are of special interest in high-income countries (eg, Germany), whereas "hydrocephalus" is a popular topic in low- and middle-income countries. The Google-defined top searches within these topic areas revealed more detail about people's interests (eg, "normal pressure hydrocephalus" or "football concussion" ranked among the most popular search queries within the corresponding topics). There was a similar volume of queries for "glioblastoma" and "meningioma." Google Trends is a useful source to elicit information about general trends in peoples' health interests and the role of different diseases across the world. The Internet presence of neurosurgical units and surgeons can be guided by online users' interests to achieve high-quality, professional-endorsed patient education. Copyright © 2016 Elsevier Inc. All rights reserved.
FGMReview: design of a knowledge management tool on female genital mutilation.
Martínez Pérez, Guillermo; Turetsky, Risa
2015-11-01
Web-based literature search engines may not be user-friendly for some readers searching for information on female genital mutilation. This is a traditional practice that has no health benefits, and about 140 million girls and women worldwide have undergone it. In 2012, the website FGMReview was created with the aim to offer a user-friendly, accessible, scalable, and innovative knowledge management tool specialized in female genital mutilation. The design of this website was guided by a conceptual model based on the use of benchmarking techniques and requirements engineering, an area of knowledge from the computer informatics field, influenced by the Transcultural Nursing model. The purpose of this article is to describe this conceptual model. Nurses and other health care providers can use this conceptual model to guide their methodological approach to design and launch other eHealth projects. © The Author(s) 2014.
Clinician search behaviors may be influenced by search engine design.
Lau, Annie Y S; Coiera, Enrico; Zrimec, Tatjana; Compton, Paul
2010-06-30
Searching the Web for documents using information retrieval systems plays an important part in clinicians' practice of evidence-based medicine. While much research focuses on the design of methods to retrieve documents, there has been little examination of the way different search engine capabilities influence clinician search behaviors. Previous studies have shown that use of task-based search engines allows for faster searches with no loss of decision accuracy compared with resource-based engines. We hypothesized that changes in search behaviors may explain these differences. In all, 75 clinicians (44 doctors and 31 clinical nurse consultants) were randomized to use either a resource-based or a task-based version of a clinical information retrieval system to answer questions about 8 clinical scenarios in a controlled setting in a university computer laboratory. Clinicians using the resource-based system could select 1 of 6 resources, such as PubMed; clinicians using the task-based system could select 1 of 6 clinical tasks, such as diagnosis. Clinicians in both systems could reformulate search queries. System logs unobtrusively capturing clinicians' interactions with the systems were coded and analyzed for clinicians' search actions and query reformulation strategies. The most frequent search action of clinicians using the resource-based system was to explore a new resource with the same query, that is, these clinicians exhibited a "breadth-first" search behaviour. Of 1398 search actions, clinicians using the resource-based system conducted 401 (28.7%, 95% confidence interval [CI] 26.37-31.11) in this way. In contrast, the majority of clinicians using the task-based system exhibited a "depth-first" search behavior in which they reformulated query keywords while keeping to the same task profiles. Of 585 search actions conducted by clinicians using the task-based system, 379 (64.8%, 95% CI 60.83-68.55) were conducted in this way. This study provides evidence that different search engine designs are associated with different user search behaviors.
Embedded systems engineering for products and services design.
Ahram, Tareq Z; Karwowski, Waldemar; Soares, Marcelo M
2012-01-01
Systems engineering (SE) professionals strive to develop new techniques to enhance the value of contributions to multidisciplinary smart product design teams. Products and services designers challenge themselves to search beyond the traditional design concept of addressing the physical, social, and cognitive factors. This paper covers the application of embedded user-centered systems engineering design practices into work processes based on the ISO 13407 framework [20] to support smart systems and services design and development. As practitioners collaborate to investigate alternative smart product designs, they concentrate on creating valuable products which will enhance positive interaction. This paper capitalizes on the need to follow a user-centered SE approach to smart products design [4, 22]. Products and systems intelligence should embrace a positive approach to user-centered design while improving our understanding of usable value-adding, experience and extending our knowledge of what inspires others to design enjoyable services and products.
What Can Pictures Tell Us About Web Pages? Improving Document Search Using Images.
Rodriguez-Vaamonde, Sergio; Torresani, Lorenzo; Fitzgibbon, Andrew W
2015-06-01
Traditional Web search engines do not use the images in the HTML pages to find relevant documents for a given query. Instead, they typically operate by computing a measure of agreement between the keywords provided by the user and only the text portion of each page. In this paper we study whether the content of the pictures appearing in a Web page can be used to enrich the semantic description of an HTML document and consequently boost the performance of a keyword-based search engine. We present a Web-scalable system that exploits a pure text-based search engine to find an initial set of candidate documents for a given query. Then, the candidate set is reranked using visual information extracted from the images contained in the pages. The resulting system retains the computational efficiency of traditional text-based search engines with only a small additional storage cost needed to encode the visual information. We test our approach on one of the TREC Million Query Track benchmarks where we show that the exploitation of visual content yields improvement in accuracies for two distinct text-based search engines, including the system with the best reported performance on this benchmark. We further validate our approach by collecting document relevance judgements on our search results using Amazon Mechanical Turk. The results of this experiment confirm the improvement in accuracy produced by our image-based reranker over a pure text-based system.
Wong, Paul Wai-Ching; Fu, King-Wa; Yau, Rickey Sai-Pong; Ma, Helen Hei-Man; Law, Yik-Wa; Chang, Shu-Sen; Yip, Paul Siu-Fai
2013-01-11
The Internet's potential impact on suicide is of major public health interest as easy online access to pro-suicide information or specific suicide methods may increase suicide risk among vulnerable Internet users. Little is known, however, about users' actual searching and browsing behaviors of online suicide-related information. To investigate what webpages people actually clicked on after searching with suicide-related queries on a search engine and to examine what queries people used to get access to pro-suicide websites. A retrospective observational study was done. We used a web search dataset released by America Online (AOL). The dataset was randomly sampled from all AOL subscribers' web queries between March and May 2006 and generated by 657,000 service subscribers. We found 5526 search queries (0.026%, 5526/21,000,000) that included the keyword "suicide". The 5526 search queries included 1586 different search terms and were generated by 1625 unique subscribers (0.25%, 1625/657,000). Of these queries, 61.38% (3392/5526) were followed by users clicking on a search result. Of these 3392 queries, 1344 (39.62%) webpages were clicked on by 930 unique users but only 1314 of those webpages were accessible during the study period. Each clicked-through webpage was classified into 11 categories. The categories of the most visited webpages were: entertainment (30.13%; 396/1314), scientific information (18.31%; 240/1314), and community resources (14.53%; 191/1314). Among the 1314 accessed webpages, we could identify only two pro-suicide websites. We found that the search terms used to access these sites included "commiting suicide with a gas oven", "hairless goat", "pictures of murder by strangulation", and "photo of a severe burn". A limitation of our study is that the database may be dated and confined to mainly English webpages. Searching or browsing suicide-related or pro-suicide webpages was uncommon, although a small group of users did access websites that contain detailed suicide method information.
Saparova, D; Belden, J; Williams, J; Richardson, B; Schuster, K
2014-01-01
Federated medical search engines are health information systems that provide a single access point to different types of information. Their efficiency as clinical decision support tools has been demonstrated through numerous evaluations. Despite their rigor, very few of these studies report holistic evaluations of medical search engines and even fewer base their evaluations on existing evaluation frameworks. To evaluate a federated medical search engine, MedSocket, for its potential net benefits in an established clinical setting. This study applied the Human, Organization, and Technology (HOT-fit) evaluation framework in order to evaluate MedSocket. The hierarchical structure of the HOT-factors allowed for identification of a combination of efficiency metrics. Human fit was evaluated through user satisfaction and patterns of system use; technology fit was evaluated through the measurements of time-on-task and the accuracy of the found answers; and organization fit was evaluated from the perspective of system fit to the existing organizational structure. Evaluations produced mixed results and suggested several opportunities for system improvement. On average, participants were satisfied with MedSocket searches and confident in the accuracy of retrieved answers. However, MedSocket did not meet participants' expectations in terms of download speed, access to information, and relevance of the search results. These mixed results made it necessary to conclude that in the case of MedSocket, technology fit had a significant influence on the human and organization fit. Hence, improving technological capabilities of the system is critical before its net benefits can become noticeable. The HOT-fit evaluation framework was instrumental in tailoring the methodology for conducting a comprehensive evaluation of the search engine. Such multidimensional evaluation of the search engine resulted in recommendations for system improvement.
Evaluating a Federated Medical Search Engine
Belden, J.; Williams, J.; Richardson, B.; Schuster, K.
2014-01-01
Summary Background Federated medical search engines are health information systems that provide a single access point to different types of information. Their efficiency as clinical decision support tools has been demonstrated through numerous evaluations. Despite their rigor, very few of these studies report holistic evaluations of medical search engines and even fewer base their evaluations on existing evaluation frameworks. Objectives To evaluate a federated medical search engine, MedSocket, for its potential net benefits in an established clinical setting. Methods This study applied the Human, Organization, and Technology (HOT-fit) evaluation framework in order to evaluate MedSocket. The hierarchical structure of the HOT-factors allowed for identification of a combination of efficiency metrics. Human fit was evaluated through user satisfaction and patterns of system use; technology fit was evaluated through the measurements of time-on-task and the accuracy of the found answers; and organization fit was evaluated from the perspective of system fit to the existing organizational structure. Results Evaluations produced mixed results and suggested several opportunities for system improvement. On average, participants were satisfied with MedSocket searches and confident in the accuracy of retrieved answers. However, MedSocket did not meet participants’ expectations in terms of download speed, access to information, and relevance of the search results. These mixed results made it necessary to conclude that in the case of MedSocket, technology fit had a significant influence on the human and organization fit. Hence, improving technological capabilities of the system is critical before its net benefits can become noticeable. Conclusions The HOT-fit evaluation framework was instrumental in tailoring the methodology for conducting a comprehensive evaluation of the search engine. Such multidimensional evaluation of the search engine resulted in recommendations for system improvement. PMID:25298813
Pross, Christoph; Averdunk, Lars-Henrik; Stjepanovic, Josip; Busse, Reinhard; Geissler, Alexander
2017-04-21
Quality of care public reporting provides structural, process and outcome information to facilitate hospital choice and strengthen quality competition. Yet, evidence indicates that patients rarely use this information in their decision-making, due to limited awareness of the data and complex and conflicting information. While there is enthusiasm among policy makers for public reporting, clinicians and researchers doubt its overall impact. Almost no study has analyzed how users behave on public reporting portals, which information they seek out and when they abort their search. This study employs web-usage mining techniques on server log data of 17 million user actions from Germany's premier provider transparency portal Weisse-Liste.de (WL.de) between 2012 and 2015. Postal code and ICD search requests facilitate identification of geographical and treatment area usage patterns. User clustering helps to identify user types based on parameters like session length, referrer and page topic visited. First-level markov chains illustrate common click paths and premature exits. In 2015, the WL.de Hospital Search portal had 2,750 daily users, with 25% mobile traffic, a bounce rate of 38% and 48% of users examining hospital quality information. From 2013 to 2015, user traffic grew at 38% annually. On average users spent 7 min on the portal, with 7.4 clicks and 54 s between clicks. Users request information for many oncologic and orthopedic conditions, for which no process or outcome quality indicators are available. Ten distinct user types, with particular usage patterns and interests, are identified. In particular, the different types of professional and non-professional users need to be addressed differently to avoid high premature exit rates at several key steps in the information search and view process. Of all users, 37% enter hospital information correctly upon entry, while 47% require support in their hospital search. Several onsite and offsite improvement options are identified. Public reporting needs to be directed at the interests of its users, with more outcome quality information for oncology and orthopedics. Customized reporting can cater to the different needs and skill levels of professional and non-professional users. Search engine optimization and hospital quality advocacy can increase website traffic.
PubMed searches: overview and strategies for clinicians.
Lindsey, Wesley T; Olin, Bernie R
2013-04-01
PubMed is a biomedical and life sciences database maintained by a division of the National Library of Medicine known as the National Center for Biotechnology Information (NCBI). It is a large resource with more than 5600 journals indexed and greater than 22 million total citations. Searches conducted in PubMed provide references that are more specific for the intended topic compared with other popular search engines. Effective PubMed searches allow the clinician to remain current on the latest clinical trials, systematic reviews, and practice guidelines. PubMed continues to evolve by allowing users to create a customized experience through the My NCBI portal, new arrangements and options in search filters, and supporting scholarly projects through exportation of citations to reference managing software. Prepackaged search options available in the Clinical Queries feature also allow users to efficiently search for clinical literature. PubMed also provides information regarding the source journals themselves through the Journals in NCBI Databases link. This article provides an overview of the PubMed database's structure and features as well as strategies for conducting an effective search.
Query Log Analysis of an Electronic Health Record Search Engine
Yang, Lei; Mei, Qiaozhu; Zheng, Kai; Hanauer, David A.
2011-01-01
We analyzed a longitudinal collection of query logs of a full-text search engine designed to facilitate information retrieval in electronic health records (EHR). The collection, 202,905 queries and 35,928 user sessions recorded over a course of 4 years, represents the information-seeking behavior of 533 medical professionals, including frontline practitioners, coding personnel, patient safety officers, and biomedical researchers for patient data stored in EHR systems. In this paper, we present descriptive statistics of the queries, a categorization of information needs manifested through the queries, as well as temporal patterns of the users’ information-seeking behavior. The results suggest that information needs in medical domain are substantially more sophisticated than those that general-purpose web search engines need to accommodate. Therefore, we envision there exists a significant challenge, along with significant opportunities, to provide intelligent query recommendations to facilitate information retrieval in EHR. PMID:22195150
Problems of information support in scientific research
NASA Astrophysics Data System (ADS)
Shamaev, V. G.; Gorshkov, A. B.
2015-11-01
This paper reports on the creation of the open access Akustika portal (AKDATA.RU) designed to provide Russian-language easy-to-read and search information on acoustics and related topics. The absence of a Russian-language publication in foreign databases means that it is effectively lost for much of the scientific community. The portal has three interrelated sections: the Akustika information search system (ISS) (Acoustics), full-text archive of the Akusticheskii Zhurnal (Acoustic Journal), and 'Signal'naya informatsiya' ('Signaling information') on acoustics. The paper presents a description of the Akustika ISS, including its structure, content, interface, and information search capabilities for basic and applied research in diverse areas of science, engineering, biology, medicine, etc. The intended users of the portal are physicists, engineers, and engineering technologists interested in expanding their research activities and seeking to increase their knowledge base. Those studying current trends in the Russian-language contribution to international science may also find the portal useful.
Speeding up the screening of steroids in urine: development of a user-friendly library.
Galesio, M; López-Fdez, H; Reboiro-Jato, M; Gómez-Meire, Silvana; Glez-Peña, D; Fdez-Riverola, F; Lodeiro, Carlos; Diniz, M E; Capelo, J L
2013-12-11
This work presents a novel database search engine - MLibrary - designed to assist the user in the detection and identification of androgenic anabolic steroids (AAS) and its metabolites by matrix assisted laser desorption/ionization (MALDI) and mass spectrometry-based strategies. The detection of the AAS in the samples was accomplished by searching (i) the mass spectrometric (MS) spectra against the library developed to identify possible positives and (ii) by comparison of the tandem mass spectrometric (MS/MS) spectra produced after fragmentation of the possible positives with a complete set of spectra that have previously been assigned to the software. The urinary screening for anabolic agents plays a major role in anti-doping laboratories as they represent the most abused drug class in sports. With the help of the MLibrary software application, the use of MALDI techniques for doping control is simplified and the time for evaluation and interpretation of the results is reduced. To do so, the search engine takes as input several MALDI-TOF-MS and MALDI-TOF-MS/MS spectra. It aids the researcher in an automatic mode by identifying possible positives in a single MS analysis and then confirming their presence in tandem MS analysis by comparing the experimental tandem mass spectrometric data with the database. Furthermore, the search engine can, potentially, be further expanded to other compounds in addition to AASs. The applicability of the MLibrary tool is shown through the analysis of spiked urine samples. Copyright © 2013 Elsevier Inc. All rights reserved.
A Semantic Approach for Knowledge Discovery to Help Mitigate Habitat Loss in the Gulf of Mexico
NASA Astrophysics Data System (ADS)
Ramachandran, R.; Maskey, M.; Graves, S.; Hardin, D.
2008-12-01
Noesis is a meta-search engine and a resource aggregator that uses domain ontologies to provide scoped search capabilities. Ontologies enable Noesis to help users refine their searches for information on the open web and in hidden web locations such as data catalogues with standardized, but discipline specific vocabularies. Through its ontologies Noesis provides a guided refinement of search queries which produces complete and accurate searches while reducing the user's burden to experiment with different search strings. All search results are organized by categories (e. g. all results from Google are grouped together) which may be selected or omitted according to the desire of the user. During the past two years ontologies were developed for sea grasses in the Gulf of Mexico and were used to support a habitat restoration demonstration project. Currently these ontologies are being augmented to address the special characteristics of mangroves. These new ontologies will extend the demonstration project to broader regions of the Gulf including protected mangrove locations in coastal Mexico. Noesis contributes to the decision making process by producing a comprehensive list of relevant resources based on the semantic information contained in the ontologies. Ontologies are organized in a tree like taxonomies, where the child nodes represent the Specializations and the parent nodes represent the Generalizations of a node or concept. Specializations can be used to provide more detailed search, while generalizations are used to make the search broader. Ontologies are also used to link two syntactically different terms to one semantic concept (synonyms). Appending a synonym to the query expands the search, thus providing better search coverage. Every concept has a set of properties that are neither in the same inheritance hierarchy (Specializations / Generalizations) nor equivalent (synonyms). These are called Related Concepts and they are captured in the ontology through property relationships. By using Related Concepts users can search for resources with respect to a particular property. Noesis automatically generates searches that include all of these capabilities, removing the burden from the user and producing broader and more accurate search results. This presentation will demonstrate the features of Noesis and describe its application to habitat studies in the Gulf of Mexico.
Multi-source and ontology-based retrieval engine for maize mutant phenotypes
Green, Jason M.; Harnsomburana, Jaturon; Schaeffer, Mary L.; Lawrence, Carolyn J.; Shyu, Chi-Ren
2011-01-01
Model Organism Databases, including the various plant genome databases, collect and enable access to massive amounts of heterogeneous information, including sequence data, gene product information, images of mutant phenotypes, etc, as well as textual descriptions of many of these entities. While a variety of basic browsing and search capabilities are available to allow researchers to query and peruse the names and attributes of phenotypic data, next-generation search mechanisms that allow querying and ranking of text descriptions are much less common. In addition, the plant community needs an innovative way to leverage the existing links in these databases to search groups of text descriptions simultaneously. Furthermore, though much time and effort have been afforded to the development of plant-related ontologies, the knowledge embedded in these ontologies remains largely unused in available plant search mechanisms. Addressing these issues, we have developed a unique search engine for mutant phenotypes from MaizeGDB. This advanced search mechanism integrates various text description sources in MaizeGDB to aid a user in retrieving desired mutant phenotype information. Currently, descriptions of mutant phenotypes, loci and gene products are utilized collectively for each search, though expansion of the search mechanism to include other sources is straightforward. The retrieval engine, to our knowledge, is the first engine to exploit the content and structure of available domain ontologies, currently the Plant and Gene Ontologies, to expand and enrich retrieval results in major plant genomic databases. Database URL: http:www.PhenomicsWorld.org/QBTA.php PMID:21558151
The search engine manipulation effect (SEME) and its possible impact on the outcomes of elections
Epstein, Robert; Robertson, Ronald E.
2015-01-01
Internet search rankings have a significant impact on consumer choices, mainly because users trust and choose higher-ranked results more than lower-ranked results. Given the apparent power of search rankings, we asked whether they could be manipulated to alter the preferences of undecided voters in democratic elections. Here we report the results of five relevant double-blind, randomized controlled experiments, using a total of 4,556 undecided voters representing diverse demographic characteristics of the voting populations of the United States and India. The fifth experiment is especially notable in that it was conducted with eligible voters throughout India in the midst of India’s 2014 Lok Sabha elections just before the final votes were cast. The results of these experiments demonstrate that (i) biased search rankings can shift the voting preferences of undecided voters by 20% or more, (ii) the shift can be much higher in some demographic groups, and (iii) search ranking bias can be masked so that people show no awareness of the manipulation. We call this type of influence, which might be applicable to a variety of attitudes and beliefs, the search engine manipulation effect. Given that many elections are won by small margins, our results suggest that a search engine company has the power to influence the results of a substantial number of elections with impunity. The impact of such manipulations would be especially large in countries dominated by a single search engine company. PMID:26243876
The search engine manipulation effect (SEME) and its possible impact on the outcomes of elections.
Epstein, Robert; Robertson, Ronald E
2015-08-18
Internet search rankings have a significant impact on consumer choices, mainly because users trust and choose higher-ranked results more than lower-ranked results. Given the apparent power of search rankings, we asked whether they could be manipulated to alter the preferences of undecided voters in democratic elections. Here we report the results of five relevant double-blind, randomized controlled experiments, using a total of 4,556 undecided voters representing diverse demographic characteristics of the voting populations of the United States and India. The fifth experiment is especially notable in that it was conducted with eligible voters throughout India in the midst of India's 2014 Lok Sabha elections just before the final votes were cast. The results of these experiments demonstrate that (i) biased search rankings can shift the voting preferences of undecided voters by 20% or more, (ii) the shift can be much higher in some demographic groups, and (iii) search ranking bias can be masked so that people show no awareness of the manipulation. We call this type of influence, which might be applicable to a variety of attitudes and beliefs, the search engine manipulation effect. Given that many elections are won by small margins, our results suggest that a search engine company has the power to influence the results of a substantial number of elections with impunity. The impact of such manipulations would be especially large in countries dominated by a single search engine company.
Strategies for Information Retrieval and Virtual Teaming to Mitigate Risk on NASA's Missions
NASA Technical Reports Server (NTRS)
Topousis, Daria; Williams, Gregory; Murphy, Keri
2007-01-01
Following the loss of NASA's Space Shuttle Columbia in 2003, it was determined that problems in the agency's organization created an environment that led to the accident. One component of the proposed solution resulted in the formation of the NASA Engineering Network (NEN), a suite of information retrieval and knowledge sharing tools. This paper describes the implementation of this set of search, portal, content management, and semantic technologies, including a unique meta search capability for data from distributed engineering resources. NEN's communities of practice are formed along engineering disciplines where users leverage their knowledge and best practices to collaborate and take informal learning back to their personal jobs and embed it into the procedures of the agency. These results offer insight into using traditional engineering disciplines for virtual teaming and problem solving.
Alor-Hernández, Giner; Pérez-Gallardo, Yuliana; Posada-Gómez, Rubén; Cortes-Robles, Guillermo; Rodríguez-González, Alejandro; Aguilar-Laserre, Alberto A
2012-09-01
Nowadays, traditional search engines such as Google, Yahoo and Bing facilitate the retrieval of information in the format of images, but the results are not always useful for the users. This is mainly due to two problems: (1) the semantic keywords are not taken into consideration and (2) it is not always possible to establish a query using the image features. This issue has been covered in different domains in order to develop content-based image retrieval (CBIR) systems. The expert community has focussed their attention on the healthcare domain, where a lot of visual information for medical analysis is available. This paper provides a solution called iPixel Visual Search Engine, which involves semantics and content issues in order to search for digitized mammograms. iPixel offers the possibility of retrieving mammogram features using collective intelligence and implementing a CBIR algorithm. Our proposal compares not only features with similar semantic meaning, but also visual features. In this sense, the comparisons are made in different ways: by the number of regions per image, by maximum and minimum size of regions per image and by average intensity level of each region. iPixel Visual Search Engine supports the medical community in differential diagnoses related to the diseases of the breast. The iPixel Visual Search Engine has been validated by experts in the healthcare domain, such as radiologists, in addition to experts in digital image analysis.
Semantically optiMize the dAta seRvice operaTion (SMART) system for better data discovery and access
NASA Astrophysics Data System (ADS)
Yang, C.; Huang, T.; Armstrong, E. M.; Moroni, D. F.; Liu, K.; Gui, Z.
2013-12-01
Abstract: We present a Semantically optiMize the dAta seRvice operaTion (SMART) system for better data discovery and access across the NASA data systems, Global Earth Observation System of Systems (GEOSS) Clearinghouse and Data.gov to facilitate scientists to select Earth observation data that fit better their needs in four aspects: 1. Integrating and interfacing the SMART system to include the functionality of a) semantic reasoning based on Jena, an open source semantic reasoning engine, b) semantic similarity calculation, c) recommendation based on spatiotemporal, semantic, and user workflow patterns, and d) ranking results based on similarity between search terms and data ontology. 2. Collaborating with data user communities to a) capture science data ontology and record relevant ontology triple stores, b) analyze and mine user search and download patterns, c) integrate SMART into metadata-centric discovery system for community-wide usage and feedback, and d) customizing data discovery, search and access user interface to include the ranked results, recommendation components, and semantic based navigations. 3. Laying the groundwork to interface the SMART system with other data search and discovery systems as an open source data search and discovery solution. The SMART systems leverages NASA, GEO, FGDC data discovery, search and access for the Earth science community by enabling scientists to readily discover and access data appropriate to their endeavors, increasing the efficiency of data exploration and decreasing the time that scientists must spend on searching, downloading, and processing the datasets most applicable to their research. By incorporating the SMART system, it is a likely aim that the time being devoted to discovering the most applicable dataset will be substantially reduced, thereby reducing the number of user inquiries and likewise reducing the time and resources expended by a data center in addressing user inquiries. Keywords: EarthCube; ECHO, DAACs, GeoPlatform; Geospatial Cyberinfrastructure References: 1. Yang, P., Evans, J., Cole, M., Alameh, N., Marley, S., & Bambacus, M., (2007). The Emerging Concepts and Applications of the Spatial Web Portal. Photogrammetry Engineering &Remote Sensing,73(6):691-698. 2. Zhang, C, Zhao, T. and W. Li. (2010). The Framework of a Geospatial Semantic Web based Spatial Decision Support System for Digital Earth. International Journal of Digital Earth. 3(2):111-134. 3. Yang C., Raskin R., Goodchild M.F., Gahegan M., 2010, Geospatial Cyberinfrastructure: Past, Present and Future,Computers, Environment, and Urban Systems, 34(4):264-277. 4. Liu K., Yang C., Li W., Gui Z., Xu C., Xia J., 2013. Using ontology and similarity calculations to rank Earth science data searching results, International Journal of Geospatial Information Applications. (in press)
G-Bean: an ontology-graph based web tool for biomedical literature retrieval
2014-01-01
Background Currently, most people use NCBI's PubMed to search the MEDLINE database, an important bibliographical information source for life science and biomedical information. However, PubMed has some drawbacks that make it difficult to find relevant publications pertaining to users' individual intentions, especially for non-expert users. To ameliorate the disadvantages of PubMed, we developed G-Bean, a graph based biomedical search engine, to search biomedical articles in MEDLINE database more efficiently. Methods G-Bean addresses PubMed's limitations with three innovations: (1) Parallel document index creation: a multithreaded index creation strategy is employed to generate the document index for G-Bean in parallel; (2) Ontology-graph based query expansion: an ontology graph is constructed by merging four major UMLS (Version 2013AA) vocabularies, MeSH, SNOMEDCT, CSP and AOD, to cover all concepts in National Library of Medicine (NLM) database; a Personalized PageRank algorithm is used to compute concept relevance in this ontology graph and the Term Frequency - Inverse Document Frequency (TF-IDF) weighting scheme is used to re-rank the concepts. The top 500 ranked concepts are selected for expanding the initial query to retrieve more accurate and relevant information; (3) Retrieval and re-ranking of documents based on user's search intention: after the user selects any article from the existing search results, G-Bean analyzes user's selections to determine his/her true search intention and then uses more relevant and more specific terms to retrieve additional related articles. The new articles are presented to the user in the order of their relevance to the already selected articles. Results Performance evaluation with 106 OHSUMED benchmark queries shows that G-Bean returns more relevant results than PubMed does when using these queries to search the MEDLINE database. PubMed could not even return any search result for some OHSUMED queries because it failed to form the appropriate Boolean query statement automatically from the natural language query strings. G-Bean is available at http://bioinformatics.clemson.edu/G-Bean/index.php. Conclusions G-Bean addresses PubMed's limitations with ontology-graph based query expansion, automatic document indexing, and user search intention discovery. It shows significant advantages in finding relevant articles from the MEDLINE database to meet the information need of the user. PMID:25474588
G-Bean: an ontology-graph based web tool for biomedical literature retrieval.
Wang, James Z; Zhang, Yuanyuan; Dong, Liang; Li, Lin; Srimani, Pradip K; Yu, Philip S
2014-01-01
Currently, most people use NCBI's PubMed to search the MEDLINE database, an important bibliographical information source for life science and biomedical information. However, PubMed has some drawbacks that make it difficult to find relevant publications pertaining to users' individual intentions, especially for non-expert users. To ameliorate the disadvantages of PubMed, we developed G-Bean, a graph based biomedical search engine, to search biomedical articles in MEDLINE database more efficiently. G-Bean addresses PubMed's limitations with three innovations: (1) Parallel document index creation: a multithreaded index creation strategy is employed to generate the document index for G-Bean in parallel; (2) Ontology-graph based query expansion: an ontology graph is constructed by merging four major UMLS (Version 2013AA) vocabularies, MeSH, SNOMEDCT, CSP and AOD, to cover all concepts in National Library of Medicine (NLM) database; a Personalized PageRank algorithm is used to compute concept relevance in this ontology graph and the Term Frequency - Inverse Document Frequency (TF-IDF) weighting scheme is used to re-rank the concepts. The top 500 ranked concepts are selected for expanding the initial query to retrieve more accurate and relevant information; (3) Retrieval and re-ranking of documents based on user's search intention: after the user selects any article from the existing search results, G-Bean analyzes user's selections to determine his/her true search intention and then uses more relevant and more specific terms to retrieve additional related articles. The new articles are presented to the user in the order of their relevance to the already selected articles. Performance evaluation with 106 OHSUMED benchmark queries shows that G-Bean returns more relevant results than PubMed does when using these queries to search the MEDLINE database. PubMed could not even return any search result for some OHSUMED queries because it failed to form the appropriate Boolean query statement automatically from the natural language query strings. G-Bean is available at http://bioinformatics.clemson.edu/G-Bean/index.php. G-Bean addresses PubMed's limitations with ontology-graph based query expansion, automatic document indexing, and user search intention discovery. It shows significant advantages in finding relevant articles from the MEDLINE database to meet the information need of the user.
GenderMedDB: an interactive database of sex and gender-specific medical literature.
Oertelt-Prigione, Sabine; Gohlke, Björn-Oliver; Dunkel, Mathias; Preissner, Robert; Regitz-Zagrosek, Vera
2014-01-01
Searches for sex and gender-specific publications are complicated by the absence of a specific algorithm within search engines and by the lack of adequate archives to collect the retrieved results. We previously addressed this issue by initiating the first systematic archive of medical literature containing sex and/or gender-specific analyses. This initial collection has now been greatly enlarged and re-organized as a free user-friendly database with multiple functions: GenderMedDB (http://gendermeddb.charite.de). GenderMedDB retrieves the included publications from the PubMed database. Manuscripts containing sex and/or gender-specific analysis are continuously screened and the relevant findings organized systematically into disciplines and diseases. Publications are furthermore classified by research type, subject and participant numbers. More than 11,000 abstracts are currently included in the database, after screening more than 40,000 publications. The main functions of the database include searches by publication data or content analysis based on pre-defined classifications. In addition, registrants are enabled to upload relevant publications, access descriptive publication statistics and interact in an open user forum. Overall, GenderMedDB offers the advantages of a discipline-specific search engine as well as the functions of a participative tool for the gender medicine community.
Sander, U; Emmert, M; Grobe, T G
2013-06-01
The Internet provides ways for patients to obtain information about doctors. The study poses the question whether it is possible and how long it takes to find a suitable doctor with an Internet search. It focuses on the effectiveness and efficiency of the search. Specialised physician rating and searching portals and Google are analysed when used to solve specific tasks. The behaviour of volunteers when searching a suitable ophthalmologist, dermatologist or dentist was observed in a usability lab. Additionally, interviews were carried out by means of structured questionnaires to measure the satisfaction of the users with the search and their results. Three physician rating and searching portals that are frequently used in Germany (Jameda.de, DocInsider.de and Arztauskunft.de) were analysed as well as Google. When using Arztauskunft and Google most users found an appropriate physician. When using Docinsider or Jameda they found fewer doctors. Additionally, the time needed to locate a suitable doctor when using Docinsider and Jameda was higher compared to the time needed when using the Arztauskunft and Google. The satisfaction of users who used Google was significantly higher in comparison to those who used the specialised physician rating and searching portals. It emerged from this study that there is no added value when using specialised physician rating and searching portals compared to using the search engine Google when trying to find a doctor having a particular specialty. The usage of several searching portals is recommended to identify as many suitable doctors as possible. © Georg Thieme Verlag KG Stuttgart · New York.
Semantic Features for Classifying Referring Search Terms
DOE Office of Scientific and Technical Information (OSTI.GOV)
May, Chandler J.; Henry, Michael J.; McGrath, Liam R.
2012-05-11
When an internet user clicks on a result in a search engine, a request is submitted to the destination web server that includes a referrer field containing the search terms given by the user. Using this information, website owners can analyze the search terms leading to their websites to better understand their visitors needs. This work explores some of the features that can be used for classification-based analysis of such referring search terms. We present initial results for the example task of classifying HTTP requests countries of origin. A system that can accurately predict the country of origin from querymore » text may be a valuable complement to IP lookup methods which are susceptible to the obfuscation of dereferrers or proxies. We suggest that the addition of semantic features improves classifier performance in this example application. We begin by looking at related work and presenting our approach. After describing initial experiments and results, we discuss paths forward for this work.« less
Automatic Figure Ranking and User Interfacing for Intelligent Figure Search
Yu, Hong; Liu, Feifan; Ramesh, Balaji Polepalli
2010-01-01
Background Figures are important experimental results that are typically reported in full-text bioscience articles. Bioscience researchers need to access figures to validate research facts and to formulate or to test novel research hypotheses. On the other hand, the sheer volume of bioscience literature has made it difficult to access figures. Therefore, we are developing an intelligent figure search engine (http://figuresearch.askhermes.org). Existing research in figure search treats each figure equally, but we introduce a novel concept of “figure ranking”: figures appearing in a full-text biomedical article can be ranked by their contribution to the knowledge discovery. Methodology/Findings We empirically validated the hypothesis of figure ranking with over 100 bioscience researchers, and then developed unsupervised natural language processing (NLP) approaches to automatically rank figures. Evaluating on a collection of 202 full-text articles in which authors have ranked the figures based on importance, our best system achieved a weighted error rate of 0.2, which is significantly better than several other baseline systems we explored. We further explored a user interfacing application in which we built novel user interfaces (UIs) incorporating figure ranking, allowing bioscience researchers to efficiently access important figures. Our evaluation results show that 92% of the bioscience researchers prefer as the top two choices the user interfaces in which the most important figures are enlarged. With our automatic figure ranking NLP system, bioscience researchers preferred the UIs in which the most important figures were predicted by our NLP system than the UIs in which the most important figures were randomly assigned. In addition, our results show that there was no statistical difference in bioscience researchers' preference in the UIs generated by automatic figure ranking and UIs by human ranking annotation. Conclusion/Significance The evaluation results conclude that automatic figure ranking and user interfacing as we reported in this study can be fully implemented in online publishing. The novel user interface integrated with the automatic figure ranking system provides a more efficient and robust way to access scientific information in the biomedical domain, which will further enhance our existing figure search engine to better facilitate accessing figures of interest for bioscientists. PMID:20949102
GeneView: a comprehensive semantic search engine for PubMed.
Thomas, Philippe; Starlinger, Johannes; Vowinkel, Alexander; Arzt, Sebastian; Leser, Ulf
2012-07-01
Research results are primarily published in scientific literature and curation efforts cannot keep up with the rapid growth of published literature. The plethora of knowledge remains hidden in large text repositories like MEDLINE. Consequently, life scientists have to spend a great amount of time searching for specific information. The enormous ambiguity among most names of biomedical objects such as genes, chemicals and diseases often produces too large and unspecific search results. We present GeneView, a semantic search engine for biomedical knowledge. GeneView is built upon a comprehensively annotated version of PubMed abstracts and openly available PubMed Central full texts. This semi-structured representation of biomedical texts enables a number of features extending classical search engines. For instance, users may search for entities using unique database identifiers or they may rank documents by the number of specific mentions they contain. Annotation is performed by a multitude of state-of-the-art text-mining tools for recognizing mentions from 10 entity classes and for identifying protein-protein interactions. GeneView currently contains annotations for >194 million entities from 10 classes for ∼21 million citations with 271,000 full text bodies. GeneView can be searched at http://bc3.informatik.hu-berlin.de/.
From Internet User to Cyberspace Citizen.
ERIC Educational Resources Information Center
Wakabayashi, Ippei
1997-01-01
Discusses social and cultural challenges that Internet technology raises. Highlights include preserving the freedom in cyberspace, the information distribution scheme of the Internet, two-way interactivity, search engines as marketing tools, the insecurity of cyberspace, online safety rules for children, educating children to "walk…
Multimedia proceedings of the 10th Office Information Technology Conference
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hudson, B.
1993-09-10
The CD contains the handouts for all the speakers, demo software from Apple, Adobe, Microsoft, and Zylabs, and video movies of the keynote speakers. Adobe Acrobat is used to provide full-fidelity retrieval of the speakers` slides and Apple`s Quicktime for Macintosh and Windows is used for video playback. ZyIndex is included for Windows users to provide a full-text search engine for selected documents. There are separately labelled installation and operating instructions for Macintosh and Windows users and some general materials common to both sets of users.
User needs analysis and usability assessment of DataMed - a biomedical data discovery index.
Dixit, Ram; Rogith, Deevakar; Narayana, Vidya; Salimi, Mandana; Gururaj, Anupama; Ohno-Machado, Lucila; Xu, Hua; Johnson, Todd R
2017-11-30
To present user needs and usability evaluations of DataMed, a Data Discovery Index (DDI) that allows searching for biomedical data from multiple sources. We conducted 2 phases of user studies. Phase 1 was a user needs analysis conducted before the development of DataMed, consisting of interviews with researchers. Phase 2 involved iterative usability evaluations of DataMed prototypes. We analyzed data qualitatively to document researchers' information and user interface needs. Biomedical researchers' information needs in data discovery are complex, multidimensional, and shaped by their context, domain knowledge, and technical experience. User needs analyses validate the need for a DDI, while usability evaluations of DataMed show that even though aggregating metadata into a common search engine and applying traditional information retrieval tools are promising first steps, there remain challenges for DataMed due to incomplete metadata and the complexity of data discovery. Biomedical data poses distinct problems for search when compared to websites or publications. Making data available is not enough to facilitate biomedical data discovery: new retrieval techniques and user interfaces are necessary for dataset exploration. Consistent, complete, and high-quality metadata are vital to enable this process. While available data and researchers' information needs are complex and heterogeneous, a successful DDI must meet those needs and fit into the processes of biomedical researchers. Research directions include formalizing researchers' information needs, standardizing overviews of data to facilitate relevance judgments, implementing user interfaces for concept-based searching, and developing evaluation methods for open-ended discovery systems such as DDIs. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com
Web-based Electronic Sharing and RE-allocation of Assets
DOE Office of Scientific and Technical Information (OSTI.GOV)
Leverett, Dave; Miller, Robert A.; Berlin, Gary J.
2002-09-09
The Electronic Asses Sharing Program is a web-based application that provides the capability for complex-wide sharing and reallocation of assets that are excess, under utilized, or un-utilized. through a web-based fron-end and supporting has database with a search engine, users can search for assets that they need, search for assets needed by others, enter assets they need, and enter assets they have available for reallocation. In addition, entire listings of available assets and needed assets can be viewed. The application is written in Java, the hash database and search engine are in Object-oriented Java Database Management (OJDBM). The application willmore » be hosted on an SRS-managed server outside the Firewall and access will be controlled via a protected realm. An example of the application can be viewed at the followinig (temporary) URL: http://idgdev.srs.gov/servlet/srs.weshare.WeShare« less
Haim, Mario; Arendt, Florian; Scherr, Sebastian
2017-02-01
Despite evidence that suicide rates can increase after suicides are widely reported in the media, appropriate depictions of suicide in the media can help people to overcome suicidal crises and can thus elicit preventive effects. We argue on the level of individual media users that a similar ambivalence can be postulated for search results on online suicide-related search queries. Importantly, the filter bubble hypothesis (Pariser, 2011) states that search results are biased by algorithms based on a person's previous search behavior. In this study, we investigated whether suicide-related search queries, including either potentially suicide-preventive or -facilitative terms, influence subsequent search results. This might thus protect or harm suicidal Internet users. We utilized a 3 (search history: suicide-related harmful, suicide-related helpful, and suicide-unrelated) × 2 (reactive: clicking the top-most result link and no clicking) experimental design applying agent-based testing. While findings show no influences either of search histories or of reactivity on search results in a subsequent situation, the presentation of a helpline offer raises concerns about possible detrimental algorithmic decision-making: Algorithms "decided" whether or not to present a helpline, and this automated decision, then, followed the agent throughout the rest of the observation period. Implications for policy-making and search providers are discussed.
2004-09-01
No matter how good a site's navigational tools, site visitors will not use them if the menu categories are ambiguous. Users have to know what to expect when they click on a particular menu item. If the categories are not intuitive, users will have to resort to the site's search engine, ignoring the entire structure. The Pennsylvania Medical Society site (http://www.pamedsoc.org) had been plagued with poor menu labels until it took a step back and improved them.
The National Solar Observatory Digital Library - a resource for space weather studies
NASA Astrophysics Data System (ADS)
Hill, F.; Erdwurm, W.; Branston, D.; McGraw, R.
2000-09-01
We describe the National Solar Observatory Digital Library (NSODL), consisting of 200GB of on-line archived solar data, a RDBMS search engine, and an Internet HTML-form user interface. The NSODL is open to all users and provides simple access to solar physics data of basic importance for space weather research and forecasting, heliospheric research, and education. The NSODL can be accessed at the URL www.nso.noao.edu/diglib.
Patient-Centered Tools for Medication Information Search
Wilcox, Lauren; Feiner, Steven; Elhadad, Noémie; Vawdrey, David; Tran, Tran H.
2016-01-01
Recent research focused on online health information seeking highlights a heavy reliance on general-purpose search engines. However, current general-purpose search interfaces do not necessarily provide adequate support for non-experts in identifying suitable sources of health information. Popular search engines have recently introduced search tools in their user interfaces for a range of topics. In this work, we explore how such tools can support non-expert, patient-centered health information search. Scoping the current work to medication-related search, we report on findings from a formative study focused on the design of patient-centered, medication-information search tools. Our study included qualitative interviews with patients, family members, and domain experts, as well as observations of their use of Remedy, a technology probe embodying a set of search tools. Post-operative cardiothoracic surgery patients and their visiting family members used the tools to find information about their hospital medications and were interviewed before and after their use. Domain experts conducted similar search tasks and provided qualitative feedback on their preferences and recommendations for designing these tools. Findings from our study suggest the importance of four valuation principles underlying our tools: credibility, readability, consumer perspective, and topical relevance. PMID:28163972
Patient-Centered Tools for Medication Information Search.
Wilcox, Lauren; Feiner, Steven; Elhadad, Noémie; Vawdrey, David; Tran, Tran H
2014-05-20
Recent research focused on online health information seeking highlights a heavy reliance on general-purpose search engines. However, current general-purpose search interfaces do not necessarily provide adequate support for non-experts in identifying suitable sources of health information. Popular search engines have recently introduced search tools in their user interfaces for a range of topics. In this work, we explore how such tools can support non-expert, patient-centered health information search. Scoping the current work to medication-related search, we report on findings from a formative study focused on the design of patient-centered, medication-information search tools. Our study included qualitative interviews with patients, family members, and domain experts, as well as observations of their use of Remedy, a technology probe embodying a set of search tools. Post-operative cardiothoracic surgery patients and their visiting family members used the tools to find information about their hospital medications and were interviewed before and after their use. Domain experts conducted similar search tasks and provided qualitative feedback on their preferences and recommendations for designing these tools. Findings from our study suggest the importance of four valuation principles underlying our tools: credibility, readability, consumer perspective, and topical relevance.
Labrecque, Michel; Ratté, Stéphane; Frémont, Pierre; Cauchon, Michel; Ouellet, Jérôme; Hogg, William; McGowan, Jessie; Gagnon, Marie-Pierre; Njoya, Merlin; Légaré, France
2013-10-01
To compare the ability of users of 2 medical search engines, InfoClinique and the Trip database, to provide correct answers to clinical questions and to explore the perceived effects of the tools on the clinical decision-making process. Randomized trial. Three family medicine units of the family medicine program of the Faculty of Medicine at Laval University in Quebec city, Que. Fifteen second-year family medicine residents. Residents generated 30 structured questions about therapy or preventive treatment (2 questions per resident) based on clinical encounters. Using an Internet platform designed for the trial, each resident answered 20 of these questions (their own 2, plus 18 of the questions formulated by other residents, selected randomly) before and after searching for information with 1 of the 2 search engines. For each question, 5 residents were randomly assigned to begin their search with InfoClinique and 5 with the Trip database. The ability of residents to provide correct answers to clinical questions using the search engines, as determined by third-party evaluation. After answering each question, participants completed a questionnaire to assess their perception of the engine's effect on the decision-making process in clinical practice. Of 300 possible pairs of answers (1 answer before and 1 after the initial search), 254 (85%) were produced by 14 residents. Of these, 132 (52%) and 122 (48%) pairs of answers concerned questions that had been assigned an initial search with InfoClinique and the Trip database, respectively. Both engines produced an important and similar absolute increase in the proportion of correct answers after searching (26% to 62% for InfoClinique, for an increase of 36%; 24% to 63% for the Trip database, for an increase of 39%; P = .68). For all 30 clinical questions, at least 1 resident produced the correct answer after searching with either search engine. The mean (SD) time of the initial search for each question was 23.5 (7.6) minutes with InfoClinique and 22.3 (7.8) minutes with the Trip database (P = .30). Participants' perceptions of each engine's effect on the decision-making process were very positive and similar for both search engines. Family medicine residents' ability to provide correct answers to clinical questions increased dramatically and similarly with the use of both InfoClinique and the Trip database. These tools have strong potential to increase the quality of medical care.
Edelstein, Michael; Wallensten, Anders; Zetterqvist, Inga; Hulth, Anette
2014-01-01
Norovirus outbreaks severely disrupt healthcare systems. We evaluated whether Websök, an internet-based surveillance system using search engine data, improved norovirus surveillance and response in Sweden. We compared Websök users' characteristics with the general population, cross-correlated weekly Websök searches with laboratory notifications between 2006 and 2013, compared the time Websök and laboratory data crossed the epidemic threshold and surveyed infection control teams about their perception and use of Websök. Users of Websök were not representative of the general population. Websök correlated with laboratory data (b = 0.88-0.89) and gave an earlier signal to the onset of the norovirus season compared with laboratory-based surveillance. 17/21 (81%) infection control teams answered the survey, of which 11 (65%) believed Websök could help with infection control plans. Websök is a low-resource, easily replicable system that detects the norovirus season as reliably as laboratory data, but earlier. Using Websök in routine surveillance can help infection control teams prepare for the yearly norovirus season. PMID:24955857
Edelstein, Michael; Wallensten, Anders; Zetterqvist, Inga; Hulth, Anette
2014-01-01
Norovirus outbreaks severely disrupt healthcare systems. We evaluated whether Websök, an internet-based surveillance system using search engine data, improved norovirus surveillance and response in Sweden. We compared Websök users' characteristics with the general population, cross-correlated weekly Websök searches with laboratory notifications between 2006 and 2013, compared the time Websök and laboratory data crossed the epidemic threshold and surveyed infection control teams about their perception and use of Websök. Users of Websök were not representative of the general population. Websök correlated with laboratory data (b = 0.88-0.89) and gave an earlier signal to the onset of the norovirus season compared with laboratory-based surveillance. 17/21 (81%) infection control teams answered the survey, of which 11 (65%) believed Websök could help with infection control plans. Websök is a low-resource, easily replicable system that detects the norovirus season as reliably as laboratory data, but earlier. Using Websök in routine surveillance can help infection control teams prepare for the yearly norovirus season.
1996-10-01
aligned using an octree search algorithm combined with cross correlation analysis . Successive 4x downsampling with optional and specifiable neighborhood...desired and the search engine embedded in the OODBMS will find the requested imagery and que it to the user for further analysis . This application was...obtained during Hoftmann-LaRoche production pathology imaging performed at UMICH. Versant works well and is easy to use; 3) Pathology Image Analysis
What do web-use skill differences imply for online health information searches?
Feufel, Markus A; Stahl, S Frederica
2012-06-13
Online health information is of variable and often low scientific quality. In particular, elderly less-educated populations are said to struggle in accessing quality online information (digital divide). Little is known about (1) how their online behavior differs from that of younger, more-educated, and more-frequent Web users, and (2) how the older population may be supported in accessing good-quality online health information. To specify the digital divide between skilled and less-skilled Web users, we assessed qualitative differences in technical skills, cognitive strategies, and attitudes toward online health information. Based on these findings, we identified educational and technological interventions to help Web users find and access good-quality online health information. We asked 22 native German-speaking adults to search for health information online. The skilled cohort consisted of 10 participants who were younger than 30 years of age, had a higher level of education, and were more experienced using the Web than 12 participants in the less-skilled cohort, who were at least 50 years of age. We observed online health information searches to specify differences in technical skills and analyzed concurrent verbal protocols to identify health information seekers' cognitive strategies and attitudes. Our main findings relate to (1) attitudes: health information seekers in both cohorts doubted the quality of information retrieved online; among poorly skilled seekers, this was mainly because they doubted their skills to navigate vast amounts of information; once a website was accessed, quality concerns disappeared in both cohorts, (2) technical skills: skilled Web users effectively filtered information according to search intentions and data sources; less-skilled users were easily distracted by unrelated information, and (3) cognitive strategies: skilled Web users searched to inform themselves; less-skilled users searched to confirm their health-related opinions such as "vaccinations are harmful." Independent of Web-use skills, most participants stopped a search once they had found the first piece of evidence satisfying search intentions, rather than according to quality criteria. Findings related to Web-use skills differences suggest two classes of interventions to facilitate access to good-quality online health information. Challenges related to findings (1) and (2) should be remedied by improving people's basic Web-use skills. In particular, Web users should be taught how to avoid information overload by generating specific search terms and to avoid low-quality information by requesting results from trusted websites only. Problems related to finding (3) may be remedied by visually labeling search engine results according to quality criteria.
A Search Engine That's Aware of Your Needs
NASA Technical Reports Server (NTRS)
2005-01-01
Internet research can be compared to trying to drink from a firehose. Such a wealth of information is available that even the simplest inquiry can sometimes generate tens of thousands of leads, more information than most people can handle, and more burdensome than most can endure. Like everyone else, NASA scientists rely on the Internet as a primary search tool. Unlike the average user, though, NASA scientists perform some pretty sophisticated, involved research. To help manage the Internet and to allow researchers at NASA to gain better, more efficient access to the wealth of information, the Agency needed a search tool that was more refined and intelligent than the typical search engine. Partnership NASA funded Stottler Henke, Inc., of San Mateo, California, a cutting-edge software company, with a Small Business Innovation Research (SBIR) contract to develop the Aware software for searching through the vast stores of knowledge quickly and efficiently. The partnership was through NASA s Ames Research Center.
The BioPrompt-box: an ontology-based clustering tool for searching in biological databases.
Corsi, Claudio; Ferragina, Paolo; Marangoni, Roberto
2007-03-08
High-throughput molecular biology provides new data at an incredible rate, so that the increase in the size of biological databanks is enormous and very rapid. This scenario generates severe problems not only at indexing time, where suitable algorithmic techniques for data indexing and retrieval are required, but also at query time, since a user query may produce such a large set of results that their browsing and "understanding" becomes humanly impractical. This problem is well known to the Web community, where a new generation of Web search engines is being developed, like Vivisimo. These tools organize on-the-fly the results of a user query in a hierarchy of labeled folders that ease their browsing and knowledge extraction. We investigate this approach on biological data, and propose the so called The BioPrompt-boxsoftware system which deploys ontology-driven clustering strategies for making the searching process of biologists more efficient and effective. The BioPrompt-box (Bpb) defines a document as a biological sequence plus its associated meta-data taken from the underneath databank--like references to ontologies or to external databanks, and plain texts as comments of researchers and (title, abstracts or even body of) papers. Bpboffers several tools to customize the search and the clustering process over its indexed documents. The user can search a set of keywords within a specific field of the document schema, or can execute Blastto find documents relative to homologue sequences. In both cases the search task returns a set of documents (hits) which constitute the answer to the user query. Since the number of hits may be large, Bpbclusters them into groups of homogenous content, organized as a hierarchy of labeled clusters. The user can actually choose among several ontology-based hierarchical clustering strategies, each offering a different "view" of the returned hits. Bpbcomputes these views by exploiting the meta-data present within the retrieved documents such as the references to Gene Ontology, the taxonomy lineage, the organism and the keywords. Of course, the approach is flexible enough to leave room for future additions of other meta-information. The ultimate goal of the clustering process is to provide the user with several different readings of the (maybe numerous) query results and show possible hidden correlations among them, thus improving their browsing and understanding. Bpb is a powerful search engine that makes it very easy to perform complex queries over the indexed databanks (currently only UNIPROT is considered). The ontology-based clustering approach is efficient and effective, and could thus be applied successfully to larger databanks, like GenBank or EMBL.
The BioPrompt-box: an ontology-based clustering tool for searching in biological databases
Corsi, Claudio; Ferragina, Paolo; Marangoni, Roberto
2007-01-01
Background High-throughput molecular biology provides new data at an incredible rate, so that the increase in the size of biological databanks is enormous and very rapid. This scenario generates severe problems not only at indexing time, where suitable algorithmic techniques for data indexing and retrieval are required, but also at query time, since a user query may produce such a large set of results that their browsing and "understanding" becomes humanly impractical. This problem is well known to the Web community, where a new generation of Web search engines is being developed, like Vivisimo. These tools organize on-the-fly the results of a user query in a hierarchy of labeled folders that ease their browsing and knowledge extraction. We investigate this approach on biological data, and propose the so called The BioPrompt-boxsoftware system which deploys ontology-driven clustering strategies for making the searching process of biologists more efficient and effective. Results The BioPrompt-box (Bpb) defines a document as a biological sequence plus its associated meta-data taken from the underneath databank – like references to ontologies or to external databanks, and plain texts as comments of researchers and (title, abstracts or even body of) papers. Bpboffers several tools to customize the search and the clustering process over its indexed documents. The user can search a set of keywords within a specific field of the document schema, or can execute Blastto find documents relative to homologue sequences. In both cases the search task returns a set of documents (hits) which constitute the answer to the user query. Since the number of hits may be large, Bpbclusters them into groups of homogenous content, organized as a hierarchy of labeled clusters. The user can actually choose among several ontology-based hierarchical clustering strategies, each offering a different "view" of the returned hits. Bpbcomputes these views by exploiting the meta-data present within the retrieved documents such as the references to Gene Ontology, the taxonomy lineage, the organism and the keywords. Of course, the approach is flexible enough to leave room for future additions of other meta-information. The ultimate goal of the clustering process is to provide the user with several different readings of the (maybe numerous) query results and show possible hidden correlations among them, thus improving their browsing and understanding. Conclusion Bpb is a powerful search engine that makes it very easy to perform complex queries over the indexed databanks (currently only UNIPROT is considered). The ontology-based clustering approach is efficient and effective, and could thus be applied successfully to larger databanks, like GenBank or EMBL. PMID:17430575
Managing Personal and Group Collections of Information
NASA Technical Reports Server (NTRS)
Wolfe, Shawn R.; Wragg, Stephen D.; Chen, James R.; Koga, Dennis (Technical Monitor)
1999-01-01
The internet revolution has dramatically increased the amount of information available to users. Various tools such as search engines have been developed to help users find the information they need from this vast repository. Users often also need tools to help manipulate the growing amount of useful information they have discovered. Current tools available for this purpose are typically local components of web browsers designed to manage URL bookmarks. They provide limited functionalities to handle high information complexities. To tackle this have created DIAMS, an agent-based tool to help users or groups manage their information collections and share their collections with other. the main features of DIAMS are described here.
The LAILAPS search engine: a feature model for relevance ranking in life science databases.
Lange, Matthias; Spies, Karl; Colmsee, Christian; Flemming, Steffen; Klapperstück, Matthias; Scholz, Uwe
2010-03-25
Efficient and effective information retrieval in life sciences is one of the most pressing challenge in bioinformatics. The incredible growth of life science databases to a vast network of interconnected information systems is to the same extent a big challenge and a great chance for life science research. The knowledge found in the Web, in particular in life-science databases, are a valuable major resource. In order to bring it to the scientist desktop, it is essential to have well performing search engines. Thereby, not the response time nor the number of results is important. The most crucial factor for millions of query results is the relevance ranking. In this paper, we present a feature model for relevance ranking in life science databases and its implementation in the LAILAPS search engine. Motivated by the observation of user behavior during their inspection of search engine result, we condensed a set of 9 relevance discriminating features. These features are intuitively used by scientists, who briefly screen database entries for potential relevance. The features are both sufficient to estimate the potential relevance, and efficiently quantifiable. The derivation of a relevance prediction function that computes the relevance from this features constitutes a regression problem. To solve this problem, we used artificial neural networks that have been trained with a reference set of relevant database entries for 19 protein queries. Supporting a flexible text index and a simple data import format, this concepts are implemented in the LAILAPS search engine. It can easily be used both as search engine for comprehensive integrated life science databases and for small in-house project databases. LAILAPS is publicly available for SWISSPROT data at http://lailaps.ipk-gatersleben.de.
Comparing two types of engineering visualizations: task-related manipulations matter.
Cölln, Martin C; Kusch, Kerstin; Helmert, Jens R; Kohler, Petra; Velichkovsky, Boris M; Pannasch, Sebastian
2012-01-01
This study focuses on the comparison of traditional engineering drawings with a CAD (computer aided design) visualization in terms of user performance and eye movements in an applied context. Twenty-five students of mechanical engineering completed search tasks for measures in two distinct depictions of a car engine component (engineering drawing vs. CAD model). Besides spatial dimensionality, the display types most notably differed in terms of information layout, access and interaction options. The CAD visualization yielded better performance, if users directly manipulated the object, but was inferior, if employed in a conventional static manner, i.e. inspecting only predefined views. An additional eye movement analysis revealed longer fixation durations and a stronger increase of task-relevant fixations over time when interacting with the CAD visualization. This suggests a more focused extraction and filtering of information. We conclude that the three-dimensional CAD visualization can be advantageous if its ability to manipulate is used. Copyright © 2011 Elsevier Ltd and The Ergonomics Society. All rights reserved.
SNOMED CT module-driven clinical archetype management.
Allones, J L; Taboada, M; Martinez, D; Lozano, R; Sobrido, M J
2013-06-01
To explore semantic search to improve management and user navigation in clinical archetype repositories. In order to support semantic searches across archetypes, an automated method based on SNOMED CT modularization is implemented to transform clinical archetypes into SNOMED CT extracts. Concurrently, query terms are converted into SNOMED CT concepts using the search engine Lucene. Retrieval is then carried out by matching query concepts with the corresponding SNOMED CT segments. A test collection of the 16 clinical archetypes, including over 250 terms, and a subset of 55 clinical terms from two medical dictionaries, MediLexicon and MedlinePlus, were used to test our method. The keyword-based service supported by the OpenEHR repository offered us a benchmark to evaluate the enhancement of performance. In total, our approach reached 97.4% precision and 69.1% recall, providing a substantial improvement of recall (more than 70%) compared to the benchmark. Exploiting medical domain knowledge from ontologies such as SNOMED CT may overcome some limitations of the keyword-based systems and thus improve the search experience of repository users. An automated approach based on ontology segmentation is an efficient and feasible way for supporting modeling, management and user navigation in clinical archetype repositories. Copyright © 2013 Elsevier Inc. All rights reserved.
Origin of Disagreements in Tandem Mass Spectra Interpretation by Search Engines.
Tessier, Dominique; Lollier, Virginie; Larré, Colette; Rogniaux, Hélène
2016-10-07
Several proteomic database search engines that interpret LC-MS/MS data do not identify the same set of peptides. These disagreements occur even when the scores of the peptide-to-spectrum matches suggest good confidence in the interpretation. Our study shows that these disagreements observed for the interpretations of a given spectrum are almost exclusively due to the variation of what we call the "peptide space", i.e., the set of peptides that are actually compared to the experimental spectra. We discuss the potential difficulties of precisely defining the "peptide space." Indeed, although several parameters that are generally reported in publications can easily be set to the same values, many additional parameters-with much less straightforward user access-might impact the "peptide space" used by each program. Moreover, in a configuration where each search engine identifies the same candidates for each spectrum, the inference of the proteins may remain quite different depending on the false discovery rate selected.
MyLibrary: A Web Personalized Digital Library.
ERIC Educational Resources Information Center
Rocha, Catarina; Xexeo, Geraldo; da Rocha, Ana Regina C.
With the increasing availability of information on Internet information providers, like search engines, digital libraries and online databases, it becomes more important to have personalized systems that help users to find relevant information. One type of personalization that is growing in use is recommender systems. This paper presents…
Six Wishes of a Public Service Librarian.
ERIC Educational Resources Information Center
Fescemyer, Kathy
2001-01-01
Suggests concepts related to information that would be valuable to library users, including the expenses related to information; unique qualities and characteristics of databases; limits of the Web; understanding differences between magazines and scholarly journals; search engine differences; and an appreciation for the amount and variety of…
Abdulla, Ahmed AbdoAziz Ahmed; Lin, Hongfei; Xu, Bo; Banbhrani, Santosh Kumar
2016-07-25
Biomedical literature retrieval is becoming increasingly complex, and there is a fundamental need for advanced information retrieval systems. Information Retrieval (IR) programs scour unstructured materials such as text documents in large reserves of data that are usually stored on computers. IR is related to the representation, storage, and organization of information items, as well as to access. In IR one of the main problems is to determine which documents are relevant and which are not to the user's needs. Under the current regime, users cannot precisely construct queries in an accurate way to retrieve particular pieces of data from large reserves of data. Basic information retrieval systems are producing low-quality search results. In our proposed system for this paper we present a new technique to refine Information Retrieval searches to better represent the user's information need in order to enhance the performance of information retrieval by using different query expansion techniques and apply a linear combinations between them, where the combinations was linearly between two expansion results at one time. Query expansions expand the search query, for example, by finding synonyms and reweighting original terms. They provide significantly more focused, particularized search results than do basic search queries. The retrieval performance is measured by some variants of MAP (Mean Average Precision) and according to our experimental results, the combination of best results of query expansion is enhanced the retrieved documents and outperforms our baseline by 21.06 %, even it outperforms a previous study by 7.12 %. We propose several query expansion techniques and their combinations (linearly) to make user queries more cognizable to search engines and to produce higher-quality search results.
A Collaboration in Support of LBA Science and Data Exchange: Beija-flor and EOS-WEBSTER
NASA Astrophysics Data System (ADS)
Schloss, A. L.; Gentry, M. J.; Keller, M.; Rhyne, T.; Moore, B.
2001-12-01
The University of New Hampshire (UNH) has developed a Web-based tool that makes data, information, products, and services concerning terrestrial ecological and hydrological processes available to the Earth Science community. Our WEB-based System for Terrestrial Ecosystem Research (EOS-WEBSTER) provides a GIS-oriented interface to select, subset, reformat and download three main types of data: selected NASA Earth Observing System (EOS) remotely sensed data products, results from a suite of ecosystem and hydrological models, and geographic reference data. The Large Scale Biosphere-Atmosphere Experiment in Amazonia Project (LBA) has implemented a search engine, Beija-flor, that provides a centralized access point to data sets acquired for and produced by LBA researchers. The metadata in the Beija-flor index describe the content of the data sets and contain links to data distributed around the world. The query system returns a list of data sets that meet the search criteria of the user. A common problem when a user of a system like Beija-flor wants data products located within another system is that users are required to re-specify information, such as spatial coordinates, in the other system. This poster describes methodology by which Beija-flor generates a unique URL containing the requested search parameters and passes the information to EOS-WEBSTER, thus making the interactive services and large diverse data holdings in EOS-WEBSTER directly available to Beija-flor users. This "Calling Card" is used by EOS-WEBSTER to generate on-demand custom products tailored to each Beija-flor request. Through a collaborative effort, we have demonstrated the ability to integrate project-specific search engines such as Beija-flor with the products and services of large data systems such as EOS-WEBSTER, to provide very specific information products with a minimal amount of additional programming. This methodology has the potential to greatly facilitate research data exchange by enhancing the interoperability of diverse data systems beyond the two described here.
Jadhav, Ashutosh; Andrews, Donna; Fiksdal, Alexander; Kumbamu, Ashok; McCormick, Jennifer B; Misitano, Andrew; Nelsen, Laurie; Ryu, Euijung; Sheth, Amit; Wu, Stephen
2014-01-01
Background The number of people using the Internet and mobile/smart devices for health information seeking is increasing rapidly. Although the user experience for online health information seeking varies with the device used, for example, smart devices (SDs) like smartphones/tablets versus personal computers (PCs) like desktops/laptops, very few studies have investigated how online health information seeking behavior (OHISB) may differ by device. Objective The objective of this study is to examine differences in OHISB between PCs and SDs through a comparative analysis of large-scale health search queries submitted through Web search engines from both types of devices. Methods Using the Web analytics tool, IBM NetInsight OnDemand, and based on the type of devices used (PCs or SDs), we obtained the most frequent health search queries between June 2011 and May 2013 that were submitted on Web search engines and directed users to the Mayo Clinic’s consumer health information website. We performed analyses on “Queries with considering repetition counts (QwR)” and “Queries without considering repetition counts (QwoR)”. The dataset contains (1) 2.74 million and 3.94 million QwoR, respectively for PCs and SDs, and (2) more than 100 million QwR for both PCs and SDs. We analyzed structural properties of the queries (length of the search queries, usage of query operators and special characters in health queries), types of search queries (keyword-based, wh-questions, yes/no questions), categorization of the queries based on health categories and information mentioned in the queries (gender, age-groups, temporal references), misspellings in the health queries, and the linguistic structure of the health queries. Results Query strings used for health information searching via PCs and SDs differ by almost 50%. The most searched health categories are “Symptoms” (1 in 3 search queries), “Causes”, and “Treatments & Drugs”. The distribution of search queries for different health categories differs with the device used for the search. Health queries tend to be longer and more specific than general search queries. Health queries from SDs are longer and have slightly fewer spelling mistakes than those from PCs. Users specify words related to women and children more often than that of men and any other age group. Most of the health queries are formulated using keywords; the second-most common are wh- and yes/no questions. Users ask more health questions using SDs than PCs. Almost all health queries have at least one noun and health queries from SDs are more descriptive than those from PCs. Conclusions This study is a large-scale comparative analysis of health search queries to understand the effects of device type (PCs vs SDs) used on OHISB. The study indicates that the device used for online health information search plays an important role in shaping how health information searches by consumers and patients are executed. PMID:25000537
Jadhav, Ashutosh; Andrews, Donna; Fiksdal, Alexander; Kumbamu, Ashok; McCormick, Jennifer B; Misitano, Andrew; Nelsen, Laurie; Ryu, Euijung; Sheth, Amit; Wu, Stephen; Pathak, Jyotishman
2014-07-04
The number of people using the Internet and mobile/smart devices for health information seeking is increasing rapidly. Although the user experience for online health information seeking varies with the device used, for example, smart devices (SDs) like smartphones/tablets versus personal computers (PCs) like desktops/laptops, very few studies have investigated how online health information seeking behavior (OHISB) may differ by device. The objective of this study is to examine differences in OHISB between PCs and SDs through a comparative analysis of large-scale health search queries submitted through Web search engines from both types of devices. Using the Web analytics tool, IBM NetInsight OnDemand, and based on the type of devices used (PCs or SDs), we obtained the most frequent health search queries between June 2011 and May 2013 that were submitted on Web search engines and directed users to the Mayo Clinic's consumer health information website. We performed analyses on "Queries with considering repetition counts (QwR)" and "Queries without considering repetition counts (QwoR)". The dataset contains (1) 2.74 million and 3.94 million QwoR, respectively for PCs and SDs, and (2) more than 100 million QwR for both PCs and SDs. We analyzed structural properties of the queries (length of the search queries, usage of query operators and special characters in health queries), types of search queries (keyword-based, wh-questions, yes/no questions), categorization of the queries based on health categories and information mentioned in the queries (gender, age-groups, temporal references), misspellings in the health queries, and the linguistic structure of the health queries. Query strings used for health information searching via PCs and SDs differ by almost 50%. The most searched health categories are "Symptoms" (1 in 3 search queries), "Causes", and "Treatments & Drugs". The distribution of search queries for different health categories differs with the device used for the search. Health queries tend to be longer and more specific than general search queries. Health queries from SDs are longer and have slightly fewer spelling mistakes than those from PCs. Users specify words related to women and children more often than that of men and any other age group. Most of the health queries are formulated using keywords; the second-most common are wh- and yes/no questions. Users ask more health questions using SDs than PCs. Almost all health queries have at least one noun and health queries from SDs are more descriptive than those from PCs. This study is a large-scale comparative analysis of health search queries to understand the effects of device type (PCs vs. SDs) used on OHISB. The study indicates that the device used for online health information search plays an important role in shaping how health information searches by consumers and patients are executed.
Crawling The Web for Libre: Selecting, Integrating, Extending and Releasing Open Source Software
NASA Astrophysics Data System (ADS)
Truslove, I.; Duerr, R. E.; Wilcox, H.; Savoie, M.; Lopez, L.; Brandt, M.
2012-12-01
Libre is a project developed by the National Snow and Ice Data Center (NSIDC). Libre is devoted to liberating science data from its traditional constraints of publication, location, and findability. Libre embraces and builds on the notion of making knowledge freely available, and both Creative Commons licensed content and Open Source Software are crucial building blocks for, as well as required deliverable outcomes of the project. One important aspect of the Libre project is to discover cryospheric data published on the internet without prior knowledge of the location or even existence of that data. Inspired by well-known search engines and their underlying web crawling technologies, Libre has explored tools and technologies required to build a search engine tailored to allow users to easily discover geospatial data related to the polar regions. After careful consideration, the Libre team decided to base its web crawling work on the Apache Nutch project (http://nutch.apache.org). Nutch is "an open source web-search software project" written in Java, with good documentation, a significant user base, and an active development community. Nutch was installed and configured to search for the types of data of interest, and the team created plugins to customize the default Nutch behavior to better find and categorize these data feeds. This presentation recounts the Libre team's experiences selecting, using, and extending Nutch, and working with the Nutch user and developer community. We will outline the technical and organizational challenges faced in order to release the project's software as Open Source, and detail the steps actually taken. We distill these experiences into a set of heuristics and recommendations for using, contributing to, and releasing Open Source Software.
Study on online community user motif using web usage mining
NASA Astrophysics Data System (ADS)
Alphy, Meera; Sharma, Ajay
2016-04-01
The Web usage mining is the application of data mining, which is used to extract useful information from the online community. The World Wide Web contains at least 4.73 billion pages according to Indexed Web and it contains at least 228.52 million pages according Dutch Indexed web on 6th august 2015, Thursday. It’s difficult to get needed data from these billions of web pages in World Wide Web. Here is the importance of web usage mining. Personalizing the search engine helps the web user to identify the most used data in an easy way. It reduces the time consumption; automatic site search and automatic restore the useful sites. This study represents the old techniques to latest techniques used in pattern discovery and analysis in web usage mining from 1996 to 2015. Analyzing user motif helps in the improvement of business, e-commerce, personalisation and improvement of websites.
A Group Recommender System for Tourist Activities
NASA Astrophysics Data System (ADS)
Garcia, Inma; Sebastia, Laura; Onaindia, Eva; Guzman, Cesar
This paper introduces a method for giving recommendations of tourist activities to a group of users. This method makes recommendations based on the group tastes, their demographic classification and the places visited by the users in former trips. The group recommendation is computed from individual personal recommendations through the use of techniques such as aggregation, intersection or incremental intersection. This method is implemented as an extension of the e-Tourism tool, which is a user-adapted tourism and leisure application, whose main component is the Generalist Recommender System Kernel (GRSK), a domain-independent taxonomy-driven search engine that manages the group recommendation.
Visual Exploratory Search of Relationship Graphs on Smartphones
Ouyang, Jianquan; Zheng, Hao; Kong, Fanbin; Liu, Tianming
2013-01-01
This paper presents a novel framework for Visual Exploratory Search of Relationship Graphs on Smartphones (VESRGS) that is composed of three major components: inference and representation of semantic relationship graphs on the Web via meta-search, visual exploratory search of relationship graphs through both querying and browsing strategies, and human-computer interactions via the multi-touch interface and mobile Internet on smartphones. In comparison with traditional lookup search methodologies, the proposed VESRGS system is characterized with the following perceived advantages. 1) It infers rich semantic relationships between the querying keywords and other related concepts from large-scale meta-search results from Google, Yahoo! and Bing search engines, and represents semantic relationships via graphs; 2) the exploratory search approach empowers users to naturally and effectively explore, adventure and discover knowledge in a rich information world of interlinked relationship graphs in a personalized fashion; 3) it effectively takes the advantages of smartphones’ user-friendly interfaces and ubiquitous Internet connection and portability. Our extensive experimental results have demonstrated that the VESRGS framework can significantly improve the users’ capability of seeking the most relevant relationship information to their own specific needs. We envision that the VESRGS framework can be a starting point for future exploration of novel, effective search strategies in the mobile Internet era. PMID:24223936
Wedge, David C; Krishna, Ritesh; Blackhurst, Paul; Siepen, Jennifer A; Jones, Andrew R; Hubbard, Simon J
2011-04-01
Confident identification of peptides via tandem mass spectrometry underpins modern high-throughput proteomics. This has motivated considerable recent interest in the postprocessing of search engine results to increase confidence and calculate robust statistical measures, for example through the use of decoy databases to calculate false discovery rates (FDR). FDR-based analyses allow for multiple testing and can assign a single confidence value for both sets and individual peptide spectrum matches (PSMs). We recently developed an algorithm for combining the results from multiple search engines, integrating FDRs for sets of PSMs made by different search engine combinations. Here we describe a web-server and a downloadable application that makes this routinely available to the proteomics community. The web server offers a range of outputs including informative graphics to assess the confidence of the PSMs and any potential biases. The underlying pipeline also provides a basic protein inference step, integrating PSMs into protein ambiguity groups where peptides can be matched to more than one protein. Importantly, we have also implemented full support for the mzIdentML data standard, recently released by the Proteomics Standards Initiative, providing users with the ability to convert native formats to mzIdentML files, which are available to download.
Wedge, David C; Krishna, Ritesh; Blackhurst, Paul; Siepen, Jennifer A; Jones, Andrew R.; Hubbard, Simon J.
2013-01-01
Confident identification of peptides via tandem mass spectrometry underpins modern high-throughput proteomics. This has motivated considerable recent interest in the post-processing of search engine results to increase confidence and calculate robust statistical measures, for example through the use of decoy databases to calculate false discovery rates (FDR). FDR-based analyses allow for multiple testing and can assign a single confidence value for both sets and individual peptide spectrum matches (PSMs). We recently developed an algorithm for combining the results from multiple search engines, integrating FDRs for sets of PSMs made by different search engine combinations. Here we describe a web-server, and a downloadable application, which makes this routinely available to the proteomics community. The web server offers a range of outputs including informative graphics to assess the confidence of the PSMs and any potential biases. The underlying pipeline provides a basic protein inference step, integrating PSMs into protein ambiguity groups where peptides can be matched to more than one protein. Importantly, we have also implemented full support for the mzIdentML data standard, recently released by the Proteomics Standards Initiative, providing users with the ability to convert native formats to mzIdentML files, which are available to download. PMID:21222473
Engaging Elderly People in Telemedicine Through Gamification
Tabak, Monique; Dekker - van Weering, Marit; Vollenbroek-Hutten, Miriam
2015-01-01
Background Telemedicine can alleviate the increasing demand for elderly care caused by the rapidly aging population. However, user adherence to technology in telemedicine interventions is low and decreases over time. Therefore, there is a need for methods to increase adherence, specifically of the elderly user. A strategy that has recently emerged to address this problem is gamification. It is the application of game elements to nongame fields to motivate and increase user activity and retention. Objective This research aims to (1) provide an overview of existing theoretical frameworks for gamification and explore methods that specifically target the elderly user and (2) explore user classification theories for tailoring game content to the elderly user. This knowledge will provide a foundation for creating a new framework for applying gamification in telemedicine applications to effectively engage the elderly user by increasing and maintaining adherence. Methods We performed a broad Internet search using scientific and nonscientific search engines and included information that described either of the following subjects: the conceptualization of gamification, methods to engage elderly users through gamification, or user classification theories for tailored game content. Results Our search showed two main approaches concerning frameworks for gamification: from business practices, which mostly aim for more revenue, emerge an applied approach, while academia frameworks are developed incorporating theories on motivation while often aiming for lasting engagement. The search provided limited information regarding the application of gamification to engage elderly users, and a significant gap in knowledge on the effectiveness of a gamified application in practice. Several approaches for classifying users in general were found, based on archetypes and reasons to play, and we present them along with their corresponding taxonomies. The overview we created indicates great connectivity between these taxonomies. Conclusions Gamification frameworks have been developed from different backgrounds—business and academia—but rarely target the elderly user. The effectiveness of user classifications for tailored game content in this context is not yet known. As a next step, we propose the development of a framework based on the hypothesized existence of a relation between preference for game content and personality. PMID:26685287
Engaging Elderly People in Telemedicine Through Gamification.
de Vette, Frederiek; Tabak, Monique; Dekker-van Weering, Marit; Vollenbroek-Hutten, Miriam
2015-12-18
Telemedicine can alleviate the increasing demand for elderly care caused by the rapidly aging population. However, user adherence to technology in telemedicine interventions is low and decreases over time. Therefore, there is a need for methods to increase adherence, specifically of the elderly user. A strategy that has recently emerged to address this problem is gamification. It is the application of game elements to nongame fields to motivate and increase user activity and retention. This research aims to (1) provide an overview of existing theoretical frameworks for gamification and explore methods that specifically target the elderly user and (2) explore user classification theories for tailoring game content to the elderly user. This knowledge will provide a foundation for creating a new framework for applying gamification in telemedicine applications to effectively engage the elderly user by increasing and maintaining adherence. We performed a broad Internet search using scientific and nonscientific search engines and included information that described either of the following subjects: the conceptualization of gamification, methods to engage elderly users through gamification, or user classification theories for tailored game content. Our search showed two main approaches concerning frameworks for gamification: from business practices, which mostly aim for more revenue, emerge an applied approach, while academia frameworks are developed incorporating theories on motivation while often aiming for lasting engagement. The search provided limited information regarding the application of gamification to engage elderly users, and a significant gap in knowledge on the effectiveness of a gamified application in practice. Several approaches for classifying users in general were found, based on archetypes and reasons to play, and we present them along with their corresponding taxonomies. The overview we created indicates great connectivity between these taxonomies. Gamification frameworks have been developed from different backgrounds-business and academia-but rarely target the elderly user. The effectiveness of user classifications for tailored game content in this context is not yet known. As a next step, we propose the development of a framework based on the hypothesized existence of a relation between preference for game content and personality.
Weerts to lead Physical Sciences and Engineering directorate | Argonne
Electrochemical Energy Science CTRCenter for Transportation Research CRIChain Reaction Innovations CIComputation Search Energy Environment National Security User Facilities Science Work with Us About Safety News Press Releases Feature Stories Science Highlights In the News Argonne Now Magazine Media Contacts Social Media
Model for Presenting Resources in Scholar's Portal
ERIC Educational Resources Information Center
Feeney, Mary; Newby, Jill
2005-01-01
Presenting electronic resources to users through a federated search engine introduces unique opportunities and challenges to libraries. This article reports on the decision-making tools and processes used for selecting collections of electronic resources by a project team at the University of Arizona (UA) Libraries for the Association of Research…
LBT Distributed Archive: Status and Features
NASA Astrophysics Data System (ADS)
Knapic, C.; Smareglia, R.; Thompson, D.; Grede, G.
2011-07-01
After the first release of the LBT Distributed Archive, this successful collaboration is continuing within the LBT corporation. The IA2 (Italian Center for Astronomical Archive) team had updated the LBT DA with new features in order to facilitate user data retrieval while abiding by VO standards. To facilitate the integration of data from any new instruments, we have migrated to a new database, developed new data distribution software, and enhanced features in the LBT User Interface. The DBMS engine has been changed to MySQL. Consequently, the data handling software now uses java thread technology to update and synchronize the main storage archives on Mt. Graham and in Tucson, as well as archives in Trieste and Heidelberg, with all metadata and proprietary data. The LBT UI has been updated with additional features allowing users to search by instrument and some of the more important characteristics of the images. Finally, instead of a simple cone search service over all LBT image data, new instrument specific SIAP and cone search services have been developed. They will be published in the IVOA framework later this fall.
Effective Filtering of Query Results on Updated User Behavioral Profiles in Web Mining
Sadesh, S.; Suganthe, R. C.
2015-01-01
Web with tremendous volume of information retrieves result for user related queries. With the rapid growth of web page recommendation, results retrieved based on data mining techniques did not offer higher performance filtering rate because relationships between user profile and queries were not analyzed in an extensive manner. At the same time, existing user profile based prediction in web data mining is not exhaustive in producing personalized result rate. To improve the query result rate on dynamics of user behavior over time, Hamilton Filtered Regime Switching User Query Probability (HFRS-UQP) framework is proposed. HFRS-UQP framework is split into two processes, where filtering and switching are carried out. The data mining based filtering in our research work uses the Hamilton Filtering framework to filter user result based on personalized information on automatic updated profiles through search engine. Maximized result is fetched, that is, filtered out with respect to user behavior profiles. The switching performs accurate filtering updated profiles using regime switching. The updating in profile change (i.e., switches) regime in HFRS-UQP framework identifies the second- and higher-order association of query result on the updated profiles. Experiment is conducted on factors such as personalized information search retrieval rate, filtering efficiency, and precision ratio. PMID:26221626
[Web-based support system for medical device maintenance].
Zhao, Jinhai; Hou, Wensheng; Chen, Haiyan; Tang, Wei; Wang, Yihui
2015-01-01
A Web-based technology system was put forward aiming at the actual problems of the long maintenance cycle and the difficulties of the maintenance and repairing of medical equipments. Based on analysis of platform system structure and function, using the key technologies such as search engine, BBS, knowledge base and etc, a platform for medical equipment service technician to use by online or offline was designed. The platform provides users with knowledge services and interactive services, enabling users to get a more ideal solution.
Mrozek, Dariusz; Małysiak-Mrozek, Bożena; Siążnik, Artur
2013-03-01
Due to the growing number of biomedical entries in data repositories of the National Center for Biotechnology Information (NCBI), it is difficult to collect, manage and process all of these entries in one place by third-party software developers without significant investment in hardware and software infrastructure, its maintenance and administration. Web services allow development of software applications that integrate in one place the functionality and processing logic of distributed software components, without integrating the components themselves and without integrating the resources to which they have access. This is achieved by appropriate orchestration or choreography of available Web services and their shared functions. After the successful application of Web services in the business sector, this technology can now be used to build composite software tools that are oriented towards biomedical data processing. We have developed a new tool for efficient and dynamic data exploration in GenBank and other NCBI databases. A dedicated search GenBank system makes use of NCBI Web services and a package of Entrez Programming Utilities (eUtils) in order to provide extended searching capabilities in NCBI data repositories. In search GenBank users can use one of the three exploration paths: simple data searching based on the specified user's query, advanced data searching based on the specified user's query, and advanced data exploration with the use of macros. search GenBank orchestrates calls of particular tools available through the NCBI Web service providing requested functionality, while users interactively browse selected records in search GenBank and traverse between NCBI databases using available links. On the other hand, by building macros in the advanced data exploration mode, users create choreographies of eUtils calls, which can lead to the automatic discovery of related data in the specified databases. search GenBank extends standard capabilities of the NCBI Entrez search engine in querying biomedical databases. The possibility of creating and saving macros in the search GenBank is a unique feature and has a great potential. The potential will further grow in the future with the increasing density of networks of relationships between data stored in particular databases. search GenBank is available for public use at http://sgb.biotools.pl/.
The Brazilian Portuguese Lexicon: An Instrument for Psycholinguistic Research
Estivalet, Gustavo L.; Meunier, Fanny
2015-01-01
In this article, we present the Brazilian Portuguese Lexicon, a new word-based corpus for psycholinguistic and computational linguistic research in Brazilian Portuguese. We describe the corpus development, the specific characteristics on the internet site and database for user access. We also perform distributional analyses of the corpus and comparisons to other current databases. Our main objective was to provide a large, reliable, and useful word-based corpus with a dynamic, easy-to-use, and intuitive interface with free internet access for word and word-criteria searches. We used the Núcleo Interinstitucional de Linguística Computacional’s corpus as the basic data source and developed the Brazilian Portuguese Lexicon by deriving and adding metalinguistic and psycholinguistic information about Brazilian Portuguese words. We obtained a final corpus with more than 30 million word tokens, 215 thousand word types and 25 categories of information about each word. This corpus was made available on the internet via a free-access site with two search engines: a simple search and a complex search. The simple engine basically searches for a list of words, while the complex engine accepts all types of criteria in the corpus categories. The output result presents all entries found in the corpus with the criteria specified in the input search and can be downloaded as a.csv file. We created a module in the results that delivers basic statistics about each search. The Brazilian Portuguese Lexicon also provides a pseudoword engine and specific tools for linguistic and statistical analysis. Therefore, the Brazilian Portuguese Lexicon is a convenient instrument for stimulus search, selection, control, and manipulation in psycholinguistic experiments, as also it is a powerful database for computational linguistics research and language modeling related to lexicon distribution, functioning, and behavior. PMID:26630138
The Brazilian Portuguese Lexicon: An Instrument for Psycholinguistic Research.
Estivalet, Gustavo L; Meunier, Fanny
2015-01-01
In this article, we present the Brazilian Portuguese Lexicon, a new word-based corpus for psycholinguistic and computational linguistic research in Brazilian Portuguese. We describe the corpus development, the specific characteristics on the internet site and database for user access. We also perform distributional analyses of the corpus and comparisons to other current databases. Our main objective was to provide a large, reliable, and useful word-based corpus with a dynamic, easy-to-use, and intuitive interface with free internet access for word and word-criteria searches. We used the Núcleo Interinstitucional de Linguística Computacional's corpus as the basic data source and developed the Brazilian Portuguese Lexicon by deriving and adding metalinguistic and psycholinguistic information about Brazilian Portuguese words. We obtained a final corpus with more than 30 million word tokens, 215 thousand word types and 25 categories of information about each word. This corpus was made available on the internet via a free-access site with two search engines: a simple search and a complex search. The simple engine basically searches for a list of words, while the complex engine accepts all types of criteria in the corpus categories. The output result presents all entries found in the corpus with the criteria specified in the input search and can be downloaded as a.csv file. We created a module in the results that delivers basic statistics about each search. The Brazilian Portuguese Lexicon also provides a pseudoword engine and specific tools for linguistic and statistical analysis. Therefore, the Brazilian Portuguese Lexicon is a convenient instrument for stimulus search, selection, control, and manipulation in psycholinguistic experiments, as also it is a powerful database for computational linguistics research and language modeling related to lexicon distribution, functioning, and behavior.
Searching for suicide-related information on Chinese websites.
Chen, Ying-Yeh; Hung, Galen Chin-Lun; Cheng, Qijin; Tsai, Chi-Wei; Wu, Kevin Chien-Chang
2017-12-01
Growing concerns about cyber-suicide have prompted many studies on suicide information available on the web. However, very few studies have considered non-English websites. We aimed to analyze online suicide-related information accessed through Chinese-language websites. We used Taiwan's two most popular search engines (Google and Yahoo) to explore the results returned from six suicide-related search terms in March 2016. The first three pages listing the results from each search were analyzed and rated based on the attitude towards suicide (pro-suicide, anti-suicide, neutral/mixed, not a suicide site, or error). Comparisons across different search terms were also performed. In all, 375 linked webpages were included; 16.3% of the webpages were pro-suicide and 41.3% were anti-suicide. The majority of the pro-suicide sites were user-generated webpages (96.7%). Searches using the keywords 'ways to kill yourself' (31.7%) and 'painless suicide' (28.3%) generated much larger numbers of harmful webpages than the term 'suicide' (4.3%). We conclude that collaborative efforts with internet service providers and search engines to improve the ranking of anti-suicide webpages and websites and implement online suicide reporting guidelines are highly encouraged. Copyright © 2017 Elsevier B.V. All rights reserved.
Multimedia explorer: image database, image proxy-server and search-engine.
Frankewitsch, T.; Prokosch, U.
1999-01-01
Multimedia plays a major role in medicine. Databases containing images, movies or other types of multimedia objects are increasing in number, especially on the WWW. However, no good retrieval mechanism or search engine currently exists to efficiently track down such multimedia sources in the vast of information provided by the WWW. Secondly, the tools for searching databases are usually not adapted to the properties of images. HTML pages do not allow complex searches. Therefore establishing a more comfortable retrieval involves the use of a higher programming level like JAVA. With this platform independent language it is possible to create extensions to commonly used web browsers. These applets offer a graphical user interface for high level navigation. We implemented a database using JAVA objects as the primary storage container which are then stored by a JAVA controlled ORACLE8 database. Navigation depends on a structured vocabulary enhanced by a semantic network. With this approach multimedia objects can be encapsulated within a logical module for quick data retrieval. PMID:10566463
Multimedia explorer: image database, image proxy-server and search-engine.
Frankewitsch, T; Prokosch, U
1999-01-01
Multimedia plays a major role in medicine. Databases containing images, movies or other types of multimedia objects are increasing in number, especially on the WWW. However, no good retrieval mechanism or search engine currently exists to efficiently track down such multimedia sources in the vast of information provided by the WWW. Secondly, the tools for searching databases are usually not adapted to the properties of images. HTML pages do not allow complex searches. Therefore establishing a more comfortable retrieval involves the use of a higher programming level like JAVA. With this platform independent language it is possible to create extensions to commonly used web browsers. These applets offer a graphical user interface for high level navigation. We implemented a database using JAVA objects as the primary storage container which are then stored by a JAVA controlled ORACLE8 database. Navigation depends on a structured vocabulary enhanced by a semantic network. With this approach multimedia objects can be encapsulated within a logical module for quick data retrieval.
The Self-Organized Archive: SPASE, PDS and Archive Cooperatives
NASA Astrophysics Data System (ADS)
King, T. A.; Hughes, J. S.; Roberts, D. A.; Walker, R. J.; Joy, S. P.
2005-05-01
Information systems with high quality metadata enable uses and services which often go beyond the original purpose. There are two types of metadata: annotations which are items that comment on or describe the content of a resource and identification attributes which describe the external properties of the resource itself. For example, annotations may indicate which columns are present in a table of data, whereas an identification attribute would indicate source of the table, such as the observatory, instrument, organization, and data type. When the identification attributes are collected and used as the basis of a search engine, a user can constrain on an attribute, the archive can then self-organize around the constraint, presenting the user with a particular view of the archive. In an archive cooperative where each participating data system or archive may have its own metadata standards, providing a multi-system search engine requires that individual archive metadata be mapped to a broad based standard. To explore how cooperative archives can form a larger self-organized archive we will show how the Space Physics Archive Search and Extract (SPASE) data model will allow different systems to create a cooperative and will use Planetary Data System (PDS) plus existing space physics activities as a demonstration.
Lynx: a database and knowledge extraction engine for integrative medicine.
Sulakhe, Dinanath; Balasubramanian, Sandhya; Xie, Bingqing; Feng, Bo; Taylor, Andrew; Wang, Sheng; Berrocal, Eduardo; Dave, Utpal; Xu, Jinbo; Börnigen, Daniela; Gilliam, T Conrad; Maltsev, Natalia
2014-01-01
We have developed Lynx (http://lynx.ci.uchicago.edu)--a web-based database and a knowledge extraction engine, supporting annotation and analysis of experimental data and generation of weighted hypotheses on molecular mechanisms contributing to human phenotypes and disorders of interest. Its underlying knowledge base (LynxKB) integrates various classes of information from >35 public databases and private collections, as well as manually curated data from our group and collaborators. Lynx provides advanced search capabilities and a variety of algorithms for enrichment analysis and network-based gene prioritization to assist the user in extracting meaningful knowledge from LynxKB and experimental data, whereas its service-oriented architecture provides public access to LynxKB and its analytical tools via user-friendly web services and interfaces.
NASA Astrophysics Data System (ADS)
Aloisio, Giovanni; Fiore, Sandro; Negro, A.
2010-05-01
The CMCC Data Distribution Centre (DDC) is the primary entry point (web gateway) to the CMCC. It is a Data Grid Portal providing a ubiquitous and pervasive way to ease data publishing, climate metadata search, datasets discovery, metadata annotation, data access, data aggregation, sub-setting, etc. The grid portal security model includes the use of HTTPS protocol for secure communication with the client (based on X509v3 certificates that must be loaded into the browser) and secure cookies to establish and maintain user sessions. The CMCC DDC is now in a pre-production phase and it is currently used only by internal users (CMCC researchers and climate scientists). The most important component already available in the CMCC DDC is the Search Engine which allows users to perform, through web interfaces, distributed search and discovery activities by introducing one or more of the following search criteria: horizontal extent (which can be specified by interacting with a geographic map), vertical extent, temporal extent, keywords, topics, creation date, etc. By means of this page the user submits the first step of the query process on the metadata DB, then, she can choose one or more datasets retrieving and displaying the complete XML metadata description (from the browser). This way, the second step of the query process is carried out by accessing to a specific XML document of the metadata DB. Finally, through the web interface, the user can access to and download (partially or totally) the data stored on the storage device accessing to OPeNDAP servers and to other available grid storage interfaces. Requests concerning datasets stored in deep storage will be served asynchronously.
Science Opportunity Analyzer (SOA) Version 8
NASA Technical Reports Server (NTRS)
Witoff, Robert J.; Polanskey, Carol A.; Aguinaldo, Anna Marie A.; Liu, Ning; Hofstadter, Mark D.
2013-01-01
SOA allows scientists to plan spacecraft observations. It facilitates the identification of geometrically interesting times in a spacecraft s orbit that a user can use to plan observations or instrument-driven spacecraft maneuvers. These observations can then be visualized multiple ways in both two- and three-dimensional views. When observations have been optimized within a spacecraft's flight rules, the resulting plans can be output for use by other JPL uplink tools. Now in its eighth major version, SOA improves on these capabilities in a modern and integrated fashion. SOA consists of five major functions: Opportunity Search, Visualization, Observation Design, Constraint Checking, and Data Output. Opportunity Search is a GUI-driven interface to existing search engines that can be used to identify times when a spacecraft is in a specific geometrical relationship with other bodies in the solar system. This function can be used for advanced mission planning as well as for making last-minute adjustments to mission sequences in response to trajectory modifications. Visualization is a key aspect of SOA. The user can view observation opportunities in either a 3D representation or as a 2D map projection. Observation Design allows the user to orient the spacecraft and visualize the projection of the instrument field of view for that orientation using the same views as Opportunity Search. Constraint Checking is provided to validate various geometrical and physical aspects of an observation design. The user has the ability to easily create custom rules or to use official project-generated flight rules. This capability may also allow scientists to easily assess the cost to science if flight rule changes occur. Data Output allows the user to compute ancillary data related to an observation or to a given position of the spacecraft along its trajectory. The data can be saved as a tab-delimited text file or viewed as a graph. SOA combines science planning functionality unique to both JPL and the sponsoring spacecraft. SOA is able to ingest JPL SPICE Kernels that are used to drive the tool and its computations. A Percy search engine is then included that identifies interesting time periods for the user to build observations. When observations are then built, flight-like orientation algorithms replicate spacecraft dynamics to closely simulate the flight spacecraft s dynamics. SOA v8 represents large steps forward from SOA v7 in terms of quality, reliability, maintainability, efficiency, and user experience. A tailored agile development environment has been built around SOA that provides automated unit testing, continuous build and integration, a consolidated Web-based code and documentation storage environment, modern Java enhancements, and a focus on usability
DTS: Building custom, intelligent schedulers
NASA Technical Reports Server (NTRS)
Hansson, Othar; Mayer, Andrew
1994-01-01
DTS is a decision-theoretic scheduler, built on top of a flexible toolkit -- this paper focuses on how the toolkit might be reused in future NASA mission schedulers. The toolkit includes a user-customizable scheduling interface, and a 'Just-For-You' optimization engine. The customizable interface is built on two metaphors: objects and dynamic graphs. Objects help to structure problem specifications and related data, while dynamic graphs simplify the specification of graphical schedule editors (such as Gantt charts). The interface can be used with any 'back-end' scheduler, through dynamically-loaded code, interprocess communication, or a shared database. The 'Just-For-You' optimization engine includes user-specific utility functions, automatically compiled heuristic evaluations, and a postprocessing facility for enforcing scheduling policies. The optimization engine is based on BPS, the Bayesian Problem-Solver (1,2), which introduced a similar approach to solving single-agent and adversarial graph search problems.
Visibiome: an efficient microbiome search engine based on a scalable, distributed architecture.
Azman, Syafiq Kamarul; Anwar, Muhammad Zohaib; Henschel, Andreas
2017-07-24
Given the current influx of 16S rRNA profiles of microbiota samples, it is conceivable that large amounts of them eventually are available for search, comparison and contextualization with respect to novel samples. This process facilitates the identification of similar compositional features in microbiota elsewhere and therefore can help to understand driving factors for microbial community assembly. We present Visibiome, a microbiome search engine that can perform exhaustive, phylogeny based similarity search and contextualization of user-provided samples against a comprehensive dataset of 16S rRNA profiles environments, while tackling several computational challenges. In order to scale to high demands, we developed a distributed system that combines web framework technology, task queueing and scheduling, cloud computing and a dedicated database server. To further ensure speed and efficiency, we have deployed Nearest Neighbor search algorithms, capable of sublinear searches in high-dimensional metric spaces in combination with an optimized Earth Mover Distance based implementation of weighted UniFrac. The search also incorporates pairwise (adaptive) rarefaction and optionally, 16S rRNA copy number correction. The result of a query microbiome sample is the contextualization against a comprehensive database of microbiome samples from a diverse range of environments, visualized through a rich set of interactive figures and diagrams, including barchart-based compositional comparisons and ranking of the closest matches in the database. Visibiome is a convenient, scalable and efficient framework to search microbiomes against a comprehensive database of environmental samples. The search engine leverages a popular but computationally expensive, phylogeny based distance metric, while providing numerous advantages over the current state of the art tool.
Google search behavior for status epilepticus.
Brigo, Francesco; Trinka, Eugen
2015-08-01
Millions of people surf the Internet every day as a source of health-care information looking for materials about symptoms, diagnosis, treatments and their possible adverse effects, or diagnostic procedures. Google is the most popular search engine and is used by patients and physicians to search for online health-related information. This study aimed to evaluate changes in Google search behavior occurring in English-speaking countries over time for the term "status epilepticus" (SE). Using Google Trends, data on global search queries for the term SE between the 1st of January 2004 and 31st of December 2014 were analyzed. Search volume numbers over time (downloaded as CSV datasets) were analyzed by applying the "health" category filter. The research trends for the term SE remained fairly constant over time. The greatest search volume for the term SE was reported in the United States, followed by India, Australia, the United Kingdom, Canada, the Netherlands, Thailand, and Germany. Most terms associated with the search queries were related to SE definition, symptoms, subtypes, and treatment. The volume of searches for some queries (nonconvulsive, focal, and refractory SE; SE definition; SE guidelines; SE symptoms; SE management; SE treatment) was enormously increased over time (search popularity has exceeded a 5000% growth since 2004). Most people use search engines to look for the term SE to obtain information on its definition, subtypes, and management. The greatest search volume occurred not only in developed countries but also in developing countries where raising awareness about SE still remains a challenging task and where there is reduced public knowledge of epilepsy. Health information seeking (the extent to which people search for health information online) reflects the health-related information needs of Internet users for a specific disease. Google Trends shows that Internet users have a great demand for information concerning some aspects of SE (definition, subtypes, symptoms, treatment, and guidelines). Policy makers and neurological scientific societies have the responsibility to try to meet these information needs and to better target public information campaigns on SE to the general population. This article is part of a Special Issue entitled "Status Epilepticus". Copyright © 2015 Elsevier Inc. All rights reserved.
Wikipedia and Psychology: Coverage of Concepts and Its Use by Undergraduate Students
ERIC Educational Resources Information Center
Schweitzer, N. J.
2008-01-01
The online encyclopedia Wikipedia is a frequently referred-to source of information for Internet users. A series of 3 studies examined Wikipedia's coverage of psychology-related concepts, examined how accessible Wikipedia's psychology content is when using Internet search engines, and described how both first-year and senior undergraduates use…
ERIC Educational Resources Information Center
Chen, Hsinchun; Martinez, Joanne; Kirchhoff, Amy; Ng, Tobun D.; Schatz, Bruce R.
1998-01-01
Grounded on object filtering, automatic indexing, and co-occurrence analysis, an experiment was performed using a parallel supercomputer to analyze over 400,000 abstracts in an INSPEC computer engineering collection. A user evaluation revealed that system-generated thesauri were better than the human-generated INSPEC subject thesaurus in concept…
Mayer, Miguel A; Karampiperis, Pythagoras; Kukurikos, Antonis; Karkaletsis, Vangelis; Stamatakis, Kostas; Villarroel, Dagmar; Leis, Angela
2011-06-01
The number of health-related websites is increasing day-by-day; however, their quality is variable and difficult to assess. Various "trust marks" and filtering portals have been created in order to assist consumers in retrieving quality medical information. Consumers are using search engines as the main tool to get health information; however, the major problem is that the meaning of the web content is not machine-readable in the sense that computers cannot understand words and sentences as humans can. In addition, trust marks are invisible to search engines, thus limiting their usefulness in practice. During the last five years there have been different attempts to use Semantic Web tools to label health-related web resources to help internet users identify trustworthy resources. This paper discusses how Semantic Web technologies can be applied in practice to generate machine-readable labels and display their content, as well as to empower end-users by providing them with the infrastructure for expressing and sharing their opinions on the quality of health-related web resources.
Analysis of Web Spam for Non-English Content: Toward More Effective Language-Based Classifiers
Alsaleh, Mansour; Alarifi, Abdulrahman
2016-01-01
Web spammers aim to obtain higher ranks for their web pages by including spam contents that deceive search engines in order to include their pages in search results even when they are not related to the search terms. Search engines continue to develop new web spam detection mechanisms, but spammers also aim to improve their tools to evade detection. In this study, we first explore the effect of the page language on spam detection features and we demonstrate how the best set of detection features varies according to the page language. We also study the performance of Google Penguin, a newly developed anti-web spamming technique for their search engine. Using spam pages in Arabic as a case study, we show that unlike similar English pages, Google anti-spamming techniques are ineffective against a high proportion of Arabic spam pages. We then explore multiple detection features for spam pages to identify an appropriate set of features that yields a high detection accuracy compared with the integrated Google Penguin technique. In order to build and evaluate our classifier, as well as to help researchers to conduct consistent measurement studies, we collected and manually labeled a corpus of Arabic web pages, including both benign and spam pages. Furthermore, we developed a browser plug-in that utilizes our classifier to warn users about spam pages after clicking on a URL and by filtering out search engine results. Using Google Penguin as a benchmark, we provide an illustrative example to show that language-based web spam classifiers are more effective for capturing spam contents. PMID:27855179
Analysis of Web Spam for Non-English Content: Toward More Effective Language-Based Classifiers.
Alsaleh, Mansour; Alarifi, Abdulrahman
2016-01-01
Web spammers aim to obtain higher ranks for their web pages by including spam contents that deceive search engines in order to include their pages in search results even when they are not related to the search terms. Search engines continue to develop new web spam detection mechanisms, but spammers also aim to improve their tools to evade detection. In this study, we first explore the effect of the page language on spam detection features and we demonstrate how the best set of detection features varies according to the page language. We also study the performance of Google Penguin, a newly developed anti-web spamming technique for their search engine. Using spam pages in Arabic as a case study, we show that unlike similar English pages, Google anti-spamming techniques are ineffective against a high proportion of Arabic spam pages. We then explore multiple detection features for spam pages to identify an appropriate set of features that yields a high detection accuracy compared with the integrated Google Penguin technique. In order to build and evaluate our classifier, as well as to help researchers to conduct consistent measurement studies, we collected and manually labeled a corpus of Arabic web pages, including both benign and spam pages. Furthermore, we developed a browser plug-in that utilizes our classifier to warn users about spam pages after clicking on a URL and by filtering out search engine results. Using Google Penguin as a benchmark, we provide an illustrative example to show that language-based web spam classifiers are more effective for capturing spam contents.
2013-01-01
Background The Internet’s potential impact on suicide is of major public health interest as easy online access to pro-suicide information or specific suicide methods may increase suicide risk among vulnerable Internet users. Little is known, however, about users’ actual searching and browsing behaviors of online suicide-related information. Objective To investigate what webpages people actually clicked on after searching with suicide-related queries on a search engine and to examine what queries people used to get access to pro-suicide websites. Methods A retrospective observational study was done. We used a web search dataset released by America Online (AOL). The dataset was randomly sampled from all AOL subscribers’ web queries between March and May 2006 and generated by 657,000 service subscribers. Results We found 5526 search queries (0.026%, 5526/21,000,000) that included the keyword "suicide". The 5526 search queries included 1586 different search terms and were generated by 1625 unique subscribers (0.25%, 1625/657,000). Of these queries, 61.38% (3392/5526) were followed by users clicking on a search result. Of these 3392 queries, 1344 (39.62%) webpages were clicked on by 930 unique users but only 1314 of those webpages were accessible during the study period. Each clicked-through webpage was classified into 11 categories. The categories of the most visited webpages were: entertainment (30.13%; 396/1314), scientific information (18.31%; 240/1314), and community resources (14.53%; 191/1314). Among the 1314 accessed webpages, we could identify only two pro-suicide websites. We found that the search terms used to access these sites included “commiting suicide with a gas oven”, “hairless goat”, “pictures of murder by strangulation”, and “photo of a severe burn”. A limitation of our study is that the database may be dated and confined to mainly English webpages. Conclusions Searching or browsing suicide-related or pro-suicide webpages was uncommon, although a small group of users did access websites that contain detailed suicide method information. PMID:23305632
Ma, Chun Wai Manson; Lam, Henry
2014-05-02
Discovering novel post-translational modifications (PTMs) to proteins and detecting specific modification sites on proteins is one of the last frontiers of proteomics. At present, hunting for post-translational modifications remains challenging in widely practiced shotgun proteomics workflows due to the typically low abundance of modified peptides and the greatly inflated search space as more potential mass shifts are considered by the search engines. Moreover, most popular search methods require that the user specifies the modification(s) for which to search; therefore, unexpected and novel PTMs will not be detected. Here a new algorithm is proposed to apply spectral library searching to the problem of open modification searches, namely, hunting for PTMs without prior knowledge of what PTMs are in the sample. The proposed tier-wise scoring method intelligently looks for unexpected PTMs by allowing mass-shifted peak matches but only when the number of matches found is deemed statistically significant. This allows the search engine to search for unexpected modifications while maintaining its ability to identify unmodified peptides effectively at the same time. The utility of the method is demonstrated using three different data sets, in which the numbers of spectrum identifications to both unmodified and modified peptides were substantially increased relative to a regular spectral library search as well as to another open modification spectral search method, pMatch.
Mercury: An Example of Effective Software Reuse for Metadata Management, Data Discovery and Access
NASA Astrophysics Data System (ADS)
Devarakonda, Ranjeet; Palanisamy, Giri; Green, James; Wilson, Bruce E.
2008-12-01
Mercury is a federated metadata harvesting, data discovery and access tool based on both open source packages and custom developed software. Though originally developed for NASA, the Mercury development consortium now includes funding from NASA, USGS, and DOE. Mercury supports the reuse of metadata by enabling searching across a range of metadata specification and standards including XML, Z39.50, FGDC, Dublin-Core, Darwin-Core, EML, and ISO-19115. Mercury provides a single portal to information contained in distributed data management systems. It collects metadata and key data from contributing project servers distributed around the world and builds a centralized index. The Mercury search interfaces then allow the users to perform simple, fielded, spatial and temporal searches across these metadata sources. One of the major goals of the recent redesign of Mercury was to improve the software reusability across the 12 projects which currently fund the continuing development of Mercury. These projects span a range of land, atmosphere, and ocean ecological communities and have a number of common needs for metadata searches, but they also have a number of needs specific to one or a few projects. To balance these common and project-specific needs, Mercury's architecture has three major reusable components; a harvester engine, an indexing system and a user interface component. The harvester engine is responsible for harvesting metadata records from various distributed servers around the USA and around the world. The harvester software was packaged in such a way that all the Mercury projects will use the same harvester scripts but each project will be driven by a set of project specific configuration files. The harvested files are structured metadata records that are indexed against the search library API consistently, so that it can render various search capabilities such as simple, fielded, spatial and temporal. This backend component is supported by a very flexible, easy to use Graphical User Interface which is driven by cascading style sheets, which make it even simpler for reusable design implementation. The new Mercury system is based on a Service Oriented Architecture and effectively reuses components for various services such as Thesaurus Service, Gazetteer Web Service and UDDI Directory Services. The software also provides various search services including: RSS, Geo-RSS, OpenSearch, Web Services and Portlets, integrated shopping cart to order datasets from various data centers (ORNL DAAC, NSIDC) and integrated visualization tools. Other features include: Filtering and dynamic sorting of search results, book- markable search results, save, retrieve, and modify search criteria.
Mercury: An Example of Effective Software Reuse for Metadata Management, Data Discovery and Access
DOE Office of Scientific and Technical Information (OSTI.GOV)
Devarakonda, Ranjeet
2008-01-01
Mercury is a federated metadata harvesting, data discovery and access tool based on both open source packages and custom developed software. Though originally developed for NASA, the Mercury development consortium now includes funding from NASA, USGS, and DOE. Mercury supports the reuse of metadata by enabling searching across a range of metadata specification and standards including XML, Z39.50, FGDC, Dublin-Core, Darwin-Core, EML, and ISO-19115. Mercury provides a single portal to information contained in distributed data management systems. It collects metadata and key data from contributing project servers distributed around the world and builds a centralized index. The Mercury search interfacesmore » then allow the users to perform simple, fielded, spatial and temporal searches across these metadata sources. One of the major goals of the recent redesign of Mercury was to improve the software reusability across the 12 projects which currently fund the continuing development of Mercury. These projects span a range of land, atmosphere, and ocean ecological communities and have a number of common needs for metadata searches, but they also have a number of needs specific to one or a few projects. To balance these common and project-specific needs, Mercury's architecture has three major reusable components; a harvester engine, an indexing system and a user interface component. The harvester engine is responsible for harvesting metadata records from various distributed servers around the USA and around the world. The harvester software was packaged in such a way that all the Mercury projects will use the same harvester scripts but each project will be driven by a set of project specific configuration files. The harvested files are structured metadata records that are indexed against the search library API consistently, so that it can render various search capabilities such as simple, fielded, spatial and temporal. This backend component is supported by a very flexible, easy to use Graphical User Interface which is driven by cascading style sheets, which make it even simpler for reusable design implementation. The new Mercury system is based on a Service Oriented Architecture and effectively reuses components for various services such as Thesaurus Service, Gazetteer Web Service and UDDI Directory Services. The software also provides various search services including: RSS, Geo-RSS, OpenSearch, Web Services and Portlets, integrated shopping cart to order datasets from various data centers (ORNL DAAC, NSIDC) and integrated visualization tools. Other features include: Filtering and dynamic sorting of search results, book- markable search results, save, retrieve, and modify search criteria.« less
A Taxonomic Search Engine: Federating taxonomic databases using web services
Page, Roderic DM
2005-01-01
Background The taxonomic name of an organism is a key link between different databases that store information on that organism. However, in the absence of a single, comprehensive database of organism names, individual databases lack an easy means of checking the correctness of a name. Furthermore, the same organism may have more than one name, and the same name may apply to more than one organism. Results The Taxonomic Search Engine (TSE) is a web application written in PHP that queries multiple taxonomic databases (ITIS, Index Fungorum, IPNI, NCBI, and uBIO) and summarises the results in a consistent format. It supports "drill-down" queries to retrieve a specific record. The TSE can optionally suggest alternative spellings the user can try. It also acts as a Life Science Identifier (LSID) authority for the source taxonomic databases, providing globally unique identifiers (and associated metadata) for each name. Conclusion The Taxonomic Search Engine is available at and provides a simple demonstration of the potential of the federated approach to providing access to taxonomic names. PMID:15757517
NASA Astrophysics Data System (ADS)
Johns, E. M.; Mayernik, M. S.; Boler, F. M.; Corson-Rikert, J.; Daniels, M. D.; Gross, M. B.; Khan, H.; Maull, K. E.; Rowan, L. R.; Stott, D.; Williams, S.; Krafft, D. B.
2015-12-01
Researchers seek information and data through a variety of avenues: published literature, search engines, repositories, colleagues, etc. In order to build a web application that leverages linked open data to enable multiple paths for information discovery, the EarthCollab project has surveyed two geoscience user communities to consider how researchers find and share scholarly output. EarthCollab, a cross-institutional, EarthCube funded project partnering UCAR, Cornell University, and UNAVCO, is employing the open-source semantic web software, VIVO, as the underlying technology to connect the people and resources of virtual research communities. This study will present an analysis of survey responses from members of the two case study communities: (1) the Bering Sea Project, an interdisciplinary field program whose data archive is hosted by NCAR's Earth Observing Laboratory (EOL), and (2) UNAVCO, a geodetic facility and consortium that supports diverse research projects informed by geodesy. The survey results illustrate the types of research products that respondents indicate should be discoverable within a digital platform and the current methods used to find publications, data, personnel, tools, and instrumentation. The responses showed that scientists rely heavily on general purpose search engines, such as Google, to find information, but that data center websites and the published literature were also critical sources for finding collaborators, data, and research tools.The survey participants also identify additional features of interest for an information platform such as search engine indexing, connection to institutional web pages, generation of bibliographies and CVs, and outward linking to social media. Through the survey, the user communities prioritized the type of information that is most important to display and describe their work within a research profile. The analysis of this survey will inform our further development of a platform that will facilitate different types of information discovery strategies, and help researchers to find and use the associated resources of a research project.
Developing a Domain Ontology: the Case of Water Cycle and Hydrology
NASA Astrophysics Data System (ADS)
Gupta, H.; Pozzi, W.; Piasecki, M.; Imam, B.; Houser, P.; Raskin, R.; Ramachandran, R.; Martinez Baquero, G.
2008-12-01
A semantic web ontology enables semantic data integration and semantic smart searching. Several organizations have attempted to implement smart registration and integration or searching using ontologies. These are the NOESIS (NSF project: LEAD) and HydroSeek (NSF project: CUAHS HIS) data discovery engines and the NSF project GEON. All three applications use ontologies to discover data from multiple sources and projects. The NASA WaterNet project was established to identify creative, innovative ways to bridge NASA research results to real world applications, linking decision support needs to available data, observations, and modeling capability. WaterNet (NASA project) utilized the smart query tool Noesis as a testbed to test whether different ontologies (and different catalog searches) could be combined to match resources with user needs. NOESIS contains the upper level SWEET ontology that accepts plug in domain ontologies to refine user search queries, reducing the burden of multiple keyword searches. Another smart search interface was that developed for CUAHSI, HydroSeek, that uses a multi-layered concept search ontology, tagging variables names from any number of data sources to specific leaf and higher level concepts on which the search is executed. This approach has proven to be quite successful in mitigating semantic heterogeneity as the user does not need to know the semantic specifics of each data source system but just uses a set of common keywords to discover the data for a specific temporal and geospatial domain. This presentation will show tests with Noesis and Hydroseek lead to the conclusion that the construction of a complex, and highly heterogeneous water cycle ontology requires multiple ontology modules. To illustrate the complexity and heterogeneity of a water cycle ontology, Hydroseek successfully utilizes WaterOneFlow to integrate data across multiple different data collections, such as USGS NWIS. However,different methodologies are employed by the Earth Science, the Hydrological, and Hydraulic Engineering Communities, and each community employs models that require different input data. If a sub-domain ontology is created for each of these,describing water balance calculations, then the resulting structure of the semantic network describing these various terms can be rather complex, heterogeneous, and overlapping, and will require "mapping" between equivalent terms in the ontologies, along with the development of an upper level conceptual or domain ontology to utilize and link to those already in existence.
The NOAA OneStop System: From Well-Curated Metadata to Data Discovery
NASA Astrophysics Data System (ADS)
McQuinn, E.; Jakositz, A.; Caldwell, A.; Delk, Z.; Neufeld, D.; Shapiro, J.; Partee, R.; Milan, A.
2017-12-01
The NOAA OneStop project is a pathfinder in the realm of enabling users to search for, discover, and access NOAA data. As the project continues along its path to maturity, it has become evident that three areas are of utmost importance to its success in the Earth science community: ensuring quality metadata, building a robust and scalable backend architecture, and keeping the user interface simple to use. Why is this the case? Because, simply put, we are dealing with all aspects of a Big Data problem: large volumes of disparate data needing to be quickly and easily processed and retrieved. In this presentation we discuss the three key aspects of OneStop architecture and how development in each area must be done through cross-team collaboration in order to succeed. We cover aspects of the web-based user interface and OneStop API and how metadata curators and software engineers have worked together to continually iterate on an ever-improving data discovery tool meant to be used by a variety of users searching across a broad assortment of data types.
Search engines, news wires and digital epidemiology: Presumptions and facts.
Kaveh-Yazdy, Fatemeh; Zareh-Bidoki, Ali-Mohammad
2018-07-01
Digital epidemiology tries to identify diseases dynamics and spread behaviors using digital traces collected via search engines logs and social media posts. However, the impacts of news on information-seeking behaviors have been remained unknown. Data employed in this research provided from two sources, (1) Parsijoo search engine query logs of 48 months, and (2) a set of documents of 28 months of Parsijoo's news service. Two classes of topics, i.e. macro-topics and micro-topics were selected to be tracked in query logs and news. Keywords of the macro-topics were automatically generated using web provided resources and exceeded 10k. Keyword set of micro-topics were limited to a numerable list including terms related to diseases and health-related activities. The tests are established in the form of three studies. Study A includes temporal analyses of 7 macro-topics in query logs. Study B considers analyzing seasonality of searching patterns of 9 micro-topics, and Study C assesses the impact of news media coverage on users' health-related information-seeking behaviors. Study A showed that the hourly distribution of various macro-topics followed the changes in social activity level. Conversely, the interestingness of macro-topics did not follow the regulation of topic distributions. Among macro-topics, "Pharmacotherapy" has highest interestingness level and wider time-window of popularity. In Study B, seasonality of a limited number of diseases and health-related activities were analyzed. Trends of infectious diseases, such as flu, mumps and chicken pox were seasonal. Due to seasonality of most of diseases covered in national vaccination plans, the trend belonging to "Immunization and Vaccination" was seasonal, as well. Cancer awareness events caused peaks in search trends of "Cancer" and "Screening" micro-topics in specific days of each year that mimic repeated patterns which may mistakenly be identified as seasonality. In study C, we assessed the co-integration and correlation between news and query trends. Our results demonstrated that micro-topics sparsely covered in news media had lowest level of impressiveness and, subsequently, the lowest impact on users' intents. Our results can reveal public reaction to social events, diseases and prevention procedures. Furthermore, we found that news trends are co-integrated with search queries and are able to reveal health-related events; however, they cannot be used interchangeably. It is recommended that the user-generated contents and news documents are analyzed mutually and interactively. Copyright © 2018 Elsevier B.V. All rights reserved.
2010-01-01
Background Recent discoveries concerning novel functions of RNA, such as RNA interference, have contributed towards the growing importance of the field. In this respect, a deeper knowledge of complex three-dimensional RNA structures is essential to understand their new biological functions. A number of bioinformatic tools have been proposed to explore two major structural databases (PDB, NDB) in order to analyze various aspects of RNA tertiary structures. One of these tools is RNA FRABASE 1.0, the first web-accessible database with an engine for automatic search of 3D fragments within PDB-derived RNA structures. This search is based upon the user-defined RNA secondary structure pattern. In this paper, we present and discuss RNA FRABASE 2.0. This second version of the system represents a major extension of this tool in terms of providing new data and a wide spectrum of novel functionalities. An intuitionally operated web server platform enables very fast user-tailored search of three-dimensional RNA fragments, their multi-parameter conformational analysis and visualization. Description RNA FRABASE 2.0 has stored information on 1565 PDB-deposited RNA structures, including all NMR models. The RNA FRABASE 2.0 search engine algorithms operate on the database of the RNA sequences and the new library of RNA secondary structures, coded in the dot-bracket format extended to hold multi-stranded structures and to cover residues whose coordinates are missing in the PDB files. The library of RNA secondary structures (and their graphics) is made available. A high level of efficiency of the 3D search has been achieved by introducing novel tools to formulate advanced searching patterns and to screen highly populated tertiary structure elements. RNA FRABASE 2.0 also stores data and conformational parameters in order to provide "on the spot" structural filters to explore the three-dimensional RNA structures. An instant visualization of the 3D RNA structures is provided. RNA FRABASE 2.0 is freely available at http://rnafrabase.cs.put.poznan.pl. Conclusions RNA FRABASE 2.0 provides a novel database and powerful search engine which is equipped with new data and functionalities that are unavailable elsewhere. Our intention is that this advanced version of the RNA FRABASE will be of interest to all researchers working in the RNA field. PMID:20459631
Popenda, Mariusz; Szachniuk, Marta; Blazewicz, Marek; Wasik, Szymon; Burke, Edmund K; Blazewicz, Jacek; Adamiak, Ryszard W
2010-05-06
Recent discoveries concerning novel functions of RNA, such as RNA interference, have contributed towards the growing importance of the field. In this respect, a deeper knowledge of complex three-dimensional RNA structures is essential to understand their new biological functions. A number of bioinformatic tools have been proposed to explore two major structural databases (PDB, NDB) in order to analyze various aspects of RNA tertiary structures. One of these tools is RNA FRABASE 1.0, the first web-accessible database with an engine for automatic search of 3D fragments within PDB-derived RNA structures. This search is based upon the user-defined RNA secondary structure pattern. In this paper, we present and discuss RNA FRABASE 2.0. This second version of the system represents a major extension of this tool in terms of providing new data and a wide spectrum of novel functionalities. An intuitionally operated web server platform enables very fast user-tailored search of three-dimensional RNA fragments, their multi-parameter conformational analysis and visualization. RNA FRABASE 2.0 has stored information on 1565 PDB-deposited RNA structures, including all NMR models. The RNA FRABASE 2.0 search engine algorithms operate on the database of the RNA sequences and the new library of RNA secondary structures, coded in the dot-bracket format extended to hold multi-stranded structures and to cover residues whose coordinates are missing in the PDB files. The library of RNA secondary structures (and their graphics) is made available. A high level of efficiency of the 3D search has been achieved by introducing novel tools to formulate advanced searching patterns and to screen highly populated tertiary structure elements. RNA FRABASE 2.0 also stores data and conformational parameters in order to provide "on the spot" structural filters to explore the three-dimensional RNA structures. An instant visualization of the 3D RNA structures is provided. RNA FRABASE 2.0 is freely available at http://rnafrabase.cs.put.poznan.pl. RNA FRABASE 2.0 provides a novel database and powerful search engine which is equipped with new data and functionalities that are unavailable elsewhere. Our intention is that this advanced version of the RNA FRABASE will be of interest to all researchers working in the RNA field.
Lynx: a database and knowledge extraction engine for integrative medicine
Sulakhe, Dinanath; Balasubramanian, Sandhya; Xie, Bingqing; Feng, Bo; Taylor, Andrew; Wang, Sheng; Berrocal, Eduardo; Dave, Utpal; Xu, Jinbo; Börnigen, Daniela; Gilliam, T. Conrad; Maltsev, Natalia
2014-01-01
We have developed Lynx (http://lynx.ci.uchicago.edu)—a web-based database and a knowledge extraction engine, supporting annotation and analysis of experimental data and generation of weighted hypotheses on molecular mechanisms contributing to human phenotypes and disorders of interest. Its underlying knowledge base (LynxKB) integrates various classes of information from >35 public databases and private collections, as well as manually curated data from our group and collaborators. Lynx provides advanced search capabilities and a variety of algorithms for enrichment analysis and network-based gene prioritization to assist the user in extracting meaningful knowledge from LynxKB and experimental data, whereas its service-oriented architecture provides public access to LynxKB and its analytical tools via user-friendly web services and interfaces. PMID:24270788
Query Enhancement with Topic Detection and Disambiguation for Robust Retrieval
ERIC Educational Resources Information Center
Zhang, Hui
2013-01-01
With the rapid increase in the amount of available information, people nowadays rely heavily on information retrieval (IR) systems such as web search engine to fulfill their information needs. However, due to the lack of domain knowledge and the limitation of natural language such as synonyms and polysemes, many system users cannot formulate their…
Comparative Analysis of Rank Aggregation Techniques for Metasearch Using Genetic Algorithm
ERIC Educational Resources Information Center
Kaur, Parneet; Singh, Manpreet; Singh Josan, Gurpreet
2017-01-01
Rank Aggregation techniques have found wide applications for metasearch along with other streams such as Sports, Voting System, Stock Markets, and Reduction in Spam. This paper presents the optimization of rank lists for web queries put by the user on different MetaSearch engines. A metaheuristic approach such as Genetic algorithm based rank…
Yom-Tov, Elad; Brunstein-Klomek, Anat; Hadas, Arie; Tamir, Or; Fennig, Silvana
2016-08-01
There is a debate about the effects of pro-anorexia (colloquially referred to as pro-ana) websites. Research suggests that the effect of these websites is not straightforward. Indeed, the actual function of these sites is disputed, with studies indicating both negative and positive effects. This is the first study which systematically examined the differences between pro-anorexia web communities in four main aspects: web language used (posts); web interests/search behaviors (queries); users' self-reported weight status and weight goals; and associated self-reported mood/pathology. We collected three primary sources of data, including messages posed on three pro-ana websites, a survey completed by over 1000 participants of a pro-ana website, and the searches made on the Bing search engine of pro-anorexia users. These data were analyzed for content, reported demographics and pathology, and behavior over time. Although members of the main pro-ana website investigated appear to be depressed, with high rates of self-harm and suicide attempts, users are significantly more interested in treatment, have wishes of procreation and reported the highest goal weights among the investigated sites. In contrast, users of other pro-ana websites investigated, are more interested in morbid themes including depression, self-harm and suicide. The percentage of severely malnourished website users, in general, appears to be small (20%). Our results indicate that a new strategy is required to facilitate the communication between mental health specialists and pro-ana web users, recognizing the differences in harm associated with different websites. Copyright © 2016 Elsevier Ltd. All rights reserved.
Proteomic Cinderella: Customized analysis of bulky MS/MS data in one night.
Kiseleva, Olga; Poverennaya, Ekaterina; Shargunov, Alexander; Lisitsa, Andrey
2018-02-01
Proteomic challenges, stirred up by the advent of high-throughput technologies, produce large amount of MS data. Nowadays, the routine manual search does not satisfy the "speed" of modern science any longer. In our work, the necessity of single-thread analysis of bulky data emerged during interpretation of HepG2 proteome profiling results for proteoforms searching. We compared the contribution of each of the eight search engines (X!Tandem, MS-GF[Formula: see text], MS Amanda, MyriMatch, Comet, Tide, Andromeda, and OMSSA) integrated in an open-source graphical user interface SearchGUI ( http://searchgui.googlecode.com ) into total result of proteoforms identification and optimized set of engines working simultaneously. We also compared the results of our search combination with Mascot results using protein kit UPS2, containing 48 human proteins. We selected combination of X!Tandem, MS-GF[Formula: see text] and OMMSA as the most time-efficient and productive combination of search. We added homemade java-script to automatize pipeline from file picking to report generation. These settings resulted in rise of the efficiency of our customized pipeline unobtainable by manual scouting: the analysis of 192 files searched against human proteome (42153 entries) downloaded from UniProt took 11[Formula: see text]h.
National Utilization and Forecasting of Ototopical Antibiotics: Medicaid Data Versus "Dr. Google".
Crowson, Matthew G; Schulz, Kristine; Tucci, Debara L
2016-09-01
To forecast national Medicaid prescription volumes for common ototopical antibiotics, and correlate prescription volumes with internet user search interest using Google Trends (GT). National United States Medicaid prescription and GT user search database analysis. Quarterly national Medicaid summary drug utilization data and weekly GT search engine data for ciprofloxacin-dexamethasone (CD), ofloxacin (OF), and Cortisporin (CS) ototopicals were obtained from January 2008 to July 2014. Time series analysis was used to assess prescription seasonality, Holt-Winter's method for forecasting quarterly prescription volumes, and Pearson correlations to compare GT and Medicaid data. Medicaid prescription volumes demonstrated sinusoidal seasonality for OF (r = 0.91), CS (r = 0.71), and CD (r = 0.62) with annual peaks in July, August, and September. In 2017, OF was forecasted to be the most widely prescribed ototopical, followed by CD. CS was the least prescribed, and volumes were forecasted to decrease 9.0% by 2017 from 2014. GT user search interest demonstrated analogous sinusoidal seasonality and significant correlations with Medicaid data prescriptions for CD (r = 0.38, p = 0.046), OF (r = 0.74, p < 0.001), CS (r = 0.49, p = 0.008). We found that OF, CD, and CS ototopicals have sinusoidal seasonal variation with Medicaid prescription volume peaks occurring in the summer. After 2012, OF was the most commonly prescribed ototopical, and this trend was forecasted to continue. CS use was forecasted to decrease. Google user search interest in these ototopical agents demonstrated analogous seasonal variation. Analyses of GT for interest in ototopical antibiotics may be useful for health care providers and administrators as a complementary method for assessing healthcare utilization trends.
A Firefly Algorithm-based Approach for Pseudo-Relevance Feedback: Application to Medical Database.
Khennak, Ilyes; Drias, Habiba
2016-11-01
The difficulty of disambiguating the sense of the incomplete and imprecise keywords that are extensively used in the search queries has caused the failure of search systems to retrieve the desired information. One of the most powerful and promising method to overcome this shortcoming and improve the performance of search engines is Query Expansion, whereby the user's original query is augmented by new keywords that best characterize the user's information needs and produce more useful query. In this paper, a new Firefly Algorithm-based approach is proposed to enhance the retrieval effectiveness of query expansion while maintaining low computational complexity. In contrast to the existing literature, the proposed approach uses a Firefly Algorithm to find the best expanded query among a set of expanded query candidates. Moreover, this new approach allows the determination of the length of the expanded query empirically. Experimental results on MEDLINE, the on-line medical information database, show that our proposed approach is more effective and efficient compared to the state-of-the-art.
Descriptive Question Answering with Answer Type Independent Features
NASA Astrophysics Data System (ADS)
Yoon, Yeo-Chan; Lee, Chang-Ki; Kim, Hyun-Ki; Jang, Myung-Gil; Ryu, Pum Mo; Park, So-Young
In this paper, we present a supervised learning method to seek out answers to the most frequently asked descriptive questions: reason, method, and definition questions. Most of the previous systems for question answering focus on factoids, lists or definitional questions. However, descriptive questions such as reason questions and method questions are also frequently asked by users. We propose a system for these types of questions. The system conducts an answer search as follows. First, we analyze the user's question and extract search keywords and the expected answer type. Second, information retrieval results are obtained from an existing search engine such as Yahoo or Google. Finally, we rank the results to find snippets containing answers to the questions based on a ranking SVM algorithm. We also propose features to identify snippets containing answers for descriptive questions. The features are adaptable and thus are not dependent on answer type. Experimental results show that the proposed method and features are clearly effective for the task.
Tandem Mass Spectrum Sequencing: An Alternative to Database Search Engines in Shotgun Proteomics.
Muth, Thilo; Rapp, Erdmann; Berven, Frode S; Barsnes, Harald; Vaudel, Marc
2016-01-01
Protein identification via database searches has become the gold standard in mass spectrometry based shotgun proteomics. However, as the quality of tandem mass spectra improves, direct mass spectrum sequencing gains interest as a database-independent alternative. In this chapter, the general principle of this so-called de novo sequencing is introduced along with pitfalls and challenges of the technique. The main tools available are presented with a focus on user friendly open source software which can be directly applied in everyday proteomic workflows.
Plain Language to Communicate Physical Activity Information: A Website Content Analysis.
Paige, Samantha R; Black, David R; Mattson, Marifran; Coster, Daniel C; Stellefson, Michael
2018-04-01
Plain language techniques are health literacy universal precautions intended to enhance health care system navigation and health outcomes. Physical activity (PA) is a popular topic on the Internet, yet it is unknown if information is communicated in plain language. This study examined how plain language techniques are included in PA websites, and if the use of plain language techniques varies according to search procedures (keyword, search engine) and website host source (government, commercial, educational/organizational). Three keywords ("physical activity," "fitness," and "exercise") were independently entered into three search engines (Google, Bing, and Yahoo) to locate a nonprobability sample of websites ( N = 61). Fourteen plain language techniques were coded within each website to examine content formatting, clarity and conciseness, and multimedia use. Approximately half ( M = 6.59; SD = 1.68) of the plain language techniques were included in each website. Keyword physical activity resulted in websites with fewer clear and concise plain language techniques ( p < .05), whereas fitness resulted in websites with more clear and concise techniques ( p < .01). Plain language techniques did not vary by search engine or the website host source. Accessing PA information that is easy to understand and behaviorally oriented may remain a challenge for users. Transdisciplinary collaborations are needed to optimize plain language techniques while communicating online PA information.
OS2: Oblivious similarity based searching for encrypted data outsourced to an untrusted domain
Pervez, Zeeshan; Ahmad, Mahmood; Khattak, Asad Masood; Ramzan, Naeem
2017-01-01
Public cloud storage services are becoming prevalent and myriad data sharing, archiving and collaborative services have emerged which harness the pay-as-you-go business model of public cloud. To ensure privacy and confidentiality often encrypted data is outsourced to such services, which further complicates the process of accessing relevant data by using search queries. Search over encrypted data schemes solve this problem by exploiting cryptographic primitives and secure indexing to identify outsourced data that satisfy the search criteria. Almost all of these schemes rely on exact matching between the encrypted data and search criteria. A few schemes which extend the notion of exact matching to similarity based search, lack realism as those schemes rely on trusted third parties or due to increase storage and computational complexity. In this paper we propose Oblivious Similarity based Search (OS2) for encrypted data. It enables authorized users to model their own encrypted search queries which are resilient to typographical errors. Unlike conventional methodologies, OS2 ranks the search results by using similarity measure offering a better search experience than exact matching. It utilizes encrypted bloom filter and probabilistic homomorphic encryption to enable authorized users to access relevant data without revealing results of search query evaluation process to the untrusted cloud service provider. Encrypted bloom filter based search enables OS2 to reduce search space to potentially relevant encrypted data avoiding unnecessary computation on public cloud. The efficacy of OS2 is evaluated on Google App Engine for various bloom filter lengths on different cloud configurations. PMID:28692697
Pervez, Zeeshan; Ahmad, Mahmood; Khattak, Asad Masood; Ramzan, Naeem; Khan, Wajahat Ali
2017-01-01
Public cloud storage services are becoming prevalent and myriad data sharing, archiving and collaborative services have emerged which harness the pay-as-you-go business model of public cloud. To ensure privacy and confidentiality often encrypted data is outsourced to such services, which further complicates the process of accessing relevant data by using search queries. Search over encrypted data schemes solve this problem by exploiting cryptographic primitives and secure indexing to identify outsourced data that satisfy the search criteria. Almost all of these schemes rely on exact matching between the encrypted data and search criteria. A few schemes which extend the notion of exact matching to similarity based search, lack realism as those schemes rely on trusted third parties or due to increase storage and computational complexity. In this paper we propose Oblivious Similarity based Search ([Formula: see text]) for encrypted data. It enables authorized users to model their own encrypted search queries which are resilient to typographical errors. Unlike conventional methodologies, [Formula: see text] ranks the search results by using similarity measure offering a better search experience than exact matching. It utilizes encrypted bloom filter and probabilistic homomorphic encryption to enable authorized users to access relevant data without revealing results of search query evaluation process to the untrusted cloud service provider. Encrypted bloom filter based search enables [Formula: see text] to reduce search space to potentially relevant encrypted data avoiding unnecessary computation on public cloud. The efficacy of [Formula: see text] is evaluated on Google App Engine for various bloom filter lengths on different cloud configurations.
Galbusera, Fabio; Brayda-Bruno, Marco; Freutel, Maren; Seitz, Andreas; Steiner, Malte; Wehrle, Esther; Wilke, Hans-Joachim
2012-01-01
Previous surveys showed a poor quality of the web sites providing health information about low back pain. However, the rapid and continuous evolution of the Internet content may question the current validity of those investigations. The present study is aimed to quantitatively assess the quality of the Internet information about low back pain retrieved with the most commonly employed search engines. An Internet search with the keywords "low back pain" has been performed with Google, Yahoo!® and Bing™ in the English language. The top 30 hits obtained with each search engine were evaluated by five independent raters and averaged following criteria derived from previous works. All search results were categorized as declaring compliant to a quality standard for health information (e.g. HONCode) or not and based on the web site type (Institutional, Free informative, Commercial, News, Social Network, Unknown). The quality of the hits retrieved by the three search engines was extremely similar. The web sites had a clear purpose, were easy to navigate, and mostly lacked in validity and quality of the provided links. The conformity to a quality standard was correlated with a marked greater quality of the web sites in all respects. Institutional web sites had the best validity and ease of use. Free informative web sites had good quality but a markedly lower validity compared to Institutional websites. Commercial web sites provided more biased information. News web sites were well designed and easy to use, but lacked in validity. The average quality of the hits retrieved by the most commonly employed search engines could be defined as satisfactory and favorably comparable with previous investigations. Awareness of the user about checking the quality of the information remains of concern.
Decision making in family medicine
Labrecque, Michel; Ratté, Stéphane; Frémont, Pierre; Cauchon, Michel; Ouellet, Jérôme; Hogg, William; McGowan, Jessie; Gagnon, Marie-Pierre; Njoya, Merlin; Légaré, France
2013-01-01
Abstract Objective To compare the ability of users of 2 medical search engines, InfoClinique and the Trip database, to provide correct answers to clinical questions and to explore the perceived effects of the tools on the clinical decision-making process. Design Randomized trial. Setting Three family medicine units of the family medicine program of the Faculty of Medicine at Laval University in Quebec city, Que. Participants Fifteen second-year family medicine residents. Intervention Residents generated 30 structured questions about therapy or preventive treatment (2 questions per resident) based on clinical encounters. Using an Internet platform designed for the trial, each resident answered 20 of these questions (their own 2, plus 18 of the questions formulated by other residents, selected randomly) before and after searching for information with 1 of the 2 search engines. For each question, 5 residents were randomly assigned to begin their search with InfoClinique and 5 with the Trip database. Main outcome measures The ability of residents to provide correct answers to clinical questions using the search engines, as determined by third-party evaluation. After answering each question, participants completed a questionnaire to assess their perception of the engine’s effect on the decision-making process in clinical practice. Results Of 300 possible pairs of answers (1 answer before and 1 after the initial search), 254 (85%) were produced by 14 residents. Of these, 132 (52%) and 122 (48%) pairs of answers concerned questions that had been assigned an initial search with InfoClinique and the Trip database, respectively. Both engines produced an important and similar absolute increase in the proportion of correct answers after searching (26% to 62% for InfoClinique, for an increase of 36%; 24% to 63% for the Trip database, for an increase of 39%; P = .68). For all 30 clinical questions, at least 1 resident produced the correct answer after searching with either search engine. The mean (SD) time of the initial search for each question was 23.5 (7.6) minutes with InfoClinique and 22.3 (7.8) minutes with the Trip database (P = .30). Participants’ perceptions of each engine’s effect on the decision-making process were very positive and similar for both search engines. Conclusion Family medicine residents’ ability to provide correct answers to clinical questions increased dramatically and similarly with the use of both InfoClinique and the Trip database. These tools have strong potential to increase the quality of medical care. PMID:24130286
Adverse Reactions Associated With Cannabis Consumption as Evident From Search Engine Queries
Lev-Ran, Shaul
2017-01-01
Background Cannabis is one of the most widely used psychoactive substances worldwide, but adverse drug reactions (ADRs) associated with its use are difficult to study because of its prohibited status in many countries. Objective Internet search engine queries have been used to investigate ADRs in pharmaceutical drugs. In this proof-of-concept study, we tested whether these queries can be used to detect the adverse reactions of cannabis use. Methods We analyzed anonymized queries from US-based users of Bing, a widely used search engine, made over a period of 6 months and compared the results with the prevalence of cannabis use as reported in the US National Survey on Drug Use in the Household (NSDUH) and with ADRs reported in the Food and Drug Administration’s Adverse Drug Reporting System. Predicted prevalence of cannabis use was estimated from the fraction of people making queries about cannabis, marijuana, and 121 additional synonyms. Predicted ADRs were estimated from queries containing layperson descriptions to 195 ICD-10 symptoms list. Results Our results indicated that the predicted prevalence of cannabis use at the US census regional level reaches an R2 of .71 NSDUH data. Queries for ADRs made by people who also searched for cannabis reveal many of the known adverse effects of cannabis (eg, cough and psychotic symptoms), as well as plausible unknown reactions (eg, pyrexia). Conclusions These results indicate that search engine queries can serve as an important tool for the study of adverse reactions of illicit drugs, which are difficult to study in other settings. PMID:29074469
Impact of Predicting Health Care Utilization Via Web Search Behavior: A Data-Driven Analysis.
Agarwal, Vibhu; Zhang, Liangliang; Zhu, Josh; Fang, Shiyuan; Cheng, Tim; Hong, Chloe; Shah, Nigam H
2016-09-21
By recent estimates, the steady rise in health care costs has deprived more than 45 million Americans of health care services and has encouraged health care providers to better understand the key drivers of health care utilization from a population health management perspective. Prior studies suggest the feasibility of mining population-level patterns of health care resource utilization from observational analysis of Internet search logs; however, the utility of the endeavor to the various stakeholders in a health ecosystem remains unclear. The aim was to carry out a closed-loop evaluation of the utility of health care use predictions using the conversion rates of advertisements that were displayed to the predicted future utilizers as a surrogate. The statistical models to predict the probability of user's future visit to a medical facility were built using effective predictors of health care resource utilization, extracted from a deidentified dataset of geotagged mobile Internet search logs representing searches made by users of the Baidu search engine between March 2015 and May 2015. We inferred presence within the geofence of a medical facility from location and duration information from users' search logs and putatively assigned medical facility visit labels to qualifying search logs. We constructed a matrix of general, semantic, and location-based features from search logs of users that had 42 or more search days preceding a medical facility visit as well as from search logs of users that had no medical visits and trained statistical learners for predicting future medical visits. We then carried out a closed-loop evaluation of the utility of health care use predictions using the show conversion rates of advertisements displayed to the predicted future utilizers. In the context of behaviorally targeted advertising, wherein health care providers are interested in minimizing their cost per conversion, the association between show conversion rate and predicted utilization score, served as a surrogate measure of the model's utility. We obtained the highest area under the curve (0.796) in medical visit prediction with our random forests model and daywise features. Ablating feature categories one at a time showed that the model performance worsened the most when location features were dropped. An online evaluation in which advertisements were served to users who had a high predicted probability of a future medical visit showed a 3.96% increase in the show conversion rate. Results from our experiments done in a research setting suggest that it is possible to accurately predict future patient visits from geotagged mobile search logs. Results from the offline and online experiments on the utility of health utilization predictions suggest that such prediction can have utility for health care providers.
Creating and Searching a Local Inventory for Data Granules in a Remote Archive
NASA Astrophysics Data System (ADS)
Cornillon, P. C.
2016-12-01
More often than not, search capabilities for network accessible data do not exist or do not meet the requirements of the user. For large archives this can make finding data of interest tedious at best. This summer, the author encountered such a problem with regard to the two existing archives of VIIRS L2 sea surface temperature (SST) fields obtained with the new ACSPO retrieval algorithm; one at the Jet Propulsion Laboratory's PO-DAAC and the other at NOAA's National Centers for Environmental Information (NCEI). In both cases the data were available via ftp and OPeNDAP but there was no search capability at the PO-DAAC and the NCEI archive was incomplete. Furthermore, in order to meet the needs of a broad range of datasets and users, the beta version of the search engine at NCEI was cumbersome for the searches of interest. Although some of these problems have been resolved since (and may be described in other posters/presentations at this meeting), the solution described in this presentation offers the user the ability to develop a search capability for archives lacking a search capability and/or to configure searches more to his or her preferences than the generic searches offered by the data provider. The solution, a Matlab script, used html access to the PO-DAAC web site to locate all VIIRS 10 minute granules and OPeNDAP access to acquire the bounding box for each granule from the metadata bound to the file. This task required several hours of wall time to acquire the data and to write the bounding boxes to a local file with the associated ftp and OPeNDAP urls for the 110,000+ granule archive. A second Matlab script searched the local archive, seconds, for granules falling in a user defined space-time window and an ascii file of wget commands associated with these was generated. This file was then executed to acquire the data of interest. The wget commands can be configured to acquire the entire files via ftp or a subset of each file via OPeNDAP. Furthermore, the search capability, based on bounding boxes and rectangular regions, could easily be modified to further refine the search. Finally, the script that builds the inventory has been designed to update the local inventory, minutes per month rather than hours.
Google and Women's Health-Related Issues: What Does the Search Engine Data Reveal?
Baazeem, Mazin; Abenhaim, Haim
2014-01-01
Identifying the gaps in public knowledge of women's health related issues has always been difficult. With the increasing number of Internet users in the United States, we sought to use the Internet as a tool to help us identify such gaps and to estimate women's most prevalent health concerns by examining commonly searched health-related keywords in Google search engine. We collected a large pool of possible search keywords from two independent practicing obstetrician/gynecologists and classified them into five main categories (obstetrics, gynecology, infertility, urogynecology/menopause and oncology), and measured the monthly average search volume within the United States for each keyword with all its possible combinations using Google AdWords tool. We found that pregnancy related keywords were less frequently searched in general compared to other categories with an average of 145,400 hits per month for the top twenty keywords. Among the most common pregnancy-related keywords was "pregnancy and sex' while pregnancy-related diseases were uncommonly searched. HPV alone was searched 305,400 times per month. Of the cancers affecting women, breast cancer was the most commonly searched with an average of 247,190 times per month, followed by cervical cancer then ovarian cancer. The commonly searched keywords are often issues that are not discussed in our daily practice as well as in public health messages. The search volume is relatively related to disease prevalence with the exception of ovarian cancer which could signify a public fear.
Toward visual user interfaces supporting collaborative multimedia content management
NASA Astrophysics Data System (ADS)
Husein, Fathi; Leissler, Martin; Hemmje, Matthias
2000-12-01
Supporting collaborative multimedia content management activities, as e.g., image and video acquisition, exploration, and access dialogues between naive users and multi media information systems is a non-trivial task. Although a wide variety of experimental and prototypical multimedia storage technologies as well as corresponding indexing and retrieval engines are available, most of them lack appropriate support for collaborative end-user oriented user interface front ends. The development of advanced user adaptable interfaces is necessary for building collaborative multimedia information- space presentations based upon advanced tools for information browsing, searching, filtering, and brokering to be applied on potentially very large and highly dynamic multimedia collections with a large number of users and user groups. Therefore, the development of advanced and at the same time adaptable and collaborative computer graphical information presentation schemes that allow to easily apply adequate visual metaphors for defined target user stereotypes has to become a key focus within ongoing research activities trying to support collaborative information work with multimedia collections.
NASA Astrophysics Data System (ADS)
Douma, M.; Ligierko, G.; Angelov, I.
2008-10-01
The need for information has increased exponentially over the past decades. The current systems for constructing, exploring, classifying, organizing, and searching information face the growing challenge of enabling their users to operate efficiently and intuitively in knowledge-heavy environments. This paper presents SpicyNodes, an advanced user interface for difficult interaction contexts. It is based on an underlying structure known as a radial map, which allows users to manipulate and interact in a natural manner with entities called nodes. This technology overcomes certain limitations of existing solutions and solves the problem of browsing complex sets of linked information. SpicyNodes is also an organic system that projects users into a living space, stimulating exploratory behavior and fostering creative thought. Our interactive radial layout is used for educational purposes and has the potential for numerous other applications.
Classification of Automated Search Traffic
NASA Astrophysics Data System (ADS)
Buehrer, Greg; Stokes, Jack W.; Chellapilla, Kumar; Platt, John C.
As web search providers seek to improve both relevance and response times, they are challenged by the ever-increasing tax of automated search query traffic. Third party systems interact with search engines for a variety of reasons, such as monitoring a web site’s rank, augmenting online games, or possibly to maliciously alter click-through rates. In this paper, we investigate automated traffic (sometimes referred to as bot traffic) in the query stream of a large search engine provider. We define automated traffic as any search query not generated by a human in real time. We first provide examples of different categories of query logs generated by automated means. We then develop many different features that distinguish between queries generated by people searching for information, and those generated by automated processes. We categorize these features into two classes, either an interpretation of the physical model of human interactions, or as behavioral patterns of automated interactions. Using the these detection features, we next classify the query stream using multiple binary classifiers. In addition, a multiclass classifier is then developed to identify subclasses of both normal and automated traffic. An active learning algorithm is used to suggest which user sessions to label to improve the accuracy of the multiclass classifier, while also seeking to discover new classes of automated traffic. Performance analysis are then provided. Finally, the multiclass classifier is used to predict the subclass distribution for the search query stream.
ERIC Educational Resources Information Center
Lyall-Wilson, Jennifer Rae
2013-01-01
The dissertation research explores an approach to automatic concept-based query expansion to improve search engine performance. It uses a network-based approach for identifying the concept represented by the user's query and is founded on the idea that a collection-specific association thesaurus can be used to create a reasonable representation of…
2014-01-30
tradeoffs systems can achieve between effectiveness (overall gains across queries) and robustness (minimizing the probability of significant failure...by multiple users; less than 10 terms in length; and relatively low effective - ness scores across multiple commercial search engines (as of January...appear in the judgment (qrels) file. These relevance grades are also used for calculating graded effectiveness measures, except that a value of -2 is
Collaborating with Users to Design Learning Spaces: Playing Nicely in the Sandbox
ERIC Educational Resources Information Center
Weaver, Barbara
2009-01-01
What should a campus do when it needs more learning spaces but can't construct new buildings? Dr. Benjamin Sill's first task when he became the director of Clemson University's general engineering program was to find space for classrooms and for the advising program. His search ended in the old YMCA building (Holtzendorff Hall), where space was…
Technology mediator: a new role for the reference librarian?
Howse, David K; Bracke, Paul J; Keim, Samuel M
2006-01-01
The Arizona Health Sciences Library has collaborated with clinical faculty to develop a federated search engine that is useful for meeting real-time clinical information needs. This article proposes a technology mediation role for the reference librarian that was inspired by the project, and describes the collaborative model used for developing technology-mediated services for targeted users. PMID:17040566
Federal Register 2010, 2011, 2012, 2013, 2014
2011-09-02
... Counties, North Dakota; A 4,500 horsepower compressor station containing two natural gas-driven engines... included in the User's Guide under the ``e-filing'' link on the Commission's Web site. Please note that the... on ``General Search'' and enter the docket number, excluding the last three digits in the Docket...
Federal Register 2010, 2011, 2012, 2013, 2014
2012-02-09
...,000 horsepower and one 4,735 horsepower gas engine driven compressors; Four primary and three... proceeding by filing a request to intervene. Instructions for becoming an intervenor are in the User's Guide... on ``General Search'' and enter the docket number, excluding the last three digits in the Docket...
The Fragment Network: A Chemistry Recommendation Engine Built Using a Graph Database.
Hall, Richard J; Murray, Christopher W; Verdonk, Marcel L
2017-07-27
The hit validation stage of a fragment-based drug discovery campaign involves probing the SAR around one or more fragment hits. This often requires a search for similar compounds in a corporate collection or from commercial suppliers. The Fragment Network is a graph database that allows a user to efficiently search chemical space around a compound of interest. The result set is chemically intuitive, naturally grouped by substitution pattern and meaningfully sorted according to the number of observations of each transformation in medicinal chemistry databases. This paper describes the algorithms used to construct and search the Fragment Network and provides examples of how it may be used in a drug discovery context.
EXADS - EXPERT SYSTEM FOR AUTOMATED DESIGN SYNTHESIS
NASA Technical Reports Server (NTRS)
Rogers, J. L.
1994-01-01
The expert system called EXADS was developed to aid users of the Automated Design Synthesis (ADS) general purpose optimization program. Because of the general purpose nature of ADS, it is difficult for a nonexpert to select the best choice of strategy, optimizer, and one-dimensional search options from the one hundred or so combinations that are available. EXADS aids engineers in determining the best combination based on their knowledge of the problem and the expert knowledge previously stored by experts who developed ADS. EXADS is a customized application of the AESOP artificial intelligence program (the general version of AESOP is available separately from COSMIC. The ADS program is also available from COSMIC.) The expert system consists of two main components. The knowledge base contains about 200 rules and is divided into three categories: constrained, unconstrained, and constrained treated as unconstrained. The EXADS inference engine is rule-based and makes decisions about a particular situation using hypotheses (potential solutions), rules, and answers to questions drawn from the rule base. EXADS is backward-chaining, that is, it works from hypothesis to facts. The rule base was compiled from sources such as literature searches, ADS documentation, and engineer surveys. EXADS will accept answers such as yes, no, maybe, likely, and don't know, or a certainty factor ranging from 0 to 10. When any hypothesis reaches a confidence level of 90% or more, it is deemed as the best choice and displayed to the user. If no hypothesis is confirmed, the user can examine explanations of why the hypotheses failed to reach the 90% level. The IBM PC version of EXADS is written in IQ-LISP for execution under DOS 2.0 or higher with a central memory requirement of approximately 512K of 8 bit bytes. This program was developed in 1986.
Trajectory Browser: An Online Tool for Interplanetary Trajectory Analysis and Visualization
NASA Technical Reports Server (NTRS)
Foster, Cyrus James
2013-01-01
The trajectory browser is a web-based tool developed at the NASA Ames Research Center for finding preliminary trajectories to planetary bodies and for providing relevant launch date, time-of-flight and (Delta)V requirements. The site hosts a database of transfer trajectories from Earth to planets and small-bodies for various types of missions such as rendezvous, sample return or flybys. A search engine allows the user to find trajectories meeting desired constraints on the launch window, mission duration and (Delta)V capability, while a trajectory viewer tool allows the visualization of the heliocentric trajectory and the detailed mission itinerary. The anticipated user base of this tool consists primarily of scientists and engineers designing interplanetary missions in the context of pre-phase A studies, particularly for performing accessibility surveys to large populations of small-bodies.
GeoSearch: A lightweight broking middleware for geospatial resources discovery
NASA Astrophysics Data System (ADS)
Gui, Z.; Yang, C.; Liu, K.; Xia, J.
2012-12-01
With petabytes of geodata, thousands of geospatial web services available over the Internet, it is critical to support geoscience research and applications by finding the best-fit geospatial resources from the massive and heterogeneous resources. Past decades' developments witnessed the operation of many service components to facilitate geospatial resource management and discovery. However, efficient and accurate geospatial resource discovery is still a big challenge due to the following reasons: 1)The entry barriers (also called "learning curves") hinder the usability of discovery services to end users. Different portals and catalogues always adopt various access protocols, metadata formats and GUI styles to organize, present and publish metadata. It is hard for end users to learn all these technical details and differences. 2)The cost for federating heterogeneous services is high. To provide sufficient resources and facilitate data discovery, many registries adopt periodic harvesting mechanism to retrieve metadata from other federated catalogues. These time-consuming processes lead to network and storage burdens, data redundancy, and also the overhead of maintaining data consistency. 3)The heterogeneous semantics issues in data discovery. Since the keyword matching is still the primary search method in many operational discovery services, the search accuracy (precision and recall) is hard to guarantee. Semantic technologies (such as semantic reasoning and similarity evaluation) offer a solution to solve these issues. However, integrating semantic technologies with existing service is challenging due to the expandability limitations on the service frameworks and metadata templates. 4)The capabilities to help users make final selection are inadequate. Most of the existing search portals lack intuitive and diverse information visualization methods and functions (sort, filter) to present, explore and analyze search results. Furthermore, the presentation of the value-added additional information (such as, service quality and user feedback), which conveys important decision supporting information, is missing. To address these issues, we prototyped a distributed search engine, GeoSearch, based on brokering middleware framework to search, integrate and visualize heterogeneous geospatial resources. Specifically, 1) A lightweight discover broker is developed to conduct distributed search. The broker retrieves metadata records for geospatial resources and additional information from dispersed services (portals and catalogues) and other systems on the fly. 2) A quality monitoring and evaluation broker (i.e., QoS Checker) is developed and integrated to provide quality information for geospatial web services. 3) The semantic assisted search and relevance evaluation functions are implemented by loosely interoperating with ESIP Testbed component. 4) Sophisticated information and data visualization functionalities and tools are assembled to improve user experience and assist resource selection.
Precipitation links (PrecipLinks) - a prototype directory for precipitation information
NASA Technical Reports Server (NTRS)
Velanthapillia, Balendran; Stocker, Erich Franz
2006-01-01
This poster describes a web directory of research oriented precipitation links. In this era of sophisticated search engines and web agents, it might seem counterproductive to establish such a directory of links. However, entering precipitation into a search engine like google will yield over one million hits. To further exacerbate this situation many of the returned links are dead, duplicates of other links, incomplete, or only marginally related to research precipitation or even the broader precipitation area. Sometimes connecting the linked URL causes the browser to lose context and not be able to get back to the original page. Even using more sophisticated search engines query parameters or agents while reducing the overall return doesn't eliminate all of the other issues listed. As part of the development of the measurement-based Precipitation Processing System (PPS) that will support Tropical Rainfall Measuring Mission (TRMM) version 7 reprocessing and the Global Precipitation Measurement (GPM) mission a precipitation links (PrecipLinks) facility is being developed. PrecipLinks is intended to share locations of other sites that contain information or data pertaining to precipitation research. Potential contributors can log-on to the PrecipLinks website and register their site for inclusion in the directory. The price for inclusion is the requirement to place a link back to PrecipLinks on the webpage that is registered. This ensures that users will be able to easily get back to PrecipLinks regardless of any context issues that browsers might have. Perhaps more importantly users while visiting one site that they know can be referred to a location that has many others sites with which they might not be familiar. PrecipLinks is designed to have a very flat structure. This poster summarizes these categories (information, data, services) and the reasons for their selection. Providers may register multiple pages to which they wish to direct users. However, each page may be attached to only one of these categories. Each page to which they refer users will also have a return link to PrecipLinks. The poster describes the operation of the system both the automated and the human processes. It also provides images for the various steps in the registration and use.
Social and Psychological Effects of the Internet Use
Diomidous, Marianna; Chardalias, Kostis; Magita, Adrianna; Koutonias, Panagiotis; Panagiotopoulou, Paraskevi; Mantas, John
2016-01-01
Background and Aims: Over the past two decades there was an upsurge of the use of Internet in human life. With this continuous development, Internet users are able to communicate with any part of the globe, to shop online, to use it as a mean of education, to work remotely and to conduct financial transactions. Unfortunately, this rapid development of the Internet has a detrimental impact in our life, which leads to various phenomena such as cyber bullying, cyber porn, cyber suicide, Internet addiction, social isolation, cyber racism etc. The main purpose of this paper is to record and analyze all these social and psychological effects that appears to users due to the extensive use of the Internet. Materials and Methods: This review study was a thorough search of bibliography data conducted through Internet and library research studies. Key words were extracted from search engines and data bases including Google, Yahoo, Scholar Google, PubMed. Findings: The findings of this study showed that the Internet offers a quick access to information and facilitates communication however; it is quite dangerous, especially for young users. For this reason, users should be aware of it and face critically any information that is handed from the website PMID:27041814
Advanced SPARQL querying in small molecule databases.
Galgonek, Jakub; Hurt, Tomáš; Michlíková, Vendula; Onderka, Petr; Schwarz, Jan; Vondrášek, Jiří
2016-01-01
In recent years, the Resource Description Framework (RDF) and the SPARQL query language have become more widely used in the area of cheminformatics and bioinformatics databases. These technologies allow better interoperability of various data sources and powerful searching facilities. However, we identified several deficiencies that make usage of such RDF databases restrictive or challenging for common users. We extended a SPARQL engine to be able to use special procedures inside SPARQL queries. This allows the user to work with data that cannot be simply precomputed and thus cannot be directly stored in the database. We designed an algorithm that checks a query against data ontology to identify possible user errors. This greatly improves query debugging. We also introduced an approach to visualize retrieved data in a user-friendly way, based on templates describing visualizations of resource classes. To integrate all of our approaches, we developed a simple web application. Our system was implemented successfully, and we demonstrated its usability on the ChEBI database transformed into RDF form. To demonstrate procedure call functions, we employed compound similarity searching based on OrChem. The application is publicly available at https://bioinfo.uochb.cas.cz/projects/chemRDF.
Social and Psychological Effects of the Internet Use.
Diomidous, Marianna; Chardalias, Kostis; Magita, Adrianna; Koutonias, Panagiotis; Panagiotopoulou, Paraskevi; Mantas, John
2016-02-01
Over the past two decades there was an upsurge of the use of Internet in human life. With this continuous development, Internet users are able to communicate with any part of the globe, to shop online, to use it as a mean of education, to work remotely and to conduct financial transactions. Unfortunately, this rapid development of the Internet has a detrimental impact in our life, which leads to various phenomena such as cyber bullying, cyber porn, cyber suicide, Internet addiction, social isolation, cyber racism etc. The main purpose of this paper is to record and analyze all these social and psychological effects that appears to users due to the extensive use of the Internet. This review study was a thorough search of bibliography data conducted through Internet and library research studies. Key words were extracted from search engines and data bases including Google, Yahoo, Scholar Google, PubMed. The findings of this study showed that the Internet offers a quick access to information and facilitates communication however; it is quite dangerous, especially for young users. For this reason, users should be aware of it and face critically any information that is handed from the website.
NASA Technical Reports Server (NTRS)
Barbieri, Enrique
2005-01-01
The Test and Engineering Directorate at NASA John C. Stennis Space Center developed an interest to study the modeling, evaluation, and control of a liquid hydrogen (LH2) and gas hydrogen (GH2) mixer subsystem of a ground test facility. This facility carries out comprehensive ground-based testing and certification of liquid rocket engines including the Space Shuttle Main engine. A software simulation environment developed in MATLAB/SIMULINK (M/S) will allow NASA engineers to test rocket engine systems at relatively no cost. In the progress report submitted in February 2004, we described the development of two foundation programs, a reverse look-up application using various interpolation algorithms, a variety of search and return methods, and self-checking methods to reduce the error in returned search results to increase the functionality of the program. The results showed that these efforts were successful. To transfer this technology to engineers who are not familiar with the M/S environment, a four-module GUI was implemented allowing the user to evaluate the mixer model under open-loop and closed-loop conditions. The progress report was based on an udergraduate Honors Thesis by Ms. Jamie Granger Austin in the Department of Electrical Engineering and Computer Science at Tulane University, during January-May 2003, and her continued efforts during August-December 2003. In collaboration with Dr. Hanz Richter and Dr. Fernando Figueroa we published these results in a NASA Tech Brief due to appear this year. Although the original proposal in 2003 did not address other components of the test facility, we decided in the last few months to extend our research and consider a related pressurization tank component as well. This report summarizes the results obtained towards a Graphical User Interface (GUI) for the evaluation and control of the hydrogen mixer subsystem model and for the pressurization tank each taken individually. Further research would combine the two components - mixer and tank, for a more realistic simulation tool.
Cancer Internet search activity on a major search engine, United States 2001-2003.
Cooper, Crystale Purvis; Mallon, Kenneth P; Leadbetter, Steven; Pollack, Lori A; Peipins, Lucy A
2005-07-01
To locate online health information, Internet users typically use a search engine, such as Yahoo! or Google. We studied Yahoo! search activity related to the 23 most common cancers in the United States. The objective was to test three potential correlates of Yahoo! cancer search activity--estimated cancer incidence, estimated cancer mortality, and the volume of cancer news coverage--and to study the periodicity of and peaks in Yahoo! cancer search activity. Yahoo! cancer search activity was obtained from a proprietary database called the Yahoo! Buzz Index. The American Cancer Society's estimates of cancer incidence and mortality were used. News reports associated with specific cancer types were identified using the LexisNexis "US News" database, which includes more than 400 national and regional newspapers and a variety of newswire services. The Yahoo! search activity associated with specific cancers correlated with their estimated incidence (Spearman rank correlation, rho = 0.50, P = .015), estimated mortality (rho = 0.66, P = .001), and volume of related news coverage (rho = 0.88, P < .001). Yahoo! cancer search activity tended to be higher on weekdays and during national cancer awareness months but lower during summer months; cancer news coverage also tended to follow these trends. Sharp increases in Yahoo! search activity scores from one day to the next appeared to be associated with increases in relevant news coverage. Media coverage appears to play a powerful role in prompting online searches for cancer information. Internet search activity offers an innovative tool for passive surveillance of health information-seeking behavior.
Cancer Internet Search Activity on a Major Search Engine, United States 2001-2003
Cooper, Crystale Purvis; Mallon, Kenneth P; Leadbetter, Steven; Peipins, Lucy A
2005-01-01
Background To locate online health information, Internet users typically use a search engine, such as Yahoo! or Google. We studied Yahoo! search activity related to the 23 most common cancers in the United States. Objective The objective was to test three potential correlates of Yahoo! cancer search activity—estimated cancer incidence, estimated cancer mortality, and the volume of cancer news coverage—and to study the periodicity of and peaks in Yahoo! cancer search activity. Methods Yahoo! cancer search activity was obtained from a proprietary database called the Yahoo! Buzz Index. The American Cancer Society's estimates of cancer incidence and mortality were used. News reports associated with specific cancer types were identified using the LexisNexis “US News” database, which includes more than 400 national and regional newspapers and a variety of newswire services. Results The Yahoo! search activity associated with specific cancers correlated with their estimated incidence (Spearman rank correlation, ρ = 0.50, P = .015), estimated mortality (ρ = 0.66, P = .001), and volume of related news coverage (ρ = 0.88, P < .001). Yahoo! cancer search activity tended to be higher on weekdays and during national cancer awareness months but lower during summer months; cancer news coverage also tended to follow these trends. Sharp increases in Yahoo! search activity scores from one day to the next appeared to be associated with increases in relevant news coverage. Conclusions Media coverage appears to play a powerful role in prompting online searches for cancer information. Internet search activity offers an innovative tool for passive surveillance of health information–seeking behavior. PMID:15998627
DockoMatic 2.0: high throughput inverse virtual screening and homology modeling.
Bullock, Casey; Cornia, Nic; Jacob, Reed; Remm, Andrew; Peavey, Thomas; Weekes, Ken; Mallory, Chris; Oxford, Julia T; McDougal, Owen M; Andersen, Timothy L
2013-08-26
DockoMatic is a free and open source application that unifies a suite of software programs within a user-friendly graphical user interface (GUI) to facilitate molecular docking experiments. Here we describe the release of DockoMatic 2.0; significant software advances include the ability to (1) conduct high throughput inverse virtual screening (IVS); (2) construct 3D homology models; and (3) customize the user interface. Users can now efficiently setup, start, and manage IVS experiments through the DockoMatic GUI by specifying receptor(s), ligand(s), grid parameter file(s), and docking engine (either AutoDock or AutoDock Vina). DockoMatic automatically generates the needed experiment input files and output directories and allows the user to manage and monitor job progress. Upon job completion, a summary of results is generated by Dockomatic to facilitate interpretation by the user. DockoMatic functionality has also been expanded to facilitate the construction of 3D protein homology models using the Timely Integrated Modeler (TIM) wizard. The wizard TIM provides an interface that accesses the basic local alignment search tool (BLAST) and MODELER programs and guides the user through the necessary steps to easily and efficiently create 3D homology models for biomacromolecular structures. The DockoMatic GUI can be customized by the user, and the software design makes it relatively easy to integrate additional docking engines, scoring functions, or third party programs. DockoMatic is a free comprehensive molecular docking software program for all levels of scientists in both research and education.
PIRIA: a general tool for indexing, search, and retrieval of multimedia content
NASA Astrophysics Data System (ADS)
Joint, Magali; Moellic, Pierre-Alain; Hede, P.; Adam, P.
2004-05-01
The Internet is a continuously expanding source of multimedia content and information. There are many products in development to search, retrieve, and understand multimedia content. But most of the current image search/retrieval engines, rely on a image database manually pre-indexed with keywords. Computers are still powerless to understand the semantic meaning of still or animated image content. Piria (Program for the Indexing and Research of Images by Affinity), the search engine we have developed brings this possibility closer to reality. Piria is a novel search engine that uses the query by example method. A user query is submitted to the system, which then returns a list of images ranked by similarity, obtained by a metric distance that operates on every indexed image signature. These indexed images are compared according to several different classifiers, not only Keywords, but also Form, Color and Texture, taking into account geometric transformations and variance like rotation, symmetry, mirroring, etc. Form - Edges extracted by an efficient segmentation algorithm. Color - Histogram, semantic color segmentation and spatial color relationship. Texture - Texture wavelets and local edge patterns. If required, Piria is also able to fuse results from multiple classifiers with a new classification of index categories: Single Indexer Single Call (SISC), Single Indexer Multiple Call (SIMC), Multiple Indexers Single Call (MISC) or Multiple Indexers Multiple Call (MIMC). Commercial and industrial applications will be explored and discussed as well as current and future development.
Adverse Reactions Associated With Cannabis Consumption as Evident From Search Engine Queries.
Yom-Tov, Elad; Lev-Ran, Shaul
2017-10-26
Cannabis is one of the most widely used psychoactive substances worldwide, but adverse drug reactions (ADRs) associated with its use are difficult to study because of its prohibited status in many countries. Internet search engine queries have been used to investigate ADRs in pharmaceutical drugs. In this proof-of-concept study, we tested whether these queries can be used to detect the adverse reactions of cannabis use. We analyzed anonymized queries from US-based users of Bing, a widely used search engine, made over a period of 6 months and compared the results with the prevalence of cannabis use as reported in the US National Survey on Drug Use in the Household (NSDUH) and with ADRs reported in the Food and Drug Administration's Adverse Drug Reporting System. Predicted prevalence of cannabis use was estimated from the fraction of people making queries about cannabis, marijuana, and 121 additional synonyms. Predicted ADRs were estimated from queries containing layperson descriptions to 195 ICD-10 symptoms list. Our results indicated that the predicted prevalence of cannabis use at the US census regional level reaches an R 2 of .71 NSDUH data. Queries for ADRs made by people who also searched for cannabis reveal many of the known adverse effects of cannabis (eg, cough and psychotic symptoms), as well as plausible unknown reactions (eg, pyrexia). These results indicate that search engine queries can serve as an important tool for the study of adverse reactions of illicit drugs, which are difficult to study in other settings. ©Elad Yom-Tov, Shaul Lev-Ran. Originally published in JMIR Public Health and Surveillance (http://publichealth.jmir.org), 26.10.2017.
NASA Access Mechanism - Graphical user interface information retrieval system
NASA Technical Reports Server (NTRS)
Hunter, Judy F.; Generous, Curtis; Duncan, Denise
1993-01-01
Access to online information sources of aerospace, scientific, and engineering data, a mission focus for NASA's Scientific and Technical Information Program, has always been limited by factors such as telecommunications, query language syntax, lack of standardization in the information, and the lack of adequate tools to assist in searching. Today, the NASA STI Program's NASA Access Mechanism (NAM) prototype offers a solution to these problems by providing the user with a set of tools that provide a graphical interface to remote, heterogeneous, and distributed information in a manner adaptable to both casual and expert users. Additionally, the NAM provides access to many Internet-based services such as Electronic Mail, the Wide Area Information Servers system, Peer Locating tools, and electronic bulletin boards.
NASA access mechanism: Graphical user interface information retrieval system
NASA Technical Reports Server (NTRS)
Hunter, Judy; Generous, Curtis; Duncan, Denise
1993-01-01
Access to online information sources of aerospace, scientific, and engineering data, a mission focus for NASA's Scientific and Technical Information Program, has always been limited to factors such as telecommunications, query language syntax, lack of standardization in the information, and the lack of adequate tools to assist in searching. Today, the NASA STI Program's NASA Access Mechanism (NAM) prototype offers a solution to these problems by providing the user with a set of tools that provide a graphical interface to remote, heterogeneous, and distributed information in a manner adaptable to both casual and expert users. Additionally, the NAM provides access to many Internet-based services such as Electronic Mail, the Wide Area Information Servers system, Peer Locating tools, and electronic bulletin boards.
Flexible patient information search and retrieval framework: pilot implementation
NASA Astrophysics Data System (ADS)
Erdal, Selnur; Catalyurek, Umit V.; Saltz, Joel; Kamal, Jyoti; Gurcan, Metin N.
2007-03-01
Medical centers collect and store significant amount of valuable data pertaining to patients' visit in the form of medical free-text. In addition, standardized diagnosis codes (International Classification of Diseases, Ninth Revision, Clinical Modification: ICD9-CM) related to those dictated reports are usually available. In this work, we have created a framework where image searches could be initiated through a combination of free-text reports as well as ICD9 codes. This framework enables more comprehensive search on existing large sets of patient data in a systematic way. The free text search is enriched by computer-aided inclusion of additional search terms enhanced by a thesaurus. This combination of enriched search allows users to access to a larger set of relevant results from a patient-centric PACS in a simpler way. Therefore, such framework is of particular use in tasks such as gathering images for desired patient populations, building disease models, and so on. As the motivating application of our framework, we implemented a search engine. This search engine processed two years of patient data from the OSU Medical Center's Information Warehouse and identified lung nodule location information using a combination of UMLS Meta-Thesaurus enhanced text report searches along with ICD9 code searches on patients that have been discharged. Five different queries with various ICD9 codes involving lung cancer were carried out on 172552 cases. Each search was completed under a minute on average per ICD9 code and the inclusion of UMLS thesaurus increased the number of relevant cases by 45% on average.
Google and Women’s Health-Related Issues: What Does the Search Engine Data Reveal?
Baazeem, Mazin
2014-01-01
Objectives Identifying the gaps in public knowledge of women’s health related issues has always been difficult. With the increasing number of Internet users in the United States, we sought to use the Internet as a tool to help us identify such gaps and to estimate women’s most prevalent health concerns by examining commonly searched health-related keywords in Google search engine. Methods We collected a large pool of possible search keywords from two independent practicing obstetrician/gynecologists and classified them into five main categories (obstetrics, gynecology, infertility, urogynecology/menopause and oncology), and measured the monthly average search volume within the United States for each keyword with all its possible combinations using Google AdWords tool. Results We found that pregnancy related keywords were less frequently searched in general compared to other categories with an average of 145,400 hits per month for the top twenty keywords. Among the most common pregnancy-related keywords was “pregnancy and sex’ while pregnancy-related diseases were uncommonly searched. HPV alone was searched 305,400 times per month. Of the cancers affecting women, breast cancer was the most commonly searched with an average of 247,190 times per month, followed by cervical cancer then ovarian cancer. Conclusion The commonly searched keywords are often issues that are not discussed in our daily practice as well as in public health messages. The search volume is relatively related to disease prevalence with the exception of ovarian cancer which could signify a public fear. PMID:25422723
A web search on environmental topics: what is the role of ranking?
Covolo, Loredana; Filisetti, Barbara; Mascaretti, Silvia; Limina, Rosa Maria; Gelatti, Umberto
2013-12-01
Although the Internet is easy to use, the mechanisms and logic behind a Web search are often unknown. Reliable information can be obtained, but it may not be visible as the Web site is not located in the first positions of search results. The possible risks of adverse health effects arising from environmental hazards are issues of increasing public interest, and therefore the information about these risks, particularly on topics for which there is no scientific evidence, is very crucial. The aim of this study was to investigate whether the presentation of information on some environmental health topics differed among various search engines, assuming that the most reliable information should come from institutional Web sites. Five search engines were used: Google, Yahoo!, Bing, Ask, and AOL. The following topics were searched in combination with the word "health": "nuclear energy," "electromagnetic waves," "air pollution," "waste," and "radon." For each topic three key words were used. The first 30 search results for each query were considered. The ranking variability among the search engines and the type of search results were analyzed for each topic and for each key word. The ranking of institutional Web sites was given particular consideration. Variable results were obtained when surfing the Internet on different environmental health topics. Multivariate logistic regression analysis showed that, when searching for radon and air pollution topics, it is more likely to find institutional Web sites in the first 10 positions compared with nuclear power (odds ratio=3.4, 95% confidence interval 2.1-5.4 and odds ratio=2.9, 95% confidence interval 1.8-4.7, respectively) and also when using Google compared with Bing (odds ratio=3.1, 95% confidence interval 1.9-5.1). The increasing use of online information could play an important role in forming opinions. Web users should become more aware of the importance of finding reliable information, and health institutions should be able to make that information more visible.
Quantifying the semantics of search behavior before stock market moves.
Curme, Chester; Preis, Tobias; Stanley, H Eugene; Moat, Helen Susannah
2014-08-12
Technology is becoming deeply interwoven into the fabric of society. The Internet has become a central source of information for many people when making day-to-day decisions. Here, we present a method to mine the vast data Internet users create when searching for information online, to identify topics of interest before stock market moves. In an analysis of historic data from 2004 until 2012, we draw on records from the search engine Google and online encyclopedia Wikipedia as well as judgments from the service Amazon Mechanical Turk. We find evidence of links between Internet searches relating to politics or business and subsequent stock market moves. In particular, we find that an increase in search volume for these topics tends to precede stock market falls. We suggest that extensions of these analyses could offer insight into large-scale information flow before a range of real-world events.
Quantifying the semantics of search behavior before stock market moves
Curme, Chester; Preis, Tobias; Stanley, H. Eugene; Moat, Helen Susannah
2014-01-01
Technology is becoming deeply interwoven into the fabric of society. The Internet has become a central source of information for many people when making day-to-day decisions. Here, we present a method to mine the vast data Internet users create when searching for information online, to identify topics of interest before stock market moves. In an analysis of historic data from 2004 until 2012, we draw on records from the search engine Google and online encyclopedia Wikipedia as well as judgments from the service Amazon Mechanical Turk. We find evidence of links between Internet searches relating to politics or business and subsequent stock market moves. In particular, we find that an increase in search volume for these topics tends to precede stock market falls. We suggest that extensions of these analyses could offer insight into large-scale information flow before a range of real-world events. PMID:25071193
Raising the IQ in full-text searching via intelligent querying
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kero, R.; Russell, L.; Swietlik, C.
1994-11-01
Current Information Retrieval (IR) technologies allow for efficient access to relevant information, provided that user selected query terms coincide with the specific linguistical choices made by the authors whose works constitute the text-base. Therefore, the challenge is to enhance the limited searching capability of state-of-the-practice IR. This can be done either with augmented clients that overcome current server searching deficiencies, or with added capabilities that can augment searching algorithms on the servers. The technology being investigated is that of deductive databases, with a set of new techniques called cooperative answering. This technology utilizes semantic networks to allow for navigation betweenmore » possible query search term alternatives. The augmented search terms are passed to an IR engine and the results can be compared. The project utilizes the OSTI Environment, Safety and Health Thesaurus to populate the domain specific semantic network and the text base of ES&H related documents from the Facility Profile Information Management System as the domain specific search space.« less
Pipelining Architecture of Indexing Using Agglomerative Clustering
NASA Astrophysics Data System (ADS)
Goyal, Deepika; Goyal, Deepti; Gupta, Parul
2010-11-01
The World Wide Web is an interlinked collection of billions of documents. Ironically the huge size of this collection has become an obstacle for information retrieval. To access the information from Internet, search engine is used. Search engine retrieve the pages from indexer. This paper introduce a novel pipelining technique for structuring the core index-building system that substantially reduces the index construction time and also clustering algorithm that aims at partitioning the set of documents into ordered clusters so that the documents within the same cluster are similar and are being assigned the closer document identifiers. After assigning to the clusters it creates the hierarchy of index so that searching is efficient. It will make the super cluster then mega cluster by itself. The pipeline architecture will create the index in such a way that it will be efficient in space and time saving manner. It will direct the search from higher level to lower level of index or higher level of clusters to lower level of cluster so that the user gets the possible match result in time saving manner. As one cluster is making by taking only two clusters so it search is limited to two clusters for lower level of index and so on. So it is efficient in time saving manner.
How Users Search the Library from a Single Search Box
ERIC Educational Resources Information Center
Lown, Cory; Sierra, Tito; Boyer, Josh
2013-01-01
Academic libraries are turning increasingly to unified search solutions to simplify search and discovery of library resources. Unfortunately, very little research has been published on library user search behavior in single search box environments. This study examines how users search a large public university library using a prominent, single…
ETDEWEB versus the World-Wide-Web: a specific database/web comparison
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cutler, Debbie
2010-06-28
A study was performed comparing user search results from the specialized scientific database on energy-related information, ETDEWEB, with search results from the internet search engines Google and Google Scholar. The primary objective of the study was to determine if ETDEWEB (the Energy Technology Data Exchange – World Energy Base) continues to bring the user search results that are not being found by Google and Google Scholar. As a multilateral information exchange initiative, ETDE’s member countries and partners contribute cost- and task-sharing resources to build the largest database of energy-related information in the world. As of early 2010, the ETDEWEB databasemore » has 4.3 million citations to world-wide energy literature. One of ETDEWEB’s strengths is its focused scientific content and direct access to full text for its grey literature (over 300,000 documents in PDF available for viewing from the ETDE site and over a million additional links to where the documents can be found at research organizations and major publishers globally). Google and Google Scholar are well-known for the wide breadth of the information they search, with Google bringing in news, factual and opinion-related information, and Google Scholar also emphasizing scientific content across many disciplines. The analysis compared the results of 15 energy-related queries performed on all three systems using identical words/phrases. A variety of subjects was chosen, although the topics were mostly in renewable energy areas due to broad international interest. Over 40,000 search result records from the three sources were evaluated. The study concluded that ETDEWEB is a significant resource to energy experts for discovering relevant energy information. For the 15 topics in this study, ETDEWEB was shown to bring the user unique results not shown by Google or Google Scholar 86.7% of the time. Much was learned from the study beyond just metric comparisons. Observations about the strengths of each system and factors impacting the search results are also shared along with background information and summary tables of the results. If a user knows a very specific title of a document, all three systems are helpful in finding the user a source for the document. But if the user is looking to discover relevant documents on a specific topic, each of the three systems will bring back a considerable volume of data, but quite different in focus. Google is certainly a highly-used and valuable tool to find significant ‘non-specialist’ information, and Google Scholar does help the user focus on scientific disciplines. But if a user’s interest is scientific and energy-specific, ETDEWEB continues to hold a strong position in the energy research, technology and development (RTD) information field and adds considerable value in knowledge discovery. (auth)« less
Investigating User Search Tactic Patterns and System Support in Using Digital Libraries
ERIC Educational Resources Information Center
Joo, Soohyung
2013-01-01
This study aims to investigate users' search tactic application and system support in using digital libraries. A user study was conducted with sixty digital library users. The study was designed to answer three research questions: 1) How do users engage in a search process by applying different types of search tactics while conducting different…
Library Instruction and Online Database Searching.
ERIC Educational Resources Information Center
Mercado, Heidi
1999-01-01
Reviews changes in online database searching in academic libraries. Topics include librarians conducting all searches; the advent of end-user searching and the need for user instruction; compact disk technology; online public catalogs; the Internet; full text databases; electronic information literacy; user education and the remote library user;…
A novel spectral library workflow to enhance protein identifications.
Li, Haomin; Zong, Nobel C; Liang, Xiangbo; Kim, Allen K; Choi, Jeong Ho; Deng, Ning; Zelaya, Ivette; Lam, Maggie; Duan, Huilong; Ping, Peipei
2013-04-09
The innovations in mass spectrometry-based investigations in proteome biology enable systematic characterization of molecular details in pathophysiological phenotypes. However, the process of delineating large-scale raw proteomic datasets into a biological context requires high-throughput data acquisition and processing. A spectral library search engine makes use of previously annotated experimental spectra as references for subsequent spectral analyses. This workflow delivers many advantages, including elevated analytical efficiency and specificity as well as reduced demands in computational capacity. In this study, we created a spectral matching engine to address challenges commonly associated with a library search workflow. Particularly, an improved sliding dot product algorithm, that is robust to systematic drifts of mass measurement in spectra, is introduced. Furthermore, a noise management protocol distinguishes spectra correlation attributed from noise and peptide fragments. It enables elevated separation between target spectral matches and false matches, thereby suppressing the possibility of propagating inaccurate peptide annotations from library spectra to query spectra. Moreover, preservation of original spectra also accommodates user contributions to further enhance the quality of the library. Collectively, this search engine supports reproducible data analyses using curated references, thereby broadening the accessibility of proteomics resources to biomedical investigators. This article is part of a Special Issue entitled: From protein structures to clinical applications. Copyright © 2013 Elsevier B.V. All rights reserved.
DockoMatic 2.0: High Throughput Inverse Virtual Screening and Homology Modeling
Bullock, Casey; Cornia, Nic; Jacob, Reed; Remm, Andrew; Peavey, Thomas; Weekes, Ken; Mallory, Chris; Oxford, Julia T.; McDougal, Owen M.; Andersen, Timothy L.
2013-01-01
DockoMatic is a free and open source application that unifies a suite of software programs within a user-friendly Graphical User Interface (GUI) to facilitate molecular docking experiments. Here we describe the release of DockoMatic 2.0; significant software advances include the ability to: (1) conduct high throughput Inverse Virtual Screening (IVS); (2) construct 3D homology models; and (3) customize the user interface. Users can now efficiently setup, start, and manage IVS experiments through the DockoMatic GUI by specifying a receptor(s), ligand(s), grid parameter file(s), and docking engine (either AutoDock or AutoDock Vina). DockoMatic automatically generates the needed experiment input files and output directories, and allows the user to manage and monitor job progress. Upon job completion, a summary of results is generated by Dockomatic to facilitate interpretation by the user. DockoMatic functionality has also been expanded to facilitate the construction of 3D protein homology models using the Timely Integrated Modeler (TIM) wizard. The wizard TIM provides an interface that accesses the basic local alignment search tool (BLAST) and MODELLER programs, and guides the user through the necessary steps to easily and efficiently create 3D homology models for biomacromolecular structures. The DockoMatic GUI can be customized by the user, and the software design makes it relatively easy to integrate additional docking engines, scoring functions, or third party programs. DockoMatic is a free comprehensive molecular docking software program for all levels of scientists in both research and education. PMID:23808933
Dunne, Suzanne; Cummins, Niamh Maria; Hannigan, Ailish; Shannon, Bill; Dunne, Colum; Cullen, Walter
2013-08-27
The Internet is a widely used source of information for patients searching for medical/health care information. While many studies have assessed existing medical/health care information on the Internet, relatively few have examined methods for design and delivery of such websites, particularly those aimed at the general public. This study describes a method of evaluating material for new medical/health care websites, or for assessing those already in existence, which is correlated with higher rankings on Google's Search Engine Results Pages (SERPs). A website quality assessment (WQA) tool was developed using criteria related to the quality of the information to be contained in the website in addition to an assessment of the readability of the text. This was retrospectively applied to assess existing websites that provide information about generic medicines. The reproducibility of the WQA tool and its predictive validity were assessed in this study. The WQA tool demonstrated very high reproducibility (intraclass correlation coefficient=0.95) between 2 independent users. A moderate to strong correlation was found between WQA scores and rankings on Google SERPs. Analogous correlations were seen between rankings and readability of websites as determined by Flesch Reading Ease and Flesch-Kincaid Grade Level scores. The use of the WQA tool developed in this study is recommended as part of the design phase of a medical or health care information provision website, along with assessment of readability of the material to be used. This may ensure that the website performs better on Google searches. The tool can also be used retrospectively to make improvements to existing websites, thus, potentially enabling better Google search result positions without incurring the costs associated with Search Engine Optimization (SEO) professionals or paid promotion.
[Pruritus in Germany-a Google search engine analysis].
Zink, A; Rüth, M; Schuster, B; Darsow, U; Biedermann, T; Ständer, S
2018-06-06
Because affected persons often do not visit a doctor, the prevalence of chronic and acute pruritus in the general population is difficult to determine. The aim of this study is to estimate the frequency and the most common locations of pruritus in German internet users, who-with 62.4 million persons-represent a large majority of the German population, by analysing the Google search volume. Relevant keywords for the subject "pruritus" were identified and analysed using the Google AdWords Keyword Planner. The assessment period was January 2015 to December 2016. In total the Google AdWords Keyword Planner identified 701 keywords for the topic "Juckreiz" (German lay word for pruritus), resulting in 7,531,890 pruritus-related Google searches during the assessment period. Most common search terms were the German lay term for atopic eczema ("Neurodermitis", 23.7%), the German lay term for psoriasis ("Schuppenflechte", 17.8%) and "psoriasis" (13%). The German lay term for pruritus ("Juckreiz") was only the sixth most searched term (3%). Most searches (72%) focused on influencing factors for pruritus, especially on skin diseases and skin conditions. The most commonly searched location was pruritus on the whole body, followed by anal pruritus. Analysis of the temporal course showed a higher monthly search volume during winter. With its unconventional methodology, a Google search engine analysis, this study allows a rough estimation of the medical need of pruritus in the German general population, which seems to be higher than expected. Especially pruritus in the anal area was identified as an unmet medical need.
Yom-Tov, Elad; Fernandez-Luque, Luis
2014-01-01
Vaccination campaigns are one of the most important and successful public health programs ever undertaken. People who want to learn about vaccines in order to make an informed decision on whether to vaccinate are faced with a wealth of information on the Internet, both for and against vaccinations. In this paper we develop an automated way to score Internet search queries and web pages as to the likelihood that a person making these queries or reading those pages would decide to vaccinate. We apply this method to data from a major Internet search engine, while people seek information about the Measles, Mumps and Rubella (MMR) vaccine. We show that our method is accurate, and use it to learn about the information acquisition process of people. Our results show that people who are pro-vaccination as well as people who are anti-vaccination seek similar information, but browsing this information has differing effect on their future browsing. These findings demonstrate the need for health authorities to tailor their information according to the current stance of users.
Semantics-Based Intelligent Indexing and Retrieval of Digital Images - A Case Study
NASA Astrophysics Data System (ADS)
Osman, Taha; Thakker, Dhavalkumar; Schaefer, Gerald
The proliferation of digital media has led to a huge interest in classifying and indexing media objects for generic search and usage. In particular, we are witnessing colossal growth in digital image repositories that are difficult to navigate using free-text search mechanisms, which often return inaccurate matches as they typically rely on statistical analysis of query keyword recurrence in the image annotation or surrounding text. In this chapter we present a semantically enabled image annotation and retrieval engine that is designed to satisfy the requirements of commercial image collections market in terms of both accuracy and efficiency of the retrieval process. Our search engine relies on methodically structured ontologies for image annotation, thus allowing for more intelligent reasoning about the image content and subsequently obtaining a more accurate set of results and a richer set of alternatives matchmaking the original query. We also show how our well-analysed and designed domain ontology contributes to the implicit expansion of user queries as well as presenting our initial thoughts on exploiting lexical databases for explicit semantic-based query expansion.
Yom-Tov, Elad; Fernandez-Luque, Luis
2014-01-01
Vaccination campaigns are one of the most important and successful public health programs ever undertaken. People who want to learn about vaccines in order to make an informed decision on whether to vaccinate are faced with a wealth of information on the Internet, both for and against vaccinations. In this paper we develop an automated way to score Internet search queries and web pages as to the likelihood that a person making these queries or reading those pages would decide to vaccinate. We apply this method to data from a major Internet search engine, while people seek information about the Measles, Mumps and Rubella (MMR) vaccine. We show that our method is accurate, and use it to learn about the information acquisition process of people. Our results show that people who are pro-vaccination as well as people who are anti-vaccination seek similar information, but browsing this information has differing effect on their future browsing. These findings demonstrate the need for health authorities to tailor their information according to the current stance of users. PMID:25954435
Support patient search on pathology reports with interactive online learning based data extraction.
Zheng, Shuai; Lu, James J; Appin, Christina; Brat, Daniel; Wang, Fusheng
2015-01-01
Structural reporting enables semantic understanding and prompt retrieval of clinical findings about patients. While synoptic pathology reporting provides templates for data entries, information in pathology reports remains primarily in narrative free text form. Extracting data of interest from narrative pathology reports could significantly improve the representation of the information and enable complex structured queries. However, manual extraction is tedious and error-prone, and automated tools are often constructed with a fixed training dataset and not easily adaptable. Our goal is to extract data from pathology reports to support advanced patient search with a highly adaptable semi-automated data extraction system, which can adjust and self-improve by learning from a user's interaction with minimal human effort. We have developed an online machine learning based information extraction system called IDEAL-X. With its graphical user interface, the system's data extraction engine automatically annotates values for users to review upon loading each report text. The system analyzes users' corrections regarding these annotations with online machine learning, and incrementally enhances and refines the learning model as reports are processed. The system also takes advantage of customized controlled vocabularies, which can be adaptively refined during the online learning process to further assist the data extraction. As the accuracy of automatic annotation improves overtime, the effort of human annotation is gradually reduced. After all reports are processed, a built-in query engine can be applied to conveniently define queries based on extracted structured data. We have evaluated the system with a dataset of anatomic pathology reports from 50 patients. Extracted data elements include demographical data, diagnosis, genetic marker, and procedure. The system achieves F-1 scores of around 95% for the majority of tests. Extracting data from pathology reports could enable more accurate knowledge to support biomedical research and clinical diagnosis. IDEAL-X provides a bridge that takes advantage of online machine learning based data extraction and the knowledge from human's feedback. By combining iterative online learning and adaptive controlled vocabularies, IDEAL-X can deliver highly adaptive and accurate data extraction to support patient search.
Development and usage of eXtension's HorseQuest: an online resource.
Greene, E A; Griffin, A S; Whittle, J; Williams, C A; Howard, A B; Anderson, K P
2010-08-01
eXtension (pronounced e-extension) is an online resource transforming how faculty can collaborate and deliver equine education. As the first Community of Practice launched from eXtension, HorseQuest (HQ) offers free, interactive, peer-reviewed, online resources on a variety of equine-related topics at http://www.extension.org. This group has adapted traditional educational content to the online environment to maximize search engine optimization, to be more discoverable and relevant in the online world. This means that HQ resources are consistently being found on the first page of search results. Also, by researching key words searched by Internet users, HQ has guided new content direction and determined potential webcast topics based on relevance and frequency of those searches. In addition to establishing good search engine optimization, HQ has been utilizing the viral networking aspect of YouTube by uploading clips of existing equine educational videos to YouTube. HorseQuest content appears in mainstream media, is passed on by the user, and helps HQ effectively reach their community of interest (horse enthusiasts). HorseQuest partners with My Horse University to produce webcasts that combine concise knowledge exchange via a scripted presentation with viewer chat and incoming questions. HorseQuest has produced and published content including 12 learning modules, 8 webchats, 21 webcasts, and 572 videos segments. After the official public launch, there was a steady increase in average number of visits/mo and average page views/mo over the 26-mo period. These regressions show a statistically significant increase in visits (P < 0.001) of approximately 450 visits per month and a significant increase in page views (P = 0.004) of about 373 page views per month. HorseQuest is a resource for several state 4-H advancement and competition programs and will continue to be incorporated into traditional extension programs, while reaching and affecting global audiences.
Content-based Music Search and Recommendation System
NASA Astrophysics Data System (ADS)
Takegawa, Kazuki; Hijikata, Yoshinori; Nishida, Shogo
Recently, the turn volume of music data on the Internet has increased rapidly. This has increased the user's cost to find music data suiting their preference from such a large data set. We propose a content-based music search and recommendation system. This system has an interface for searching and finding music data and an interface for editing a user profile which is necessary for music recommendation. By exploiting the visualization of the feature space of music and the visualization of the user profile, the user can search music data and edit the user profile. Furthermore, by exploiting the infomation which can be acquired from each visualized object in a mutually complementary manner, we make it easier for the user to search music data and edit the user profile. Concretely, the system gives to the user an information obtained from the user profile when searching music data and an information obtained from the feature space of music when editing the user profile.
ISART: A Generic Framework for Searching Books with Social Information
Cui, Xiao-Ping; Qu, Jiao; Geng, Bin; Zhou, Fang; Song, Li; Hao, Hong-Wei
2016-01-01
Effective book search has been discussed for decades and is still future-proof in areas as diverse as computer science, informatics, e-commerce and even culture and arts. A variety of social information contents (e.g, ratings, tags and reviews) emerge with the huge number of books on the Web, but how they are utilized for searching and finding books is seldom investigated. Here we develop an Integrated Search And Recommendation Technology (IsArt), which breaks new ground by providing a generic framework for searching books with rich social information. IsArt comprises a search engine to rank books with book contents and professional metadata, a Generalized Content-based Filtering model to thereafter rerank books with user-generated social contents, and a learning-to-rank technique to finally combine a wide range of diverse reranking results. Experiments show that this technology permits embedding social information to promote book search effectiveness, and IsArt, by making use of it, has the best performance on CLEF/INEX Social Book Search Evaluation datasets of all 4 years (from 2011 to 2014), compared with some other state-of-the-art methods. PMID:26863545
ISART: A Generic Framework for Searching Books with Social Information.
Yin, Xu-Cheng; Zhang, Bo-Wen; Cui, Xiao-Ping; Qu, Jiao; Geng, Bin; Zhou, Fang; Song, Li; Hao, Hong-Wei
2016-01-01
Effective book search has been discussed for decades and is still future-proof in areas as diverse as computer science, informatics, e-commerce and even culture and arts. A variety of social information contents (e.g, ratings, tags and reviews) emerge with the huge number of books on the Web, but how they are utilized for searching and finding books is seldom investigated. Here we develop an Integrated Search And Recommendation Technology (IsArt), which breaks new ground by providing a generic framework for searching books with rich social information. IsArt comprises a search engine to rank books with book contents and professional metadata, a Generalized Content-based Filtering model to thereafter rerank books with user-generated social contents, and a learning-to-rank technique to finally combine a wide range of diverse reranking results. Experiments show that this technology permits embedding social information to promote book search effectiveness, and IsArt, by making use of it, has the best performance on CLEF/INEX Social Book Search Evaluation datasets of all 4 years (from 2011 to 2014), compared with some other state-of-the-art methods.
Natural language information retrieval in digital libraries
DOE Office of Scientific and Technical Information (OSTI.GOV)
Strzalkowski, T.; Perez-Carballo, J.; Marinescu, M.
In this paper we report on some recent developments in joint NYU and GE natural language information retrieval system. The main characteristic of this system is the use of advanced natural language processing to enhance the effectiveness of term-based document retrieval. The system is designed around a traditional statistical backbone consisting of the indexer module, which builds inverted index files from pre-processed documents, and a retrieval engine which searches and ranks the documents in response to user queries. Natural language processing is used to (1) preprocess the documents in order to extract content-carrying terms, (2) discover inter-term dependencies and buildmore » a conceptual hierarchy specific to the database domain, and (3) process user`s natural language requests into effective search queries. This system has been used in NIST-sponsored Text Retrieval Conferences (TREC), where we worked with approximately 3.3 GBytes of text articles including material from the Wall Street Journal, the Associated Press newswire, the Federal Register, Ziff Communications`s Computer Library, Department of Energy abstracts, U.S. Patents and the San Jose Mercury News, totaling more than 500 million words of English. The system have been designed to facilitate its scalability to deal with ever increasing amounts of data. In particular, a randomized index-splitting mechanism has been installed which allows the system to create a number of smaller indexes that can be independently and efficiently searched.« less
TRENDS: A flight test relational database user's guide and reference manual
NASA Technical Reports Server (NTRS)
Bondi, M. J.; Bjorkman, W. S.; Cross, J. L.
1994-01-01
This report is designed to be a user's guide and reference manual for users intending to access rotocraft test data via TRENDS, the relational database system which was developed as a tool for the aeronautical engineer with no programming background. This report has been written to assist novice and experienced TRENDS users. TRENDS is a complete system for retrieving, searching, and analyzing both numerical and narrative data, and for displaying time history and statistical data in graphical and numerical formats. This manual provides a 'guided tour' and a 'user's guide' for the new and intermediate-skilled users. Examples for the use of each menu item within TRENDS is provided in the Menu Reference section of the manual, including full coverage for TIMEHIST, one of the key tools. This manual is written around the XV-15 Tilt Rotor database, but does include an appendix on the UH-60 Blackhawk database. This user's guide and reference manual establishes a referrable source for the research community and augments NASA TM-101025, TRENDS: The Aeronautical Post-Test, Database Management System, Jan. 1990, written by the same authors.
NASA Technical Reports Server (NTRS)
Ferguson, James S.; Ferguson, Joanne E.; Peel, John, III; Vance, Larry
1995-01-01
Since initial contact between Earth Search Sciences, Inc. (ESSI) and the Idaho National Engineering Laboratory (INEL) in February, 1994, at least seven proposals have been submitted in response to a variety of solicitations to commercialize and improve the AVIRIS instrument. These proposals, matching ESSI's unique position with respect to agreements with the National Aeronautics and Space Administration (NASA) and the Jet Propulsion Laboratory (JPL) to utilize, miniaturize, and commercialize the AVIRIS instrument and platform, are combined with the applied engineering of the INEL. Teaming ESSI, NASA/JPL, and INEL with diverse industrial partners has strengthened the respective proposals. These efforts carefully structure the overall project plans to ensure the development, demonstration, and deployment of this concept to the national and international arenas. The objectives of these efforts include: (1) developing a miniaturized commercial, real-time, cost effective version of the AVIRIS instrument; (2) identifying multiple users for AVIRIS; (3) integrating the AVIRIS technology with other technologies; (4) gaining the confidence/acceptance of other government agencies and private industry in AVIRIS; and (5) increasing the technology base of U.S. industry.
Data engineering systems: Computerized modeling and data bank capabilities for engineering analysis
NASA Technical Reports Server (NTRS)
Kopp, H.; Trettau, R.; Zolotar, B.
1984-01-01
The Data Engineering System (DES) is a computer-based system that organizes technical data and provides automated mechanisms for storage, retrieval, and engineering analysis. The DES combines the benefits of a structured data base system with automated links to large-scale analysis codes. While the DES provides the user with many of the capabilities of a computer-aided design (CAD) system, the systems are actually quite different in several respects. A typical CAD system emphasizes interactive graphics capabilities and organizes data in a manner that optimizes these graphics. On the other hand, the DES is a computer-aided engineering system intended for the engineer who must operationally understand an existing or planned design or who desires to carry out additional technical analysis based on a particular design. The DES emphasizes data retrieval in a form that not only provides the engineer access to search and display the data but also links the data automatically with the computer analysis codes.
Web Feet Guide to Search Engines: Finding It on the Net.
ERIC Educational Resources Information Center
Web Feet, 2001
2001-01-01
This guide to search engines for the World Wide Web discusses selecting the right search engine; interpreting search results; major search engines; online tutorials and guides; search engines for kids; specialized search tools for various subjects; and other specialized engines and gateways. (LRW)
Search strategies of Wikipedia readers.
Rodi, Giovanna Chiara; Loreto, Vittorio; Tria, Francesca
2017-01-01
The quest for information is one of the most common activity of human beings. Despite the the impressive progress of search engines, not to miss the needed piece of information could be still very tough, as well as to acquire specific competences and knowledge by shaping and following the proper learning paths. Indeed, the need to find sensible paths in information networks is one of the biggest challenges of our societies and, to effectively address it, it is important to investigate the strategies adopted by human users to cope with the cognitive bottleneck of finding their way in a growing sea of information. Here we focus on the case of Wikipedia and investigate a recently released dataset about users' click on the English Wikipedia, namely the English Wikipedia Clickstream. We perform a semantically charged analysis to uncover the general patterns followed by information seekers in the multi-dimensional space of Wikipedia topics/categories. We discover the existence of well defined strategies in which users tend to start from very general, i.e., semantically broad, pages and progressively narrow down the scope of their navigation, while keeping a growing semantic coherence. This is unlike strategies associated to tasks with predefined search goals, namely the case of the Wikispeedia game. In this case users first move from the 'particular' to the 'universal' before focusing down again to the required target. The clear picture offered here represents a very important stepping stone towards a better design of information networks and recommendation strategies, as well as the construction of radically new learning paths.
Buck, Susanne
2009-01-01
Much attention is paid to the technical aspects of telemedicine in the development of new applications, but the enthusiasm about what is technically possible very often leads to the user acceptance of such products being neglected. The number of successful and sustainable telemedicine applications would be much higher if developers concentrated more on matters related to the cognitive-emotional situation of the users involved in telemedicine. The users include the care and cure providers, as well as the care and cure receivers. Based on an informal literature search and discussions with telemedicine implementation staff, nine factors have been identified which are essential for the user acceptance of telemedicine applications. All of them are connected more to the cognitive-emotional than to the cognitive-rational side of information processing. This suggests that in the future the cognitive-emotional side will need more attention. This in turn implies that the nine points mentioned above have to find their way into requirements engineering, development processes and product life cycles.
Science opportunity analyzer - a multi-mission tool for planning
NASA Technical Reports Server (NTRS)
Streiffert, B. A.; Polanskey, C. A.; O'Reilly, T.; Colwell, J.
2002-01-01
For many years the diverse scientific community that supports JPL's wide variety ofinterplanetary space missions has needed a tool in order to plan and develop their experiments. The tool needs to be easily adapted to various mission types and portable to the user community. The Science Opportunity Analyzer, SOA, now in its third year of development, is intended to meet this need. SOA is a java-based application that is designed to enable scientists to identify and analyze opportunities for science observations from spacecraft. It differs from other planning tools in that it does not require an in-depth knowledge of the spacecraft command system or operation modes to begin high level planning. Users can, however, develop increasingly detailed levels of design. SOA consists of six major functions: Opportunity Search, Visualization, Observation Design, Constraint Checking, Data Output and Communications. Opportunity Search is a GUI driven interface to existing search engines that can be used to identify times when a spacecraft is in a specific geometrical relationship with other bodies in the solar system. This function can be used for advanced mission planning as well as for making last minute adjustments to mission sequences in response to trajectory modifications. Visualization is a key aspect of SOA. The user can view observation opportunities in either a 3D representation or as a 2D map projection. The user is given extensive flexibility to customize what is displayed in the view. Observation Design allows the user to orient the spacecraft and visualize the projection of the instrument field of view for that orientation using the same views as Opportunity Search. Constraint Checking is provided to validate various geometrical and physical aspects of an observation design. The user has the ability to easily create custom rules or to use official project-generated flight rules. This capability may also allow scientists to easily impact the cost to science if flight rule changes occur. Data Output generates information based on the spacecraft's trajectory, opportunity search results or based on a created observation. The data can be viewed either in tabular format or as a graph. Finally, SOA is unique in that it is designed to be able to communicate with a variety of existing planning and sequencing tools. From the very beginning SOA was designed with the user in mind. Extensive surveys of the potential user community were conducted in order to develop the software requirements. Throughout the development period, close ties have been maintained with the science community to insure that the tool maintains its user focus. Although development is still in its early stages, SOA is already developing a user community on the Cassini project, which is depending on this tool for their science planning. There are other tools at JPL that do various pieces of what SOA can do; however, there is no other tool which combines all these functions and presents them to the user in such a convenient, cohesive, and easy to use fashion.
Recommender engine for continuous-time quantum Monte Carlo methods
NASA Astrophysics Data System (ADS)
Huang, Li; Yang, Yi-feng; Wang, Lei
2017-03-01
Recommender systems play an essential role in the modern business world. They recommend favorable items such as books, movies, and search queries to users based on their past preferences. Applying similar ideas and techniques to Monte Carlo simulations of physical systems boosts their efficiency without sacrificing accuracy. Exploiting the quantum to classical mapping inherent in the continuous-time quantum Monte Carlo methods, we construct a classical molecular gas model to reproduce the quantum distributions. We then utilize powerful molecular simulation techniques to propose efficient quantum Monte Carlo updates. The recommender engine approach provides a general way to speed up the quantum impurity solvers.
Best Practices for Building Web Data Portals
NASA Astrophysics Data System (ADS)
Anderson, R. A.; Drew, L.
2013-12-01
With a data archive of more than 1.5 petabytes and a key role as the NASA Distributed Active Archive Center (DAAC) for synthetic aperture radar (SAR) data, the Alaska Satellite Facility (ASF) has an imperative to develop effective Web data portals. As part of continuous enhancement and expansion of its website, ASF recently created two data portals for distribution of SAR data: one for the archiving and distribution of NASA's MEaSUREs Wetlands project and one for newly digitally processed data from NASA's 1978 Seasat satellite. These case studies informed ASF's development of the following set of best practices for developing Web data portals. 1) Maintain well-organized, quality data. This is fundamental. If data are poorly organized or contain errors, credibility is lost and the data will not be used. 2) Match data to likely data uses. 3) Identify audiences in as much detail as possible. ASF DAAC's Seasat and Wetlands portals target three groups of users: a) scientists already familiar with ASF DAAC's SAR archive and our data download tool, Vertex; b) scientists not familiar with SAR or ASF, but who can use the data for their research of oceans, sea ice, volcanoes, land deformation and other Earth sciences; c) audiences wishing to learn more about SAR and its use in Earth sciences. 4) Identify the heaviest data uses and the terms scientists search for online when trying to find data for those uses. 5) Create search engine optimized (SEO) Web content that corresponds to those searches. Because search engines do not yet search raw data, so Web data portals must include content that ties the data to its likely uses. 6) Create Web designs that best serves data users (user centered design), not for how the organization views itself or its data. Usability testing was conducted for the ASF DAAC Wetlands portal to improve the user experience. 7) Use SEO tips and techniques. The ASF DAAC Seasat portal used numerous SEO techniques, including social media, blogging technology, SEO rich content and more. As a result, it was on the first page of numerous related Google search results within 24 hours of the portal launch. 8) Build in-browser data analysis tools showing scientists how the data can be used in their research. The ASF DAAC Wetlands portal demonstrates that allowing the user to examine the data quickly and graphically online readily enables users to perceive the value of the data and how to use it. 9) Use responsive Web design (RWD) so content and tools can be accessed from a wide range of devices. Wetlands and Seasat can be accessed from smartphones, tablets and desktops. 10) Use Web frameworks to enable rapid building of new portals using consistent design patterns. Seasat and Wetlands both use Django and Twitter Bootstrap. 11) Use load-balanced servers if high demand for the data is anticipated. Using load-balanced servers for the Seasat and Wetlands portals allows ASF to simply add hardware as needed to support increased capacity. 12) Use open-source software when possible. Seasat and Wetlands portal development costs were reduced, and functionality was increased, with the use of open-source software. 13) Use third-party virtual servers (e.g. Amazon EC2 and S3 Services) where applicable. 14) Track visitors using analytic tools. 15) Continually improve design.
Detecting Disease Outbreaks in Mass Gatherings Using Internet Data
Yom-Tov, Elad; Cox, Ingemar J; McKendry, Rachel A
2014-01-01
Background Mass gatherings, such as music festivals and religious events, pose a health care challenge because of the risk of transmission of communicable diseases. This is exacerbated by the fact that participants disperse soon after the gathering, potentially spreading disease within their communities. The dispersion of participants also poses a challenge for traditional surveillance methods. The ubiquitous use of the Internet may enable the detection of disease outbreaks through analysis of data generated by users during events and shortly thereafter. Objective The intent of the study was to develop algorithms that can alert to possible outbreaks of communicable diseases from Internet data, specifically Twitter and search engine queries. Methods We extracted all Twitter postings and queries made to the Bing search engine by users who repeatedly mentioned one of nine major music festivals held in the United Kingdom and one religious event (the Hajj in Mecca) during 2012, for a period of 30 days and after each festival. We analyzed these data using three methods, two of which compared words associated with disease symptoms before and after the time of the festival, and one that compared the frequency of these words with those of other users in the United Kingdom in the days following the festivals. Results The data comprised, on average, 7.5 million tweets made by 12,163 users, and 32,143 queries made by 1756 users from each festival. Our methods indicated the statistically significant appearance of a disease symptom in two of the nine festivals. For example, cough was detected at higher than expected levels following the Wakestock festival. Statistically significant agreement (chi-square test, P<.01) between methods and across data sources was found where a statistically significant symptom was detected. Anecdotal evidence suggests that symptoms detected are indeed indicative of a disease that some users attributed to being at the festival. Conclusions Our work shows the feasibility of creating a public health surveillance system for mass gatherings based on Internet data. The use of multiple data sources and analysis methods was found to be advantageous for rejecting false positives. Further studies are required in order to validate our findings with data from public health authorities. PMID:24943128
Detecting disease outbreaks in mass gatherings using Internet data.
Yom-Tov, Elad; Borsa, Diana; Cox, Ingemar J; McKendry, Rachel A
2014-06-18
Mass gatherings, such as music festivals and religious events, pose a health care challenge because of the risk of transmission of communicable diseases. This is exacerbated by the fact that participants disperse soon after the gathering, potentially spreading disease within their communities. The dispersion of participants also poses a challenge for traditional surveillance methods. The ubiquitous use of the Internet may enable the detection of disease outbreaks through analysis of data generated by users during events and shortly thereafter. The intent of the study was to develop algorithms that can alert to possible outbreaks of communicable diseases from Internet data, specifically Twitter and search engine queries. We extracted all Twitter postings and queries made to the Bing search engine by users who repeatedly mentioned one of nine major music festivals held in the United Kingdom and one religious event (the Hajj in Mecca) during 2012, for a period of 30 days and after each festival. We analyzed these data using three methods, two of which compared words associated with disease symptoms before and after the time of the festival, and one that compared the frequency of these words with those of other users in the United Kingdom in the days following the festivals. The data comprised, on average, 7.5 million tweets made by 12,163 users, and 32,143 queries made by 1756 users from each festival. Our methods indicated the statistically significant appearance of a disease symptom in two of the nine festivals. For example, cough was detected at higher than expected levels following the Wakestock festival. Statistically significant agreement (chi-square test, P<.01) between methods and across data sources was found where a statistically significant symptom was detected. Anecdotal evidence suggests that symptoms detected are indeed indicative of a disease that some users attributed to being at the festival. Our work shows the feasibility of creating a public health surveillance system for mass gatherings based on Internet data. The use of multiple data sources and analysis methods was found to be advantageous for rejecting false positives. Further studies are required in order to validate our findings with data from public health authorities.
2013-01-01
Background Due to the growing number of biomedical entries in data repositories of the National Center for Biotechnology Information (NCBI), it is difficult to collect, manage and process all of these entries in one place by third-party software developers without significant investment in hardware and software infrastructure, its maintenance and administration. Web services allow development of software applications that integrate in one place the functionality and processing logic of distributed software components, without integrating the components themselves and without integrating the resources to which they have access. This is achieved by appropriate orchestration or choreography of available Web services and their shared functions. After the successful application of Web services in the business sector, this technology can now be used to build composite software tools that are oriented towards biomedical data processing. Results We have developed a new tool for efficient and dynamic data exploration in GenBank and other NCBI databases. A dedicated search GenBank system makes use of NCBI Web services and a package of Entrez Programming Utilities (eUtils) in order to provide extended searching capabilities in NCBI data repositories. In search GenBank users can use one of the three exploration paths: simple data searching based on the specified user’s query, advanced data searching based on the specified user’s query, and advanced data exploration with the use of macros. search GenBank orchestrates calls of particular tools available through the NCBI Web service providing requested functionality, while users interactively browse selected records in search GenBank and traverse between NCBI databases using available links. On the other hand, by building macros in the advanced data exploration mode, users create choreographies of eUtils calls, which can lead to the automatic discovery of related data in the specified databases. Conclusions search GenBank extends standard capabilities of the NCBI Entrez search engine in querying biomedical databases. The possibility of creating and saving macros in the search GenBank is a unique feature and has a great potential. The potential will further grow in the future with the increasing density of networks of relationships between data stored in particular databases. search GenBank is available for public use at http://sgb.biotools.pl/. PMID:23452691
Retrieving high-resolution images over the Internet from an anatomical image database
NASA Astrophysics Data System (ADS)
Strupp-Adams, Annette; Henderson, Earl
1999-12-01
The Visible Human Data set is an important contribution to the national collection of anatomical images. To enhance the availability of these images, the National Library of Medicine has supported the design and development of a prototype object-oriented image database which imports, stores, and distributes high resolution anatomical images in both pixel and voxel formats. One of the key database modules is its client-server Internet interface. This Web interface provides a query engine with retrieval access to high-resolution anatomical images that range in size from 100KB for browser viewable rendered images, to 1GB for anatomical structures in voxel file formats. The Web query and retrieval client-server system is composed of applet GUIs, servlets, and RMI application modules which communicate with each other to allow users to query for specific anatomical structures, and retrieve image data as well as associated anatomical images from the database. Selected images can be downloaded individually as single files via HTTP or downloaded in batch-mode over the Internet to the user's machine through an applet that uses Netscape's Object Signing mechanism. The image database uses ObjectDesign's object-oriented DBMS, ObjectStore that has a Java interface. The query and retrieval systems has been tested with a Java-CDE window system, and on the x86 architecture using Windows NT 4.0. This paper describes the Java applet client search engine that queries the database; the Java client module that enables users to view anatomical images online; the Java application server interface to the database which organizes data returned to the user, and its distribution engine that allow users to download image files individually and/or in batch-mode.
CDAPubMed: a browser extension to retrieve EHR-based biomedical literature.
Perez-Rey, David; Jimenez-Castellanos, Ana; Garcia-Remesal, Miguel; Crespo, Jose; Maojo, Victor
2012-04-05
Over the last few decades, the ever-increasing output of scientific publications has led to new challenges to keep up to date with the literature. In the biomedical area, this growth has introduced new requirements for professionals, e.g., physicians, who have to locate the exact papers that they need for their clinical and research work amongst a huge number of publications. Against this backdrop, novel information retrieval methods are even more necessary. While web search engines are widespread in many areas, facilitating access to all kinds of information, additional tools are required to automatically link information retrieved from these engines to specific biomedical applications. In the case of clinical environments, this also means considering aspects such as patient data security and confidentiality or structured contents, e.g., electronic health records (EHRs). In this scenario, we have developed a new tool to facilitate query building to retrieve scientific literature related to EHRs. We have developed CDAPubMed, an open-source web browser extension to integrate EHR features in biomedical literature retrieval approaches. Clinical users can use CDAPubMed to: (i) load patient clinical documents, i.e., EHRs based on the Health Level 7-Clinical Document Architecture Standard (HL7-CDA), (ii) identify relevant terms for scientific literature search in these documents, i.e., Medical Subject Headings (MeSH), automatically driven by the CDAPubMed configuration, which advanced users can optimize to adapt to each specific situation, and (iii) generate and launch literature search queries to a major search engine, i.e., PubMed, to retrieve citations related to the EHR under examination. CDAPubMed is a platform-independent tool designed to facilitate literature searching using keywords contained in specific EHRs. CDAPubMed is visually integrated, as an extension of a widespread web browser, within the standard PubMed interface. It has been tested on a public dataset of HL7-CDA documents, returning significantly fewer citations since queries are focused on characteristics identified within the EHR. For instance, compared with more than 200,000 citations retrieved by breast neoplasm, fewer than ten citations were retrieved when ten patient features were added using CDAPubMed. This is an open source tool that can be freely used for non-profit purposes and integrated with other existing systems.
CDAPubMed: a browser extension to retrieve EHR-based biomedical literature
2012-01-01
Background Over the last few decades, the ever-increasing output of scientific publications has led to new challenges to keep up to date with the literature. In the biomedical area, this growth has introduced new requirements for professionals, e.g., physicians, who have to locate the exact papers that they need for their clinical and research work amongst a huge number of publications. Against this backdrop, novel information retrieval methods are even more necessary. While web search engines are widespread in many areas, facilitating access to all kinds of information, additional tools are required to automatically link information retrieved from these engines to specific biomedical applications. In the case of clinical environments, this also means considering aspects such as patient data security and confidentiality or structured contents, e.g., electronic health records (EHRs). In this scenario, we have developed a new tool to facilitate query building to retrieve scientific literature related to EHRs. Results We have developed CDAPubMed, an open-source web browser extension to integrate EHR features in biomedical literature retrieval approaches. Clinical users can use CDAPubMed to: (i) load patient clinical documents, i.e., EHRs based on the Health Level 7-Clinical Document Architecture Standard (HL7-CDA), (ii) identify relevant terms for scientific literature search in these documents, i.e., Medical Subject Headings (MeSH), automatically driven by the CDAPubMed configuration, which advanced users can optimize to adapt to each specific situation, and (iii) generate and launch literature search queries to a major search engine, i.e., PubMed, to retrieve citations related to the EHR under examination. Conclusions CDAPubMed is a platform-independent tool designed to facilitate literature searching using keywords contained in specific EHRs. CDAPubMed is visually integrated, as an extension of a widespread web browser, within the standard PubMed interface. It has been tested on a public dataset of HL7-CDA documents, returning significantly fewer citations since queries are focused on characteristics identified within the EHR. For instance, compared with more than 200,000 citations retrieved by breast neoplasm, fewer than ten citations were retrieved when ten patient features were added using CDAPubMed. This is an open source tool that can be freely used for non-profit purposes and integrated with other existing systems. PMID:22480327
Herbal cancer cures on the Web: noncompliance with The Dietary Supplement Health and Education Act.
Bonakdar, Robert Alan
2002-01-01
A significant portion of the US population uses the Internet to obtain health information; nearly half of Internet users admit that this information influences decisions about their health care and medical treatments. Concurrently, approximately one third of the population uses herbal supplements; a higher percentage is noted for subgroups of cancer patients. The Dietary Supplement Health and Education Act (DSHEA) of 1994 contained regulatory standards for herbal supplements, including restricting any claims for disease prevention, treatment, or cure. This study determined the degree of compliance with the DSHEA, as applied to Internet sites focusing on the subject of herbal supplements and cancer. Internet searches were conducted using six popular search engines and three master search engines in October-December 2000 using the linked terms herb and cancer. The Internet sites identified through this search process were examined for categories of information including claims regarding prevention, treatment, or cure; commercial nature; DSHEA and physician consultation warnings; country of origin; and use of research and testimonials. Additionally, commercial sites were reviewed to identify tactics used to promote products or services. Each of the six primary search engines provided between 11,730 and 58,605 matches for herb and cancer. Further cross matching with the three master search engines identified 70 non-repeating sites that appeared on all three master search engines. Of these 70 sites, nine were irrelevant matches or no longer functioning. Of the remaining 61, 34 (54%) were commercial sites (CS) and 27 (42.8%) were noncommercial sites (NCS). Of the CS surveyed, prevention, treatment, and cure were discussed 92%, 89%, and 58%, respectively. CS provided testimonials, physician consultation recommendations, and DSHEA warnings 89%, 38.8%, and 36.1% of the time, respectively. CS provided research with references 30.6% of the time versus 92.6% of the time in NCS. All international commercial sites surveyed claimed herbal cancer cures. Although the DSHEA was enacted and amended to decrease unlawful claims of disease prevention, treatment, and cure, the results of this study indicate that such claims are prevalent on commercial Internet sites. A majority of sites claim cancer cures through herbal supplementation with little regardfor current regulations, and such claims were more common on sites operated from outside the United States.
Connecting the Astrophysics Data System and Planetary Data System
NASA Astrophysics Data System (ADS)
Eichhorn, G.; Kurtz, M. J.; Accomazzi, A.; Grant, C. S.; Murray, S. S.; Hughes, J. S.; Mortellaro, J.; McMahon, S. K.
1997-07-01
The Astrophysics Data System (ADS) provides access to astronomical literature through a sophisticated search engine. Over 10,000 users retrieve almost 5 million references and read more than 25,000 full text articles per month. ADS cooperates closely with all the main astronomical journals and data centers to create and maintain a state-of-the-art digital library. The Planetary Data System (PDS) publishes high quality peer reviewed planetary science data products, defines planetary archiving standards to make products usable, and provides science expertise to users in data product preparation and use. Data products are available to users on CD media, with more than 600 CD-ROM titles in the inventory from past missions as well as the recent releases from active planetary missions and observations. The ADS and PDS serve overlapping communities and offer complementary functions. The ADS and PDS are both part of the NASA Space Science Data System, sponsored by the Office of Space Science, which curates science data products for researchers and the general public. We are in the process of connecting these two data systems. As a first step we have included entries for PDS data sets in the ADS abstract service. This allows ADS users to find PDS data sets by searching for their descriptions through the ADS search system. The information returned from the ADS links directly to the data set's entry in the PDS data set catalog. After linking to this catalog, the user will have access to more comprehensive data set information, related ancillary information, and on-line data products. The PDS on the other hand will use the ADS to provide access to bibliographic information. This includes links from PDS data set catalog bibliographic citations to ADS abstracts and on-line articles. The cross-linking between these data systems allows each system to concentrate on its main objectives and utilize the other system to provide more and improved services to the users of both systems.
Burden of neurological diseases in the US revealed by web searches.
Baeza-Yates, Ricardo; Sangal, Puneet Mohan; Villoslada, Pablo
2017-01-01
Analyzing the disease-related web searches of Internet users provides insight into the interests of the general population as well as the healthcare industry, which can be used to shape health care policies. We analyzed the searches related to neurological diseases and drugs used in neurology using the most popular search engines in the US, Google and Bing/Yahoo. We found that the most frequently searched diseases were common diseases such as dementia or Attention Deficit/Hyperactivity Disorder (ADHD), as well as medium frequency diseases with high social impact such as Parkinson's disease, MS and ALS. The most frequently searched CNS drugs were generic drugs used for pain, followed by sleep disorders, dementia, ADHD, stroke and Parkinson's disease. Regarding the interests of the healthcare industry, ADHD, Alzheimer's disease, MS, ALS, meningitis, and hypersomnia received the higher advertising bids for neurological diseases, while painkillers and drugs for neuropathic pain, drugs for dementia or insomnia, and triptans had the highest advertising bidding prices. Web searches reflect the interest of people and the healthcare industry, and are based either on the frequency or social impact of the disease.
Understanding PubMed user search behavior through log analysis.
Islamaj Dogan, Rezarta; Murray, G Craig; Névéol, Aurélie; Lu, Zhiyong
2009-01-01
This article reports on a detailed investigation of PubMed users' needs and behavior as a step toward improving biomedical information retrieval. PubMed is providing free service to researchers with access to more than 19 million citations for biomedical articles from MEDLINE and life science journals. It is accessed by millions of users each day. Efficient search tools are crucial for biomedical researchers to keep abreast of the biomedical literature relating to their own research. This study provides insight into PubMed users' needs and their behavior. This investigation was conducted through the analysis of one month of log data, consisting of more than 23 million user sessions and more than 58 million user queries. Multiple aspects of users' interactions with PubMed are characterized in detail with evidence from these logs. Despite having many features in common with general Web searches, biomedical information searches have unique characteristics that are made evident in this study. PubMed users are more persistent in seeking information and they reformulate queries often. The three most frequent types of search are search by author name, search by gene/protein, and search by disease. Use of abbreviation in queries is very frequent. Factors such as result set size influence users' decisions. Analysis of characteristics such as these plays a critical role in identifying users' information needs and their search habits. In turn, such an analysis also provides useful insight for improving biomedical information retrieval.Database URL:http://www.ncbi.nlm.nih.gov/PubMed.
Beyond relevance and recall: testing new user-centred measures of database performance.
Stokes, Peter; Foster, Allen; Urquhart, Christine
2009-09-01
Measures of the effectiveness of databases have traditionally focused on recall, precision, with some debate on how relevance can be assessed, and by whom. New measures of database performance are required when users are familiar with search engines, and expect full text availability. This research ascertained which of four bibliographic databases (BNI, CINAHL, MEDLINE and EMBASE) could be considered most useful to nursing and midwifery students searching for information for an undergraduate dissertation. Searches on title were performed for dissertation topics supplied by nursing students (n = 9), who made the relevance judgements. Measures of recall and precision were combined with additional factors to provide measures of effectiveness, while efficiency combined measures of novelty and originality and accessibility combined measures for availability and retrievability, based on obtainability. There were significant differences among the databases in precision, originality and availability, but other differences were not significant (Friedman test). Odds ratio tests indicated that BNI, followed by CINAHL were the most effective, CINAHL the most efficient, and BNI the most accessible. The methodology could help library services in purchase decisions as the measure for accessibility, and odds ratio testing helped to differentiate database performance.
Learning to rank using user clicks and visual features for image retrieval.
Yu, Jun; Tao, Dacheng; Wang, Meng; Rui, Yong
2015-04-01
The inconsistency between textual features and visual contents can cause poor image search results. To solve this problem, click features, which are more reliable than textual information in justifying the relevance between a query and clicked images, are adopted in image ranking model. However, the existing ranking model cannot integrate visual features, which are efficient in refining the click-based search results. In this paper, we propose a novel ranking model based on the learning to rank framework. Visual features and click features are simultaneously utilized to obtain the ranking model. Specifically, the proposed approach is based on large margin structured output learning and the visual consistency is integrated with the click features through a hypergraph regularizer term. In accordance with the fast alternating linearization method, we design a novel algorithm to optimize the objective function. This algorithm alternately minimizes two different approximations of the original objective function by keeping one function unchanged and linearizing the other. We conduct experiments on a large-scale dataset collected from the Microsoft Bing image search engine, and the results demonstrate that the proposed learning to rank models based on visual features and user clicks outperforms state-of-the-art algorithms.
TokSearch: A search engine for fusion experimental data
Sammuli, Brian S.; Barr, Jayson L.; Eidietis, Nicholas W.; ...
2018-04-01
At a typical fusion research site, experimental data is stored using archive technologies that deal with each discharge as an independent set of data. These technologies (e.g. MDSplus or HDF5) are typically supplemented with a database that aggregates metadata for multiple shots to allow for efficient querying of certain predefined quantities. Often, however, a researcher will need to extract information from the archives, possibly for many shots, that is not available in the metadata store or otherwise indexed for quick retrieval. To address this need, a new search tool called TokSearch has been added to the General Atomics TokSys controlmore » design and analysis suite [1]. This tool provides the ability to rapidly perform arbitrary, parallelized queries of archived tokamak shot data (both raw and analyzed) over large numbers of shots. The TokSearch query API borrows concepts from SQL, and users can choose to implement queries in either MatlabTM or Python.« less
TokSearch: A search engine for fusion experimental data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sammuli, Brian S.; Barr, Jayson L.; Eidietis, Nicholas W.
At a typical fusion research site, experimental data is stored using archive technologies that deal with each discharge as an independent set of data. These technologies (e.g. MDSplus or HDF5) are typically supplemented with a database that aggregates metadata for multiple shots to allow for efficient querying of certain predefined quantities. Often, however, a researcher will need to extract information from the archives, possibly for many shots, that is not available in the metadata store or otherwise indexed for quick retrieval. To address this need, a new search tool called TokSearch has been added to the General Atomics TokSys controlmore » design and analysis suite [1]. This tool provides the ability to rapidly perform arbitrary, parallelized queries of archived tokamak shot data (both raw and analyzed) over large numbers of shots. The TokSearch query API borrows concepts from SQL, and users can choose to implement queries in either MatlabTM or Python.« less
NASA Astrophysics Data System (ADS)
Minnett, R.; Koppers, A.; Jarboe, N.; Tauxe, L.; Constable, C.; Jonestrask, L.
2017-12-01
Challenges are faced by both new and experienced users interested in contributing their data to community repositories, in data discovery, or engaged in potentially transformative science. The Magnetics Information Consortium (https://earthref.org/MagIC) has recently simplified its data model and developed a new containerized web application to reduce the friction in contributing, exploring, and combining valuable and complex datasets for the paleo-, geo-, and rock magnetic scientific community. The new data model more closely reflects the hierarchical workflow in paleomagnetic experiments to enable adequate annotation of scientific results and ensure reproducibility. The new open-source (https://github.com/earthref/MagIC) application includes an upload tool that is integrated with the data model to provide early data validation feedback and ease the friction of contributing and updating datasets. The search interface provides a powerful full text search of contributions indexed by ElasticSearch and a wide array of filters, including specific geographic and geological timescale filtering, to support both novice users exploring the database and experts interested in compiling new datasets with specific criteria across thousands of studies and millions of measurements. The datasets are not large, but they are complex, with many results from evolving experimental and analytical approaches. These data are also extremely valuable due to the cost in collecting or creating physical samples and the, often, destructive nature of the experiments. MagIC is heavily invested in encouraging young scientists as well as established labs to cultivate workflows that facilitate contributing their data in a consistent format. This eLightning presentation includes a live demonstration of the MagIC web application, developed as a configurable container hosting an isomorphic Meteor JavaScript application, MongoDB database, and ElasticSearch search engine. Visitors can explore the MagIC Database through maps and image or plot galleries or search and filter the raw measurements and their derived hierarchy of analytical interpretations.
A Web Search on Environmental Topics: What Is the Role of Ranking?
Filisetti, Barbara; Mascaretti, Silvia; Limina, Rosa Maria; Gelatti, Umberto
2013-01-01
Abstract Background: Although the Internet is easy to use, the mechanisms and logic behind a Web search are often unknown. Reliable information can be obtained, but it may not be visible as the Web site is not located in the first positions of search results. The possible risks of adverse health effects arising from environmental hazards are issues of increasing public interest, and therefore the information about these risks, particularly on topics for which there is no scientific evidence, is very crucial. The aim of this study was to investigate whether the presentation of information on some environmental health topics differed among various search engines, assuming that the most reliable information should come from institutional Web sites. Materials and Methods: Five search engines were used: Google, Yahoo!, Bing, Ask, and AOL. The following topics were searched in combination with the word “health”: “nuclear energy,” “electromagnetic waves,” “air pollution,” “waste,” and “radon.” For each topic three key words were used. The first 30 search results for each query were considered. The ranking variability among the search engines and the type of search results were analyzed for each topic and for each key word. The ranking of institutional Web sites was given particular consideration. Results: Variable results were obtained when surfing the Internet on different environmental health topics. Multivariate logistic regression analysis showed that, when searching for radon and air pollution topics, it is more likely to find institutional Web sites in the first 10 positions compared with nuclear power (odds ratio=3.4, 95% confidence interval 2.1–5.4 and odds ratio=2.9, 95% confidence interval 1.8–4.7, respectively) and also when using Google compared with Bing (odds ratio=3.1, 95% confidence interval 1.9–5.1). Conclusions: The increasing use of online information could play an important role in forming opinions. Web users should become more aware of the importance of finding reliable information, and health institutions should be able to make that information more visible. PMID:24083368
LIVIVO - the Vertical Search Engine for Life Sciences.
Müller, Bernd; Poley, Christoph; Pössel, Jana; Hagelstein, Alexandra; Gübitz, Thomas
2017-01-01
The explosive growth of literature and data in the life sciences challenges researchers to keep track of current advancements in their disciplines. Novel approaches in the life science like the One Health paradigm require integrated methodologies in order to link and connect heterogeneous information from databases and literature resources. Current publications in the life sciences are increasingly characterized by the employment of trans-disciplinary methodologies comprising molecular and cell biology, genetics, genomic, epigenomic, transcriptional and proteomic high throughput technologies with data from humans, plants, and animals. The literature search engine LIVIVO empowers retrieval functionality by incorporating various literature resources from medicine, health, environment, agriculture and nutrition. LIVIVO is developed in-house by ZB MED - Information Centre for Life Sciences. It provides a user-friendly and usability-tested search interface with a corpus of 55 Million citations derived from 50 databases. Standardized application programming interfaces are available for data export and high throughput retrieval. The search functions allow for semantic retrieval with filtering options based on life science entities. The service oriented architecture of LIVIVO uses four different implementation layers to deliver search services. A Knowledge Environment is developed by ZB MED to deal with the heterogeneity of data as an integrative approach to model, store, and link semantic concepts within literature resources and databases. Future work will focus on the exploitation of life science ontologies and on the employment of NLP technologies in order to improve query expansion, filters in faceted search, and concept based relevancy rankings in LIVIVO.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Amy Robinson; Audrey Archuleta; Barbara Maes
1999-02-01
The Los Alamos Neutron Science Center Activity Report describes scientific and technological progress and achievements in LANSCE Division during the period of 1995 to 1998. This report includes a message from the Division Director, an overview of LANSCE, sponsor overviews, research highlights, advanced projects and facility upgrades achievements, experimental and user program accomplishments, news and events, and a list of publications. The research highlights cover the areas of condensed-matter science and engineering, accelerator science, nuclear science, and radiography. This report also contains a compact disk that includes an overview, the Activity Report itself, LANSCE operations progress reports for 1996 andmore » 1997, experiment reports from LANSCE users, as well as a search capability.« less
NASA Technical Reports Server (NTRS)
Rogers, J. L.; Barthelemy, J.-F. M.
1986-01-01
An expert system called EXADS has been developed to aid users of the Automated Design Synthesis (ADS) general purpose optimization program. ADS has approximately 100 combinations of strategy, optimizer, and one-dimensional search options from which to choose. It is difficult for a nonexpert to make this choice. This expert system aids the user in choosing the best combination of options based on the users knowledge of the problem and the expert knowledge stored in the knowledge base. The knowledge base is divided into three categories; constrained problems, unconstrained problems, and constrained problems being treated as unconstrained problems. The inference engine and rules are written in LISP, contains about 200 rules, and executes on DEC-VAX (with Franz-LISP) and IBM PC (with IQ-LISP) computers.
A search engine to access PubMed monolingual subsets: proof of concept and evaluation in French.
Griffon, Nicolas; Schuers, Matthieu; Soualmia, Lina Fatima; Grosjean, Julien; Kerdelhué, Gaétan; Kergourlay, Ivan; Dahamna, Badisse; Darmoni, Stéfan Jacques
2014-12-01
PubMed contains numerous articles in languages other than English. However, existing solutions to access these articles in the language in which they were written remain unconvincing. The aim of this study was to propose a practical search engine, called Multilingual PubMed, which will permit access to a PubMed subset in 1 language and to evaluate the precision and coverage for the French version (Multilingual PubMed-French). To create this tool, translations of MeSH were enriched (eg, adding synonyms and translations in French) and integrated into a terminology portal. PubMed subsets in several European languages were also added to our database using a dedicated parser. The response time for the generic semantic search engine was evaluated for simple queries. BabelMeSH, Multilingual PubMed-French, and 3 different PubMed strategies were compared by searching for literature in French. Precision and coverage were measured for 20 randomly selected queries. The results were evaluated as relevant to title and abstract, the evaluator being blind to search strategy. More than 650,000 PubMed citations in French were integrated into the Multilingual PubMed-French information system. The response times were all below the threshold defined for usability (2 seconds). Two search strategies (Multilingual PubMed-French and 1 PubMed strategy) showed high precision (0.93 and 0.97, respectively), but coverage was 4 times higher for Multilingual PubMed-French. It is now possible to freely access biomedical literature using a practical search tool in French. This tool will be of particular interest for health professionals and other end users who do not read or query sufficiently in English. The information system is theoretically well suited to expand the approach to other European languages, such as German, Spanish, Norwegian, and Portuguese.
A Search Engine to Access PubMed Monolingual Subsets: Proof of Concept and Evaluation in French
Schuers, Matthieu; Soualmia, Lina Fatima; Grosjean, Julien; Kerdelhué, Gaétan; Kergourlay, Ivan; Dahamna, Badisse; Darmoni, Stéfan Jacques
2014-01-01
Background PubMed contains numerous articles in languages other than English. However, existing solutions to access these articles in the language in which they were written remain unconvincing. Objective The aim of this study was to propose a practical search engine, called Multilingual PubMed, which will permit access to a PubMed subset in 1 language and to evaluate the precision and coverage for the French version (Multilingual PubMed-French). Methods To create this tool, translations of MeSH were enriched (eg, adding synonyms and translations in French) and integrated into a terminology portal. PubMed subsets in several European languages were also added to our database using a dedicated parser. The response time for the generic semantic search engine was evaluated for simple queries. BabelMeSH, Multilingual PubMed-French, and 3 different PubMed strategies were compared by searching for literature in French. Precision and coverage were measured for 20 randomly selected queries. The results were evaluated as relevant to title and abstract, the evaluator being blind to search strategy. Results More than 650,000 PubMed citations in French were integrated into the Multilingual PubMed-French information system. The response times were all below the threshold defined for usability (2 seconds). Two search strategies (Multilingual PubMed-French and 1 PubMed strategy) showed high precision (0.93 and 0.97, respectively), but coverage was 4 times higher for Multilingual PubMed-French. Conclusions It is now possible to freely access biomedical literature using a practical search tool in French. This tool will be of particular interest for health professionals and other end users who do not read or query sufficiently in English. The information system is theoretically well suited to expand the approach to other European languages, such as German, Spanish, Norwegian, and Portuguese. PMID:25448528
Klein, M S; Ross, F
1997-01-01
Using the results of the 1993 Medical Library Association (MLA) Hospital Libraries Section survey of hospital-based end-user search services, this article describes how end-user search services can become an impetus for an expanded information management and technology role for the hospital librarian. An end-user services implementation plan is presented that focuses on software, hardware, finances, policies, staff allocations and responsibilities, educational program design, and program evaluation. Possibilities for extending end-user search services into information technology and informatics, specialized end-user search systems, and Internet access are described. Future opportunities are identified for expanding the hospital librarian's role in the face of changing health care management, advances in information technology, and increasing end-user expectations. PMID:9285126
Ilic, D; Bessell, T L; Silagy, C A; Green, S
2003-03-01
The Internet provides consumers with access to online health information; however, identifying relevant and valid information can be problematic. Our objectives were firstly to investigate the efficiency of search-engines, and then to assess the quality of online information pertaining to androgen deficiency in the ageing male (ADAM). Keyword searches were performed on nine search-engines (four general and five medical) to identify website information regarding ADAM. Search-engine efficiency was compared by percentage of relevant websites obtained via each search-engine. The quality of information published on each website was assessed using the DISCERN rating tool. Of 4927 websites searched, 47 (1.44%) and 10 (0.60%) relevant websites were identified by general and medical search-engines respectively. The overall quality of online information on ADAM was poor. The quality of websites retrieved using medical search-engines did not differ significantly from those retrieved by general search-engines. Despite the poor quality of online information relating to ADAM, it is evident that medical search-engines are no better than general search-engines in sourcing consumer information relevant to ADAM.
Full Text and Figure Display Improves Bioscience Literature Search
Divoli, Anna; Wooldridge, Michael A.; Hearst, Marti A.
2010-01-01
When reading bioscience journal articles, many researchers focus attention on the figures and their captions. This observation led to the development of the BioText literature search engine [1], a freely available Web-based application that allows biologists to search over the contents of Open Access Journals, and see figures from the articles displayed directly in the search results. This article presents a qualitative assessment of this system in the form of a usability study with 20 biologist participants using and commenting on the system. 19 out of 20 participants expressed a desire to use a bioscience literature search engine that displays articles' figures alongside the full text search results. 15 out of 20 participants said they would use a caption search and figure display interface either frequently or sometimes, while 4 said rarely and 1 said undecided. 10 out of 20 participants said they would use a tool for searching the text of tables and their captions either frequently or sometimes, while 7 said they would use it rarely if at all, 2 said they would never use it, and 1 was undecided. This study found evidence, supporting results of an earlier study, that bioscience literature search systems such as PubMed should show figures from articles alongside search results. It also found evidence that full text and captions should be searched along with the article title, metadata, and abstract. Finally, for a subset of users and information needs, allowing for explicit search within captions for figures and tables is a useful function, but it is not entirely clear how to cleanly integrate this within a more general literature search interface. Such a facility supports Open Access publishing efforts, as it requires access to full text of documents and the lifting of restrictions in order to show figures in the search interface. PMID:20418942
ViA: a perceptual visualization assistant
NASA Astrophysics Data System (ADS)
Healey, Chris G.; St. Amant, Robert; Elhaddad, Mahmoud S.
2000-05-01
This paper describes an automated visualized assistant called ViA. ViA is designed to help users construct perceptually optical visualizations to represent, explore, and analyze large, complex, multidimensional datasets. We have approached this problem by studying what is known about the control of human visual attention. By harnessing the low-level human visual system, we can support our dual goals of rapid and accurate visualization. Perceptual guidelines that we have built using psychophysical experiments form the basis for ViA. ViA uses modified mixed-initiative planning algorithms from artificial intelligence to search of perceptually optical data attribute to visual feature mappings. Our perceptual guidelines are integrated into evaluation engines that provide evaluation weights for a given data-feature mapping, and hints on how that mapping might be improved. ViA begins by asking users a set of simple questions about their dataset and the analysis tasks they want to perform. Answers to these questions are used in combination with the evaluation engines to identify and intelligently pursue promising data-feature mappings. The result is an automatically-generated set of mappings that are perceptually salient, but that also respect the context of the dataset and users' preferences about how they want to visualize their data.
Towards a personalized Internet: a case for a full decentralization.
Kermarrec, Anne-Marie
2013-03-28
The Web has become a user-centric platform where users post, share, annotate, comment and forward content be it text, videos, pictures, URLs, etc. This social dimension creates tremendous new opportunities for information exchange over the Internet, as exemplified by the surprising and exponential growth of social networks and collaborative platforms. Yet, niche content is sometimes difficult to retrieve using traditional search engines because they target the mass rather than the individual. Likewise, relieving users from useless notification is tricky in a world where there is so much information and so little of interest for each and every one of us. We argue that ultra-specific content could be retrieved and disseminated should search and notification be personalized to fit this new setting. We also argue that users' interests should be implicitly captured by the system rather than relying on explicit classifications simply because the world is by nature unstructured, dynamic and users do not want to be hampered in their actions by a tight and static framework. In this paper, we review some existing personalization approaches, most of which are centralized. We then advocate the need for fully decentralized systems because personalization raises two main issues. Firstly, personalization requires information to be stored and maintained at a user granularity which can significantly hurt the scalability of a centralized solution. Secondly, at a time when the 'big brother is watching you' attitude is prominent, users may be more and more reluctant to give away their personal data to the few large companies that can afford such personalization. We start by showing how to achieve personalization in decentralized systems and conclude with the research agenda ahead.
Impact of Predicting Health Care Utilization Via Web Search Behavior: A Data-Driven Analysis
Zhang, Liangliang; Zhu, Josh; Fang, Shiyuan; Cheng, Tim; Hong, Chloe; Shah, Nigam H
2016-01-01
Background By recent estimates, the steady rise in health care costs has deprived more than 45 million Americans of health care services and has encouraged health care providers to better understand the key drivers of health care utilization from a population health management perspective. Prior studies suggest the feasibility of mining population-level patterns of health care resource utilization from observational analysis of Internet search logs; however, the utility of the endeavor to the various stakeholders in a health ecosystem remains unclear. Objective The aim was to carry out a closed-loop evaluation of the utility of health care use predictions using the conversion rates of advertisements that were displayed to the predicted future utilizers as a surrogate. The statistical models to predict the probability of user’s future visit to a medical facility were built using effective predictors of health care resource utilization, extracted from a deidentified dataset of geotagged mobile Internet search logs representing searches made by users of the Baidu search engine between March 2015 and May 2015. Methods We inferred presence within the geofence of a medical facility from location and duration information from users’ search logs and putatively assigned medical facility visit labels to qualifying search logs. We constructed a matrix of general, semantic, and location-based features from search logs of users that had 42 or more search days preceding a medical facility visit as well as from search logs of users that had no medical visits and trained statistical learners for predicting future medical visits. We then carried out a closed-loop evaluation of the utility of health care use predictions using the show conversion rates of advertisements displayed to the predicted future utilizers. In the context of behaviorally targeted advertising, wherein health care providers are interested in minimizing their cost per conversion, the association between show conversion rate and predicted utilization score, served as a surrogate measure of the model’s utility. Results We obtained the highest area under the curve (0.796) in medical visit prediction with our random forests model and daywise features. Ablating feature categories one at a time showed that the model performance worsened the most when location features were dropped. An online evaluation in which advertisements were served to users who had a high predicted probability of a future medical visit showed a 3.96% increase in the show conversion rate. Conclusions Results from our experiments done in a research setting suggest that it is possible to accurately predict future patient visits from geotagged mobile search logs. Results from the offline and online experiments on the utility of health utilization predictions suggest that such prediction can have utility for health care providers. PMID:27655225
Semantic-Based Information Retrieval of Biomedical Data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jiao, Yu; Potok, Thomas E; Hurson, Ali R.
In this paper, we propose to improve the effectiveness of biomedical information retrieval via a medical thesaurus. We analyzed the deficiencies of the existing medical thesauri and reconstructed a new thesaurus, called MEDTHES, which follows the ANSI/NISO Z39.19-2003 standard. MEDTHES also endows the users with fine-grained control of information retrieval by providing functions to calculate the semantic similarity between words. We demonstrate the usage of MEDTHES through an existing data search engine.
World Wide Web Based Image Search Engine Using Text and Image Content Features
NASA Astrophysics Data System (ADS)
Luo, Bo; Wang, Xiaogang; Tang, Xiaoou
2003-01-01
Using both text and image content features, a hybrid image retrieval system for Word Wide Web is developed in this paper. We first use a text-based image meta-search engine to retrieve images from the Web based on the text information on the image host pages to provide an initial image set. Because of the high-speed and low cost nature of the text-based approach, we can easily retrieve a broad coverage of images with a high recall rate and a relatively low precision. An image content based ordering is then performed on the initial image set. All the images are clustered into different folders based on the image content features. In addition, the images can be re-ranked by the content features according to the user feedback. Such a design makes it truly practical to use both text and image content for image retrieval over the Internet. Experimental results confirm the efficiency of the system.
IPeak: An open source tool to combine results from multiple MS/MS search engines.
Wen, Bo; Du, Chaoqin; Li, Guilin; Ghali, Fawaz; Jones, Andrew R; Käll, Lukas; Xu, Shaohang; Zhou, Ruo; Ren, Zhe; Feng, Qiang; Xu, Xun; Wang, Jun
2015-09-01
Liquid chromatography coupled tandem mass spectrometry (LC-MS/MS) is an important technique for detecting peptides in proteomics studies. Here, we present an open source software tool, termed IPeak, a peptide identification pipeline that is designed to combine the Percolator post-processing algorithm and multi-search strategy to enhance the sensitivity of peptide identifications without compromising accuracy. IPeak provides a graphical user interface (GUI) as well as a command-line interface, which is implemented in JAVA and can work on all three major operating system platforms: Windows, Linux/Unix and OS X. IPeak has been designed to work with the mzIdentML standard from the Proteomics Standards Initiative (PSI) as an input and output, and also been fully integrated into the associated mzidLibrary project, providing access to the overall pipeline, as well as modules for calling Percolator on individual search engine result files. The integration thus enables IPeak (and Percolator) to be used in conjunction with any software packages implementing the mzIdentML data standard. IPeak is freely available and can be downloaded under an Apache 2.0 license at https://code.google.com/p/mzidentml-lib/. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
What Searches Do Users Run on PEDro? An Analysis of 893,971 Search Commands Over a 6-Month Period.
Stevens, Matthew L; Moseley, Anne; Elkins, Mark R; Lin, Christine C-W; Maher, Chris G
2016-08-05
Clinicians must be able to search effectively for relevant research if they are to provide evidence-based healthcare. It is therefore relevant to consider how users search databases of evidence in healthcare, including what information users look for and what search strategies they employ. To date such analyses have been restricted to the PubMed database. Although the Physiotherapy Evidence Database (PEDro) is searched millions of times each year, no studies have investigated how users search PEDro. To assess the content and quality of searches conducted on PEDro. Searches conducted on the PEDro website over 6 months were downloaded and the 'get' commands and page-views extracted. The following data were tabulated: the 25 most common searches; the number of search terms used; the frequency of use of simple and advanced searches, including the use of each advanced search field; and the frequency of use of various search strategies. Between August 2014 and January 2015, 893,971 search commands were entered on PEDro. Fewer than 18 % of these searches used the advanced search features of PEDro. 'Musculoskeletal' was the most common subdiscipline searched, while 'low back pain' was the most common individual search. Around 20 % of all searches contained errors. PEDro is a commonly used evidence resource, but searching appears to be sub-optimal in many cases. The effectiveness of searches conducted by users needs to improve, which could be facilitated by methods such as targeted training and amending the search interface.
Empirical Study of User Preferences Based on Rating Data of Movies
Zhao, YingSi; Shen, Bo
2016-01-01
User preference plays a prominent role in many fields, including electronic commerce, social opinion, and Internet search engines. Particularly in recommender systems, it directly influences the accuracy of the recommendation. Though many methods have been presented, most of these have only focused on how to improve the recommendation results. In this paper, we introduce an empirical study of user preferences based on a set of rating data about movies. We develop a simple statistical method to investigate the characteristics of user preferences. We find that the movies have potential characteristics of closure, which results in the formation of numerous cliques with a power-law size distribution. We also find that a user related to a small clique always has similar opinions on the movies in this clique. Then, we suggest a user preference model, which can eliminate the predictions that are considered to be impracticable. Numerical results show that the model can reflect user preference with remarkable accuracy when data elimination is allowed, and random factors in the rating data make prediction error inevitable. In further research, we will investigate many other rating data sets to examine the universality of our findings. PMID:26735847
Empirical Study of User Preferences Based on Rating Data of Movies.
Zhao, YingSi; Shen, Bo
2016-01-01
User preference plays a prominent role in many fields, including electronic commerce, social opinion, and Internet search engines. Particularly in recommender systems, it directly influences the accuracy of the recommendation. Though many methods have been presented, most of these have only focused on how to improve the recommendation results. In this paper, we introduce an empirical study of user preferences based on a set of rating data about movies. We develop a simple statistical method to investigate the characteristics of user preferences. We find that the movies have potential characteristics of closure, which results in the formation of numerous cliques with a power-law size distribution. We also find that a user related to a small clique always has similar opinions on the movies in this clique. Then, we suggest a user preference model, which can eliminate the predictions that are considered to be impracticable. Numerical results show that the model can reflect user preference with remarkable accuracy when data elimination is allowed, and random factors in the rating data make prediction error inevitable. In further research, we will investigate many other rating data sets to examine the universality of our findings.
Introducing Online Bibliographic Service to its Users: The Online Presentation
ERIC Educational Resources Information Center
Crane, Nancy B.; Pilachowski, David M.
1978-01-01
A description of techniques for introducing online services to new user groups includes discussion of terms and their definitions, evolution of online searching, advantages and disadvantages of online searching, production of the data bases, search strategies, Boolean logic, costs and charges, "do's and don'ts," and a user search questionnaire. (J…
Earthdata Search Summer ESIP Usability Workshop
NASA Technical Reports Server (NTRS)
Reese, Mark; Sirato, Jeff
2017-01-01
The Earthdata Search Client has undergone multiple rounds of usability testing during 2017 and the user feedback received has resulted in an enhanced user interface. This session will showcase the new Earthdata Search Client user interface and provide hands-on experience for participants to learn how to search, visualize and download data in the desired format.
HBVPathDB: a database of HBV infection-related molecular interaction network.
Zhang, Yi; Bo, Xiao-Chen; Yang, Jing; Wang, Sheng-Qi
2005-03-21
To describe molecules or genes interaction between hepatitis B viruses (HBV) and host, for understanding how virus' and host's genes and molecules are networked to form a biological system and for perceiving mechanism of HBV infection. The knowledge of HBV infection-related reactions was organized into various kinds of pathways with carefully drawn graphs in HBVPathDB. Pathway information is stored with relational database management system (DBMS), which is currently the most efficient way to manage large amounts of data and query is implemented with powerful Structured Query Language (SQL). The search engine is written using Personal Home Page (PHP) with SQL embedded and web retrieval interface is developed for searching with Hypertext Markup Language (HTML). We present the first version of HBVPathDB, which is a HBV infection-related molecular interaction network database composed of 306 pathways with 1 050 molecules involved. With carefully drawn graphs, pathway information stored in HBVPathDB can be browsed in an intuitive way. We develop an easy-to-use interface for flexible accesses to the details of database. Convenient software is implemented to query and browse the pathway information of HBVPathDB. Four search page layout options-category search, gene search, description search, unitized search-are supported by the search engine of the database. The database is freely available at http://www.bio-inf.net/HBVPathDB/HBV/. The conventional perspective HBVPathDB have already contained a considerable amount of pathway information with HBV infection related, which is suitable for in-depth analysis of molecular interaction network of virus and host. HBVPathDB integrates pathway data-sets with convenient software for query, browsing, visualization, that provides users more opportunity to identify regulatory key molecules as potential drug targets and to explore the possible mechanism of HBV infection based on gene expression datasets.
Seismic Search Engine: A distributed database for mining large scale seismic data
NASA Astrophysics Data System (ADS)
Liu, Y.; Vaidya, S.; Kuzma, H. A.
2009-12-01
The International Monitoring System (IMS) of the CTBTO collects terabytes worth of seismic measurements from many receiver stations situated around the earth with the goal of detecting underground nuclear testing events and distinguishing them from other benign, but more common events such as earthquakes and mine blasts. The International Data Center (IDC) processes and analyzes these measurements, as they are collected by the IMS, to summarize event detections in daily bulletins. Thereafter, the data measurements are archived into a large format database. Our proposed Seismic Search Engine (SSE) will facilitate a framework for data exploration of the seismic database as well as the development of seismic data mining algorithms. Analogous to GenBank, the annotated genetic sequence database maintained by NIH, through SSE, we intend to provide public access to seismic data and a set of processing and analysis tools, along with community-generated annotations and statistical models to help interpret the data. SSE will implement queries as user-defined functions composed from standard tools and models. Each query is compiled and executed over the database internally before reporting results back to the user. Since queries are expressed with standard tools and models, users can easily reproduce published results within this framework for peer-review and making metric comparisons. As an illustration, an example query is “what are the best receiver stations in East Asia for detecting events in the Middle East?” Evaluating this query involves listing all receiver stations in East Asia, characterizing known seismic events in that region, and constructing a profile for each receiver station to determine how effective its measurements are at predicting each event. The results of this query can be used to help prioritize how data is collected, identify defective instruments, and guide future sensor placements.
ProtaBank: A repository for protein design and engineering data.
Wang, Connie Y; Chang, Paul M; Ary, Marie L; Allen, Benjamin D; Chica, Roberto A; Mayo, Stephen L; Olafson, Barry D
2018-03-25
We present ProtaBank, a repository for storing, querying, analyzing, and sharing protein design and engineering data in an actively maintained and updated database. ProtaBank provides a format to describe and compare all types of protein mutational data, spanning a wide range of properties and techniques. It features a user-friendly web interface and programming layer that streamlines data deposition and allows for batch input and queries. The database schema design incorporates a standard format for reporting protein sequences and experimental data that facilitates comparison of results across different data sets. A suite of analysis and visualization tools are provided to facilitate discovery, to guide future designs, and to benchmark and train new predictive tools and algorithms. ProtaBank will provide a valuable resource to the protein engineering community by storing and safeguarding newly generated data, allowing for fast searching and identification of relevant data from the existing literature, and exploring correlations between disparate data sets. ProtaBank invites researchers to contribute data to the database to make it accessible for search and analysis. ProtaBank is available at https://protabank.org. © 2018 The Authors Protein Science published by Wiley Periodicals, Inc. on behalf of The Protein Society.
Adding a visualization feature to web search engines: it's time.
Wong, Pak Chung
2008-01-01
It's widely recognized that all Web search engines today are almost identical in presentation layout and behavior. In fact, the same presentation approach has been applied to depicting search engine results pages (SERPs) since the first Web search engine launched in 1993. In this Visualization Viewpoints article, I propose to add a visualization feature to Web search engines and suggest that the new addition can improve search engines' performance and capabilities, which in turn lead to better Web search technology.
ERIC Educational Resources Information Center
Auster, Ethel; Lawton, Stephen B.
This research study involved a systematic investigation into the relationships among: (1) the techniques used by search analysts during preliminary interviews with users before engaging in online retrieval of bibliographic citations; (2) the amount of new information gained by the user as a result of the search; and (3) the user's ultimate…
Improving Web Searches: Case Study of Quit-Smoking Web Sites for Teenagers
Skinner, Harvey
2003-01-01
Background The Web has become an important and influential source of health information. With the vast number of Web sites on the Internet, users often resort to popular search sites when searching for information. However, little is known about the characteristics of Web sites returned by simple Web searches for information about smoking cessation for teenagers. Objective To determine the characteristics of Web sites retrieved by search engines about smoking cessation for teenagers and how information quality correlates with the search ranking. Methods The top 30 sites returned by 4 popular search sites in response to the search terms "teen quit smoking" were examined. The information relevance and quality characteristics of these sites were evaluated by 2 raters. Objective site characteristics were obtained using a page-analysis Web site. Results Only 14 of the 30 Web sites are of direct relevance to smoking cessation for teenagers. The readability of about two-thirds of the 14 sites is below an eighth-grade school level and they ranked significantly higher (Kendall rank correlation, tau = -0.39, P= .05) in search-site results than sites with readability above or equal to that grade level. Sites that ranked higher were significantly associated with the presence of e-mail address for contact (tau = -0.46, P= .01), annotated hyperlinks to external sites (tau = -0.39, P= .04), and the presence of meta description tag (tau = -0.48, P= .002). The median link density (number of external sites that have a link to that site) of the Web pages was 6 and the maximum was 735. A higher link density was significantly associated with a higher rank (tau = -0.58, P= .02). Conclusions Using simple search terms on popular search sites to look for information on smoking cessation for teenagers resulted in less than half of the sites being of direct relevance. To improve search efficiency, users could supplement results obtained from simple Web searches with human-maintained Web directories and learn to refine their searches with more advanced search syntax. PMID:14713656
ERIC Educational Resources Information Center
Garman, Nancy
1999-01-01
Describes common options and features to consider in evaluating which meta search engine will best meet a searcher's needs. Discusses number and names of engines searched; other sources and specialty engines; search queries; other search options; and results options. (AEF)
NASA Astrophysics Data System (ADS)
Criado, Javier; Padilla, Nicolás; Iribarne, Luis; Asensio, Jose-Andrés
Due to the globalization of the information and knowledge society on the Internet, modern Web-based Information Systems (WIS) must be flexible and prepared to be easily accessible and manageable in real-time. In recent times it has received a special interest the globalization of information through a common vocabulary (i.e., ontologies), and the standardized way in which information is retrieved on the Web (i.e., powerful search engines, and intelligent software agents). These same principles of globalization and standardization should also be valid for the user interfaces of the WIS, but they are built on traditional development paradigms. In this paper we present an approach to reduce the gap of globalization/standardization in the generation of WIS user interfaces by using a real-time "bottom-up" composition perspective with COTS-interface components (type interface widgets) and trading services.
Computer use changes generalization of movement learning.
Wei, Kunlin; Yan, Xiang; Kong, Gaiqing; Yin, Cong; Zhang, Fan; Wang, Qining; Kording, Konrad Paul
2014-01-06
Over the past few decades, one of the most salient lifestyle changes for us has been the use of computers. For many of us, manual interaction with a computer occupies a large portion of our working time. Through neural plasticity, this extensive movement training should change our representation of movements (e.g., [1-3]), just like search engines affect memory [4]. However, how computer use affects motor learning is largely understudied. Additionally, as virtually all participants in studies of perception and actions are computer users, a legitimate question is whether insights from these studies bear the signature of computer-use experience. We compared non-computer users with age- and education-matched computer users in standard motor learning experiments. We found that people learned equally fast but that non-computer users generalized significantly less across space, a difference negated by two weeks of intensive computer training. Our findings suggest that computer-use experience shaped our basic sensorimotor behaviors, and this influence should be considered whenever computer users are recruited as study participants. Copyright © 2014 Elsevier Ltd. All rights reserved.
Helping Students Choose Tools To Search the Web.
ERIC Educational Resources Information Center
Cohen, Laura B.; Jacobson, Trudi E.
2000-01-01
Describes areas where faculty members can aid students in making intelligent use of the Web in their research. Differentiates between subject directories and search engines. Describes an engine's three components: spider, index, and search engine. Outlines two misconceptions: that Yahoo! is a search engine and that search engines contain all the…
The National Center for Atmospheric Research (NCAR) Research Data Archive: a Data Education Center
NASA Astrophysics Data System (ADS)
Peng, G. S.; Schuster, D.
2015-12-01
The National Center for Atmospheric Research (NCAR) Research Data Archive (RDA), rda.ucar.edu, is not just another data center or data archive. It is a data education center. We not only serve data, we TEACH data. Weather and climate data is the original "Big Data" dataset and lessons learned while playing with weather data are applicable to a wide range of data investigations. Erroneous data assumptions are the Achilles heel of Big Data. It doesn't matter how much data you crunch if the data is not what you think it is. Each dataset archived at the RDA is assigned to a data specialist (DS) who curates the data. If a user has a question not answered in the dataset information web pages, they can call or email a skilled DS for further clarification. The RDA's diverse staff—with academic training in meteorology, oceanography, engineering (electrical, civil, ocean and database), mathematics, physics, chemistry and information science—means we likely have someone who "speaks your language." Data discovery is another difficult Big Data problem; one can only solve problems with data if one can find the right data. Metadata, both machine and human-generated, underpin the RDA data search tools. Users can quickly find datasets by name or dataset ID number. They can also perform a faceted search that successively narrows the options by user requirements or simply kick off an indexed search with a few words. Weather data formats can be difficult to read for non-expert users; it's usually packed in binary formats requiring specialized software and parameter names use specialized vocabularies. DSs create detailed information pages for each dataset and maintain lists of helpful software, documentation and links of information around the web. We further grow the level of sophistication of the users with tips, tutorials and data stories on the RDA Blog, http://ncarrda.blogspot.com/. How-to video tutorials are also posted on the NCAR Computational and Information Systems Laboratory (CISL) YouTube channel.
Automated system function allocation and display format: Task information processing requirements
NASA Technical Reports Server (NTRS)
Czerwinski, Mary P.
1993-01-01
An important consideration when designing the interface to an intelligent system concerns function allocation between the system and the user. The display of information could be held constant, or 'fixed', leaving the user with the task of searching through all of the available information, integrating it, and classifying the data into a known system state. On the other hand, the system, based on its own intelligent diagnosis, could display only relevant information in order to reduce the user's search set. The user would still be left the task of perceiving and integrating the data and classifying it into the appropriate system state. Finally, the system could display the patterns of data. In this scenario, the task of integrating the data is carried out by the system, and the user's information processing load is reduced, leaving only the tasks of perception and classification of the patterns of data. Humans are especially adept at this form of display processing. Although others have examined the relative effectiveness of alphanumeric and graphical display formats, it is interesting to reexamine this issue together with the function allocation problem. Currently, Johnson Space Center is the test site for an intelligent Thermal Control System (TCS), TEXSYS, being tested for use with Space Station Freedom. Expert TCS engineers, as well as novices, were asked to classify several displays of TEXSYS data into various system states (including nominal and anomalous states). Three different display formats were used: fixed, subset, and graphical. The hypothesis tested was that the graphical displays would provide for fewer errors and faster classification times by both experts and novices, regardless of the kind of system state represented within the display. The subset displays were hypothesized to be the second most effective display format/function allocation condition, based on the fact that the search set is reduced in these displays. Both the subset and the graphic display conditions were hypothesized to be processed more efficiently than the fixed display conditions.
Cummins, Niamh Maria; Hannigan, Ailish; Shannon, Bill; Dunne, Colum; Cullen, Walter
2013-01-01
Background The Internet is a widely used source of information for patients searching for medical/health care information. While many studies have assessed existing medical/health care information on the Internet, relatively few have examined methods for design and delivery of such websites, particularly those aimed at the general public. Objective This study describes a method of evaluating material for new medical/health care websites, or for assessing those already in existence, which is correlated with higher rankings on Google's Search Engine Results Pages (SERPs). Methods A website quality assessment (WQA) tool was developed using criteria related to the quality of the information to be contained in the website in addition to an assessment of the readability of the text. This was retrospectively applied to assess existing websites that provide information about generic medicines. The reproducibility of the WQA tool and its predictive validity were assessed in this study. Results The WQA tool demonstrated very high reproducibility (intraclass correlation coefficient=0.95) between 2 independent users. A moderate to strong correlation was found between WQA scores and rankings on Google SERPs. Analogous correlations were seen between rankings and readability of websites as determined by Flesch Reading Ease and Flesch-Kincaid Grade Level scores. Conclusions The use of the WQA tool developed in this study is recommended as part of the design phase of a medical or health care information provision website, along with assessment of readability of the material to be used. This may ensure that the website performs better on Google searches. The tool can also be used retrospectively to make improvements to existing websites, thus, potentially enabling better Google search result positions without incurring the costs associated with Search Engine Optimization (SEO) professionals or paid promotion. PMID:23981848
Grooker, KartOO, Addict-o-Matic and More: Really Different Search Engines
ERIC Educational Resources Information Center
Descy, Don E.
2009-01-01
There are hundreds of unique search engines in the United States and thousands of unique search engines around the world. If people get into search engines designed just to search particular web sites, the number is in the hundreds of thousands. This article looks at: (1) clustering search engines, such as KartOO (www.kartoo.com) and Grokker…
Library Searching: An Industrial User's Viewpoint.
ERIC Educational Resources Information Center
Hendrickson, W. A.
1982-01-01
Discusses library searching of chemical literature from an industrial user's viewpoint, focusing on differences between academic and industrial researcher's searching techniques of the same problem area. Indicates that industry users need more exposure to patents, work with abstracting services and continued improvement in computer searching…
Strategies to explore functional genomics data sets in NCBI's GEO database.
Wilhite, Stephen E; Barrett, Tanya
2012-01-01
The Gene Expression Omnibus (GEO) database is a major repository that stores high-throughput functional genomics data sets that are generated using both microarray-based and sequence-based technologies. Data sets are submitted to GEO primarily by researchers who are publishing their results in journals that require original data to be made freely available for review and analysis. In addition to serving as a public archive for these data, GEO has a suite of tools that allow users to identify, analyze, and visualize data relevant to their specific interests. These tools include sample comparison applications, gene expression profile charts, data set clusters, genome browser tracks, and a powerful search engine that enables users to construct complex queries.
Strategies to Explore Functional Genomics Data Sets in NCBI’s GEO Database
Wilhite, Stephen E.; Barrett, Tanya
2012-01-01
The Gene Expression Omnibus (GEO) database is a major repository that stores high-throughput functional genomics data sets that are generated using both microarray-based and sequence-based technologies. Data sets are submitted to GEO primarily by researchers who are publishing their results in journals that require original data to be made freely available for review and analysis. In addition to serving as a public archive for these data, GEO has a suite of tools that allow users to identify, analyze and visualize data relevant to their specific interests. These tools include sample comparison applications, gene expression profile charts, data set clusters, genome browser tracks, and a powerful search engine that enables users to construct complex queries. PMID:22130872
A bibliography of literature pertaining to plague (Yersinia pestis)
Ellison, Laura E.; Frank, Megan K. Eberhardt
2011-01-01
Plague is an acute and often fatal zoonotic disease caused by the bacterium Yersinia pestis. Y. pestis mainly cycles between small mammals and their fleas; however, it has the potential to infect humans and frequently causes fatalities if left untreated. It is often considered a disease of the past; however, since the late 1800s, plagueis geographic range has expanded greatly, posing new threats in previously unaffected regions of the world, including the Western United States. A literature search was conducted using Internet resources and databases. The keywords chosen for the searches included plague, Yersinia pestis, management, control, wildlife, prairie dogs, fleas, North America, and mammals. Keywords were used alone or in combination with the other terms. Although this search pertains mostly to North America, citations were included from the international research community, as well. Databases and search engines used included Google (http://www.google.com), Google Scholar (http://scholar.google.com), SciVerse Scopus (http://www.scopus.com), ISI Web of Knowledge (http://apps.isiknowledge.com), and the USGS Library's Digital Desktop (http://library.usgs.gov). The literature-cited sections of manuscripts obtained from keyword searches were cross-referenced to identify additional citations or gray literature that was missed by the Internet search engines. This Open-File Report, published as an Internet-accessible bibliography, is intended to be periodically updated with new citations or older references that may have been missed during this compilation. Hence, the authors would be grateful to receive notice of any new or old papers that the audience (users) think need to be included.
Boyer, C; Baujard, V; Scherrer, J R
2001-01-01
Any new user to the Internet will think that to retrieve the relevant document is an easy task especially with the wealth of sources available on this medium, but this is not the case. Even experienced users have difficulty formulating the right query for making the most of a search tool in order to efficiently obtain an accurate result. The goal of this work is to reduce the time and the energy necessary in searching and locating medical and health information. To reach this goal we have developed HONselect [1]. The aim of HONselect is not only to improve efficiency in retrieving documents but to respond to an increased need for obtaining a selection of relevant and accurate documents from a breadth of various knowledge databases including scientific bibliographical references, clinical trials, daily news, multimedia illustrations, conferences, forum, Web sites, clinical cases, and others. The authors based their approach on the knowledge representation using the National Library of Medicine's Medical Subject Headings (NLM, MeSH) vocabulary and classification [2,3]. The innovation is to propose a multilingual "one-stop searching" (one Web interface to databases currently in English, French and German) with full navigational and connectivity capabilities. The user may choose from a given selection of related terms the one that best suit his search, navigate in the term's hierarchical tree, and access directly to a selection of documents from high quality knowledge suppliers such as the MEDLINE database, the NLM's ClinicalTrials.gov server, the NewsPage's daily news, the HON's media gallery, conference listings and MedHunt's Web sites [4, 5, 6, 7, 8, 9]. HONselect, developed by HON, a non-profit organisation [10], is a free online available multilingual tool based on the MeSH thesaurus to index, select, retrieve and display accurate, up to date, high-level and quality documents.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Quock, D. E. R.; Cianciarulo, M. B.; APS Engineering Support Division
2007-01-01
The Integrated Relational Model of Installed Systems (IRMIS) is a relational database tool that has been implemented at the Advanced Photon Source to maintain an updated account of approximately 600 control system software applications, 400,000 process variables, and 30,000 control system hardware components. To effectively display this large amount of control system information to operators and engineers, IRMIS was initially built with nine Web-based viewers: Applications Organizing Index, IOC, PLC, Component Type, Installed Components, Network, Controls Spares, Process Variables, and Cables. However, since each viewer is designed to provide details from only one major category of the control system, themore » necessity for a one-stop global search tool for the entire database became apparent. The user requirements for extremely fast database search time and ease of navigation through search results led to the choice of Asynchronous JavaScript and XML (AJAX) technology in the implementation of the IRMIS global search tool. Unique features of the global search tool include a two-tier level of displayed search results, and a database data integrity validation and reporting mechanism.« less
Burden of neurological diseases in the US revealed by web searches
Baeza-Yates, Ricardo; Sangal, Puneet Mohan
2017-01-01
Background Analyzing the disease-related web searches of Internet users provides insight into the interests of the general population as well as the healthcare industry, which can be used to shape health care policies. Methods We analyzed the searches related to neurological diseases and drugs used in neurology using the most popular search engines in the US, Google and Bing/Yahoo. Results We found that the most frequently searched diseases were common diseases such as dementia or Attention Deficit/Hyperactivity Disorder (ADHD), as well as medium frequency diseases with high social impact such as Parkinson’s disease, MS and ALS. The most frequently searched CNS drugs were generic drugs used for pain, followed by sleep disorders, dementia, ADHD, stroke and Parkinson’s disease. Regarding the interests of the healthcare industry, ADHD, Alzheimer’s disease, MS, ALS, meningitis, and hypersomnia received the higher advertising bids for neurological diseases, while painkillers and drugs for neuropathic pain, drugs for dementia or insomnia, and triptans had the highest advertising bidding prices. Conclusions Web searches reflect the interest of people and the healthcare industry, and are based either on the frequency or social impact of the disease. PMID:28531237
CellAtlasSearch: a scalable search engine for single cells.
Srivastava, Divyanshu; Iyer, Arvind; Kumar, Vibhor; Sengupta, Debarka
2018-05-21
Owing to the advent of high throughput single cell transcriptomics, past few years have seen exponential growth in production of gene expression data. Recently efforts have been made by various research groups to homogenize and store single cell expression from a large number of studies. The true value of this ever increasing data deluge can be unlocked by making it searchable. To this end, we propose CellAtlasSearch, a novel search architecture for high dimensional expression data, which is massively parallel as well as light-weight, thus infinitely scalable. In CellAtlasSearch, we use a Graphical Processing Unit (GPU) friendly version of Locality Sensitive Hashing (LSH) for unmatched speedup in data processing and query. Currently, CellAtlasSearch features over 300 000 reference expression profiles including both bulk and single-cell data. It enables the user query individual single cell transcriptomes and finds matching samples from the database along with necessary meta information. CellAtlasSearch aims to assist researchers and clinicians in characterizing unannotated single cells. It also facilitates noise free, low dimensional representation of single-cell expression profiles by projecting them on a wide variety of reference samples. The web-server is accessible at: http://www.cellatlassearch.com.
[Advanced online search techniques and dedicated search engines for physicians].
Nahum, Yoav
2008-02-01
In recent years search engines have become an essential tool in the work of physicians. This article will review advanced search techniques from the world of information specialists, as well as some advanced search engine operators that may help physicians improve their online search capabilities, and maximize the yield of their searches. This article also reviews popular dedicated scientific and biomedical literature search engines.
Using Linked Open Data and Semantic Integration to Search Across Geoscience Repositories
NASA Astrophysics Data System (ADS)
Mickle, A.; Raymond, L. M.; Shepherd, A.; Arko, R. A.; Carbotte, S. M.; Chandler, C. L.; Cheatham, M.; Fils, D.; Hitzler, P.; Janowicz, K.; Jones, M.; Krisnadhi, A.; Lehnert, K. A.; Narock, T.; Schildhauer, M.; Wiebe, P. H.
2014-12-01
The MBLWHOI Library is a partner in the OceanLink project, an NSF EarthCube Building Block, applying semantic technologies to enable knowledge discovery, sharing and integration. OceanLink is testing ontology design patterns that link together: two data repositories, Rolling Deck to Repository (R2R), Biological and Chemical Oceanography Data Management Office (BCO-DMO); the MBLWHOI Library Institutional Repository (IR) Woods Hole Open Access Server (WHOAS); National Science Foundation (NSF) funded awards; and American Geophysical Union (AGU) conference presentations. The Library is collaborating with scientific users, data managers, DSpace engineers, experts in ontology design patterns, and user interface developers to make WHOAS, a DSpace repository, linked open data enabled. The goal is to allow searching across repositories without any of the information providers having to change how they manage their collections. The tools developed for DSpace will be made available to the community of users. There are 257 registered DSpace repositories in the United Stated and over 1700 worldwide. Outcomes include: Integration of DSpace with OpenRDF Sesame triple store to provide SPARQL endpoint for the storage and query of RDF representation of DSpace resources, Mapping of DSpace resources to OceanLink ontology, and DSpace "data" add on to provide resolvable linked open data representation of DSpace resources.
Intelligent retrieval of medical images from the Internet
NASA Astrophysics Data System (ADS)
Tang, Yau-Kuo; Chiang, Ted T.
1996-05-01
The object of this study is using Internet resources to provide a cost-effective, user-friendly method to access the medical image archive system and to provide an easy method for the user to identify the images required. This paper describes the prototype system architecture, the implementation, and results. In the study, we prototype the Intelligent Medical Image Retrieval (IMIR) system as a Hypertext Transport Prototype server and provide Hypertext Markup Language forms for user, as an Internet client, using browser to enter image retrieval criteria for review. We are developing the intelligent retrieval engine, with the capability to map the free text search criteria to the standard terminology used for medical image identification. We evaluate retrieved records based on the number of the free text entries matched and their relevance level to the standard terminology. We are in the integration and testing phase. We have collected only a few different types of images for testing and have trained a few phrases to map the free text to the standard medical terminology. Nevertheless, we are able to demonstrate the IMIR's ability to search, retrieve, and review medical images from the archives using general Internet browser. The prototype also uncovered potential problems in performance, security, and accuracy. Additional studies and enhancements will make the system clinically operational.
Tang, Muh-Chyun; Liu, Ying-Hsang; Wu, Wan-Ching
2013-09-01
Previous research has shown that information seekers in biomedical domain need more support in formulating their queries. A user study was conducted to evaluate the effectiveness of a metadata based query suggestion interface for PubMed bibliographic search. The study also investigated the impact of search task familiarity on search behaviors and the effectiveness of the interface. A real user, user search request and real system approach was used for the study. Unlike tradition IR evaluation, where assigned tasks were used, the participants were asked to search requests of their own. Forty-four researchers in Health Sciences participated in the evaluation - each conducted two research requests of their own, alternately with the proposed interface and the PubMed baseline. Several performance criteria were measured to assess the potential benefits of the experimental interface, including users' assessment of their original and eventual queries, the perceived usefulness of the interfaces, satisfaction with the search results, and the average relevance score of the saved records. The results show that, when searching for an unfamiliar topic, users were more likely to change their queries, indicating the effect of familiarity on search behaviors. The results also show that the interface scored higher on several of the performance criteria, such as the "goodness" of the queries, perceived usefulness, and user satisfaction. Furthermore, in line with our hypothesis, the proposed interface was relatively more effective when less familiar search requests were attempted. Results indicate that there is a selective compatibility between search familiarity and search interface. One implication of the research for system evaluation is the importance of taking into consideration task familiarity when assessing the effectiveness of interactive IR systems. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Design and Implementation of Distributed Crawler System Based on Scrapy
NASA Astrophysics Data System (ADS)
Fan, Yuhao
2018-01-01
At present, some large-scale search engines at home and abroad only provide users with non-custom search services, and a single-machine web crawler cannot sovle the difficult task. In this paper, Through the study and research of the original Scrapy framework, the original Scrapy framework is improved by combining Scrapy and Redis, a distributed crawler system based on Web information Scrapy framework is designed and implemented, and Bloom Filter algorithm is applied to dupefilter modul to reduce memory consumption. The movie information captured from douban is stored in MongoDB, so that the data can be processed and analyzed. The results show that distributed crawler system based on Scrapy framework is more efficient and stable than the single-machine web crawler system.
Skirton, Heather; Goldsmith, Lesley; Jackson, Leigh; Lewis, Celine; Chitty, Lyn S
2015-12-01
The development of non-invasive prenatal testing has increased accessibility of fetal testing. Companies are now advertising prenatal testing for aneuploidy via the Internet. The aim of this systematic review of websites advertising non-invasive prenatal testing for aneuploidy was to explore the nature of the information being provided to potential users. We systematically searched two Internet search engines for relevant websites using the following terms: 'prenatal test', 'antenatal test', 'non-invasive test', 'noninvasive test', 'cell-free fetal DNA', 'cffDNA', 'Down syndrome test' or 'trisomy test'. We examined the first 200 websites identified through each search. Relevant web-based text was examined, and key topics were identified, tabulated and counted. To analyse the text further, we used thematic analysis. Forty websites were identified. Whilst a number of sites provided balanced, accurate information, in the majority supporting evidence was not provided to underpin the information and there was inadequate information on the need for an invasive test to definitely diagnose aneuploidy. The information provided on many websites does not comply with professional recommendations. Guidelines are needed to ensure that companies offering prenatal testing via the Internet provide accurate and comprehensible information. © 2015 John Wiley & Sons, Ltd.
Building the interspace: Digital library infrastructure for a University Engineering Community
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schatz, B.
A large-scale digital library is being constructed and evaluated at the University of Illinois, with the goal of bringing professional search and display to Internet information services. A testbed planned to grow to 10K documents and 100K users is being constructed in the Grainger Engineering Library Information Center, as a joint effort of the University Library and the National Center for Supercomputing Applications (NCSA), with evaluation and research by the Graduate School of Library and Information Science and the Department of Computer Science. The electronic collection will be articles from engineering and science journals and magazines, obtained directly from publishersmore » in SGML format and displayed containing all text, figures, tables, and equations. The publisher partners include IEEE Computer Society, AIAA (Aerospace Engineering), American Physical Society, and Wiley & Sons. The software will be based upon NCSA Mosaic as a network engine connected to commercial SGML displayers and full-text searchers. The users will include faculty/students across the midwestern universities in the Big Ten, with evaluations via interviews, surveys, and transaction logs. Concurrently, research into scaling the testbed is being conducted. This includes efforts in computer science, information science, library science, and information systems. These efforts will evaluate different semantic retrieval technologies, including automatic thesaurus and subject classification graphs. New architectures will be designed and implemented for a next generation digital library infrastructure, the Interspace, which supports interaction with information spread across information spaces within the Net.« less
Jones, Andrew R; Siepen, Jennifer A; Hubbard, Simon J; Paton, Norman W
2009-03-01
LC-MS experiments can generate large quantities of data, for which a variety of database search engines are available to make peptide and protein identifications. Decoy databases are becoming widely used to place statistical confidence in result sets, allowing the false discovery rate (FDR) to be estimated. Different search engines produce different identification sets so employing more than one search engine could result in an increased number of peptides (and proteins) being identified, if an appropriate mechanism for combining data can be defined. We have developed a search engine independent score, based on FDR, which allows peptide identifications from different search engines to be combined, called the FDR Score. The results demonstrate that the observed FDR is significantly different when analysing the set of identifications made by all three search engines, by each pair of search engines or by a single search engine. Our algorithm assigns identifications to groups according to the set of search engines that have made the identification, and re-assigns the score (combined FDR Score). The combined FDR Score can differentiate between correct and incorrect peptide identifications with high accuracy, allowing on average 35% more peptide identifications to be made at a fixed FDR than using a single search engine.
The Theory of Planned Behaviour Applied to Search Engines as a Learning Tool
ERIC Educational Resources Information Center
Liaw, Shu-Sheng
2004-01-01
Search engines have been developed for helping learners to seek online information. Based on theory of planned behaviour approach, this research intends to investigate the behaviour of using search engines as a learning tool. After factor analysis, the results suggest that perceived satisfaction of search engine, search engines as an information…
Experiences in Training End-User Searchers.
ERIC Educational Resources Information Center
Haines, Judith S.
1982-01-01
Describes study of chemists in the Chemistry Division, Organic Research Laboratory, Eastman Kodak Company, as end-user searchers on the DIALOG system searching primarily the "Chemical Abstracts" database. Training, level of use, online browsing, types of searches, satisfaction, costs, and value of end-user searching are highlighted.…
Digital Archive Issues from the Perspective of an Earth Science Data Producer
NASA Technical Reports Server (NTRS)
Barkstrom, Bruce R.
2004-01-01
Contents include the following: Introduction. A Producer Perspective on Earth Science Data. Data Producers as Members of a Scientific Community. Some Unique Characteristics of Scientific Data. Spatial and Temporal Sampling for Earth (or Space) Science Data. The Influence of the Data Production System Architecture. The Spatial and Temporal Structures Underlying Earth Science Data. Earth Science Data File (or Relation) Schemas. Data Producer Configuration Management Complexities. The Topology of Earth Science Data Inventories. Some Thoughts on the User Perspective. Science Data User Communities. Spatial and Temporal Structure Needs of Different Users. User Spatial Objects. Data Search Services. Inventory Search. Parameter (Keyword) Search. Metadata Searches. Documentation Search. Secondary Index Search. Print Technology and Hypertext. Inter-Data Collection Configuration Management Issues. An Archive View. Producer Data Ingest and Production. User Data Searching and Distribution. Subsetting and Supersetting. Semantic Requirements for Data Interchange. Tentative Conclusions. An Object Oriented View of Archive Information Evolution. Scientific Data Archival Issues. A Perspective on the Future of Digital Archives for Scientific Data. References Index for this paper.
The Effect of Limited Health Literacy on How Internet Users Learn About Diabetes.
Yom-Tov, Elad; Marino, Barbara; Pai, Jennifer; Harris, Dawn; Wolf, Michael
2016-10-01
The Internet continues to be an important supplemental health information resource for an increasing number of U.S. adults, especially for those with a new or existing chronic condition. Here we examine how people use the Internet to learn about Type 2 diabetes and how health literacy (HL) influences this information-seeking behavior. We analyzed the searches of approximately 2 million people who queried for diabetes-related information on Microsoft's Bing search engine. The HL of searchers was imputed through a community-based HL score. Topics searched were categorized and subsequent websites were assessed for readability. Overall, diabetes information-seeking strategies via the Internet are similar among adults with limited and adequate HL skills. However, people with limited HL take a longer time to read pages that are quickly read by people with adequate HL and vice versa. Information seeking among the former is terminated prematurely, as is evident from a Hidden Markov Model of the search process. Our findings indicate that the reading level required to understand the majority of diabetes-related information is high. Especially on government websites, more than 80% of information requires a reading level corresponding to 7th grade or higher. Our results indicate that individuals with lower HL may disproportionately struggle with Internet searches and fail to get an equivalent benefit from this information resource compared to users with greater HL. Future interventions should target the quality and ease of navigation of health care websites and find ways to leverage other relevant professionals to encourage and promote successful information access on the Web.
Spiders and Worms and Crawlers, Oh My: Searching on the World Wide Web.
ERIC Educational Resources Information Center
Eagan, Ann; Bender, Laura
Searching on the world wide web can be confusing. A myriad of search engines exist, often with little or no documentation, and many of these search engines work differently from the standard search engines people are accustomed to using. Intended for librarians, this paper defines search engines, directories, spiders, and robots, and covers basics…
Dynamics of a macroscopic model characterizing mutualism of search engines and web sites
NASA Astrophysics Data System (ADS)
Wang, Yuanshi; Wu, Hong
2006-05-01
We present a model to describe the mutualism relationship between search engines and web sites. In the model, search engines and web sites benefit from each other while the search engines are derived products of the web sites and cannot survive independently. Our goal is to show strategies for the search engines to survive in the internet market. From mathematical analysis of the model, we show that mutualism does not always result in survival. We show various conditions under which the search engines would tend to extinction, persist or grow explosively. Then by the conditions, we deduce a series of strategies for the search engines to survive in the internet market. We present conditions under which the initial number of consumers of the search engines has little contribution to their persistence, which is in agreement with the results in previous works. Furthermore, we show novel conditions under which the initial value plays an important role in the persistence of the search engines and deduce new strategies. We also give suggestions for the web sites to cooperate with the search engines in order to form a win-win situation.
Human Systems Integration Competency Development for Navy Systems Commands
2012-09-01
cognizance of Applied Engineering /Psychology relative to knowledge engineering, training, teamwork, user interface design and decision sciences. KSA...cognizance of Applied Engineering /Psychology relative to knowledge engineering, training, teamwork, user interface design and decision sciences...requirements (as required). Fundamental cognizance of Applied Engineering / Psychology relative to knowledge engineering, training, team work, user
[Digital media as laypeople's source of information about the environment and health].
Sassenberg, Kai
2017-06-01
Over the last two decades, the Internet has become the primary source of information. Thanks to the Internet, laypeople have access to information from the health and the environmental sector, which was for a long time available only to experts (e. g. scientific publications, statistics). Information on the Internet varies in quality, as generally anybody can publish online, without any quality control. At the same time, Internet use comes with specific situational characteristics. Given that the amount of information is nearly unlimited and that this information is easily available via search engines, users are not restricted to one or just a few texts, but can choose between multiple sources depending on their motivation and interest. Together with the heterogeneity of the sources, this provides the basis for a strong impact of motivation on the process and the outcomes of information acquisition online. Based on empirical research in the domain of Internet searching in the health sector, the current article discusses the impact of the use of digital media in the context of environmental medicine. Research has led to four conclusions: (1) Users are not sufficiently sensitive to the quality of information. (2) Information supporting their own opinion is preferably processed. (3) Users who feel threatened focus on positive information. (4) Vigilant users focus on negative information, which might result in cyberchrondria. The implications of these effects for the use of digital media in the sector of environmental medicine are discussed.
Online health information on obesity in pregnancy: a systematic review.
Al Wattar, Bassel H; Pidgeon, Connie; Learner, Hazel; Zamora, Javier; Thangaratinam, Shakila
2016-11-01
To assess the quality of health information available online for healthcare users on obesity in pregnancy and evaluate the role of the internet as an effective medium to advocate a healthy lifestyle in pregnancy. We used the poly-search engine Polymeta and complimented the results with Google searches (till July 2015) to identify relevant websites. All open access websites in English providing advice on the risks and management of obesity in pregnancy. Two independent reviewers assessed the quality of information provided in each of the included websites for credibility, accuracy, readability, content quality and technology. We compared websites 'quality according to their target population, health topic and source of funding'. Fifty-three websites were included. A third of websites were focused on obesity in pregnancy and two thirds targeted healthcare users. The median value for the overall credibility was 5/9, 7/12 for accuracy, 57.6/100 for readability, 45/80 for content quality and 75/100 for technology. Obesity specific websites provided lower credibility compared to general health websites (p=0.008). Websites targeting health users were easier to read (p=0.001). Non-governmental funded websites demonstrated higher content quality (p=0.005). Websites that are obesity focused, targeting health users and funded by non-governmental bodies demonstrated higher composite quality scores (p=0.048). Online information on obesity in pregnancy is varied. Governmental bodies in particular need to invest more efforts to improve the quality of online health information. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Utilization of a radiology-centric search engine.
Sharpe, Richard E; Sharpe, Megan; Siegel, Eliot; Siddiqui, Khan
2010-04-01
Internet-based search engines have become a significant component of medical practice. Physicians increasingly rely on information available from search engines as a means to improve patient care, provide better education, and enhance research. Specialized search engines have emerged to more efficiently meet the needs of physicians. Details about the ways in which radiologists utilize search engines have not been documented. The authors categorized every 25th search query in a radiology-centric vertical search engine by radiologic subspecialty, imaging modality, geographic location of access, time of day, use of abbreviations, misspellings, and search language. Musculoskeletal and neurologic imagings were the most frequently searched subspecialties. The least frequently searched were breast imaging, pediatric imaging, and nuclear medicine. Magnetic resonance imaging and computed tomography were the most frequently searched modalities. A majority of searches were initiated in North America, but all continents were represented. Searches occurred 24 h/day in converted local times, with a majority occurring during the normal business day. Misspellings and abbreviations were common. Almost all searches were performed in English. Search engine utilization trends are likely to mirror trends in diagnostic imaging in the region from which searches originate. Internet searching appears to function as a real-time clinical decision-making tool, a research tool, and an educational resource. A more thorough understanding of search utilization patterns can be obtained by analyzing phrases as actually entered as well as the geographic location and time of origination. This knowledge may contribute to the development of more efficient and personalized search engines.
DataMed - an open source discovery index for finding biomedical datasets.
Chen, Xiaoling; Gururaj, Anupama E; Ozyurt, Burak; Liu, Ruiling; Soysal, Ergin; Cohen, Trevor; Tiryaki, Firat; Li, Yueling; Zong, Nansu; Jiang, Min; Rogith, Deevakar; Salimi, Mandana; Kim, Hyeon-Eui; Rocca-Serra, Philippe; Gonzalez-Beltran, Alejandra; Farcas, Claudiu; Johnson, Todd; Margolis, Ron; Alter, George; Sansone, Susanna-Assunta; Fore, Ian M; Ohno-Machado, Lucila; Grethe, Jeffrey S; Xu, Hua
2018-01-13
Finding relevant datasets is important for promoting data reuse in the biomedical domain, but it is challenging given the volume and complexity of biomedical data. Here we describe the development of an open source biomedical data discovery system called DataMed, with the goal of promoting the building of additional data indexes in the biomedical domain. DataMed, which can efficiently index and search diverse types of biomedical datasets across repositories, is developed through the National Institutes of Health-funded biomedical and healthCAre Data Discovery Index Ecosystem (bioCADDIE) consortium. It consists of 2 main components: (1) a data ingestion pipeline that collects and transforms original metadata information to a unified metadata model, called DatA Tag Suite (DATS), and (2) a search engine that finds relevant datasets based on user-entered queries. In addition to describing its architecture and techniques, we evaluated individual components within DataMed, including the accuracy of the ingestion pipeline, the prevalence of the DATS model across repositories, and the overall performance of the dataset retrieval engine. Our manual review shows that the ingestion pipeline could achieve an accuracy of 90% and core elements of DATS had varied frequency across repositories. On a manually curated benchmark dataset, the DataMed search engine achieved an inferred average precision of 0.2033 and a precision at 10 (P@10, the number of relevant results in the top 10 search results) of 0.6022, by implementing advanced natural language processing and terminology services. Currently, we have made the DataMed system publically available as an open source package for the biomedical community. © The Author 2018. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com
An Exploratory Survey of Student Perspectives Regarding Search Engines
ERIC Educational Resources Information Center
Alshare, Khaled; Miller, Don; Wenger, James
2005-01-01
This study explored college students' perceptions regarding their use of search engines. The main objective was to determine how frequently students used various search engines, whether advanced search features were used, and how many search engines were used. Various factors that might influence student responses were examined. Results showed…
Search strategies of Wikipedia readers
2017-01-01
The quest for information is one of the most common activity of human beings. Despite the the impressive progress of search engines, not to miss the needed piece of information could be still very tough, as well as to acquire specific competences and knowledge by shaping and following the proper learning paths. Indeed, the need to find sensible paths in information networks is one of the biggest challenges of our societies and, to effectively address it, it is important to investigate the strategies adopted by human users to cope with the cognitive bottleneck of finding their way in a growing sea of information. Here we focus on the case of Wikipedia and investigate a recently released dataset about users’ click on the English Wikipedia, namely the English Wikipedia Clickstream. We perform a semantically charged analysis to uncover the general patterns followed by information seekers in the multi-dimensional space of Wikipedia topics/categories. We discover the existence of well defined strategies in which users tend to start from very general, i.e., semantically broad, pages and progressively narrow down the scope of their navigation, while keeping a growing semantic coherence. This is unlike strategies associated to tasks with predefined search goals, namely the case of the Wikispeedia game. In this case users first move from the ‘particular’ to the ‘universal’ before focusing down again to the required target. The clear picture offered here represents a very important stepping stone towards a better design of information networks and recommendation strategies, as well as the construction of radically new learning paths. PMID:28152030
Social Search: A Taxonomy of, and a User-Centred Approach to, Social Web Search
ERIC Educational Resources Information Center
McDonnell, Michael; Shiri, Ali
2011-01-01
Purpose: The purpose of this paper is to introduce the notion of social search as a new concept, drawing upon the patterns of web search behaviour. It aims to: define social search; present a taxonomy of social search; and propose a user-centred social search method. Design/methodology/approach: A mixed method approach was adopted to investigate…
NASA Indexing Benchmarks: Evaluating Text Search Engines
NASA Technical Reports Server (NTRS)
Esler, Sandra L.; Nelson, Michael L.
1997-01-01
The current proliferation of on-line information resources underscores the requirement for the ability to index collections of information and search and retrieve them in a convenient manner. This study develops criteria for analytically comparing the index and search engines and presents results for a number of freely available search engines. A product of this research is a toolkit capable of automatically indexing, searching, and extracting performance statistics from each of the focused search engines. This toolkit is highly configurable and has the ability to run these benchmark tests against other engines as well. Results demonstrate that the tested search engines can be grouped into two levels. Level one engines are efficient on small to medium sized data collections, but show weaknesses when used for collections 100MB or larger. Level two search engines are recommended for data collections up to and beyond 100MB.
ERIC Educational Resources Information Center
Rushton, Erin E.; Kelehan, Martha Daisy; Strong, Marcy A.
2008-01-01
Search engine use is one of the most popular online activities. According to a recent OCLC report, nearly all students start their electronic research using a search engine instead of the library Web site. Instead of viewing search engines as competition, however, librarians at Binghamton University Libraries decided to employ search engine…
Food supplement-related risks in the light of internet and RASFF data.
Wróbel-Harmas, Monika; Krysińska, Magdalena; Postupolski, Jacek; Wysocki, Mirosław J
2014-01-01
Based on legal acts and RASFF information, this paper aimed at evaluating available facts on food supplements in comparison to the most popular data accessible via Internet for future and present consumers. Having analyzed legal acts and RASFF (Rapid Alert System for Food and Feed) database, the authors attempted to verify what kind of information on food supplements may be found by an Internet user, using the first webpage of Google.pl. This search engine was used in this study as it gained the highest popularity among Internet users. It was decided that exclusively search results displayed on the first webpage would be subject to analysis as 91.5% of Internet users limit their search to the first 9-10 results. Internet was searched using the following two terms: 'supplement' and 'supplements' as well as terms suggested by Google. pl. Subsequently, the results were subject to qualitative and quantitative analyses. On the Internet, the most frequently searched terms were: 'supplements' (243 000 000), 'supplement' (9 290 000), 'supplements shop' (8 200 000). Having analyzed the content of particular websites, information on certain products, given by their manufacturers may be found. Then, data on supplement itself were provided, i.e. what is a supplement and when it should be used. Expert articles (written by physicians, dieticians, pharmacists) on a risk resulting from these products, including therapeutic indications or the presence of unauthorized substances were identified considerably less frequently. No warnings regarding the necessity of purchasing the products in legal and verified places were found. There is a necessity of systemic education of consumers on reasonable use of food supplements. It is also advisable to consider the organization of alert system whose objective would be to monitor adverse reactions caused by an intake of food supplements or novel food launched into the country. To obtain reliable information on the composition and effects of food supplements, potential consumer should contact physician or dietician. Additionally, complementary information, using different sources with an example being health-related portals, presenting articles written by physicians or pharmacists, may be also searched.
Teen smoking cessation help via the Internet: a survey of search engines.
Edwards, Christine C; Elliott, Sean P; Conway, Terry L; Woodruff, Susan I
2003-07-01
The objective of this study was to assess Web sites related to teen smoking cessation on the Internet. Seven Internet search engines were searched using the keywords teen quit smoking. The top 20 hits from each search engine were reviewed and categorized. The keywords teen quit smoking produced between 35 and 400,000 hits depending on the search engine. Of 140 potential hits, 62% were active, unique sites; 85% were listed by only one search engine; and 40% focused on cessation. Findings suggest that legitimate on-line smoking cessation help for teens is constrained by search engine choice and the amount of time teens spend looking through potential sites. Resource listings should be updated regularly. Smoking cessation Web sites need to be picked up on multiple search engine searches. Further evaluation of smoking cessation Web sites need to be conducted to identify the most effective help for teens.
[Development of domain specific search engines].
Takai, T; Tokunaga, M; Maeda, K; Kaminuma, T
2000-01-01
As cyber space exploding in a pace that nobody has ever imagined, it becomes very important to search cyber space efficiently and effectively. One solution to this problem is search engines. Already a lot of commercial search engines have been put on the market. However these search engines respond with such cumbersome results that domain specific experts can not tolerate. Using a dedicate hardware and a commercial software called OpenText, we have tried to develop several domain specific search engines. These engines are for our institute's Web contents, drugs, chemical safety, endocrine disruptors, and emergent response for chemical hazard. These engines have been on our Web site for testing.
A unified architecture for biomedical search engines based on semantic web technologies.
Jalali, Vahid; Matash Borujerdi, Mohammad Reza
2011-04-01
There is a huge growth in the volume of published biomedical research in recent years. Many medical search engines are designed and developed to address the over growing information needs of biomedical experts and curators. Significant progress has been made in utilizing the knowledge embedded in medical ontologies and controlled vocabularies to assist these engines. However, the lack of common architecture for utilized ontologies and overall retrieval process, hampers evaluating different search engines and interoperability between them under unified conditions. In this paper, a unified architecture for medical search engines is introduced. Proposed model contains standard schemas declared in semantic web languages for ontologies and documents used by search engines. Unified models for annotation and retrieval processes are other parts of introduced architecture. A sample search engine is also designed and implemented based on the proposed architecture in this paper. The search engine is evaluated using two test collections and results are reported in terms of precision vs. recall and mean average precision for different approaches used by this search engine.
Textpresso: An Ontology-Based Information Retrieval and Extraction System for Biological Literature
Müller, Hans-Michael; Kenny, Eimear E
2004-01-01
We have developed Textpresso, a new text-mining system for scientific literature whose capabilities go far beyond those of a simple keyword search engine. Textpresso's two major elements are a collection of the full text of scientific articles split into individual sentences, and the implementation of categories of terms for which a database of articles and individual sentences can be searched. The categories are classes of biological concepts (e.g., gene, allele, cell or cell group, phenotype, etc.) and classes that relate two objects (e.g., association, regulation, etc.) or describe one (e.g., biological process, etc.). Together they form a catalog of types of objects and concepts called an ontology. After this ontology is populated with terms, the whole corpus of articles and abstracts is marked up to identify terms of these categories. The current ontology comprises 33 categories of terms. A search engine enables the user to search for one or a combination of these tags and/or keywords within a sentence or document, and as the ontology allows word meaning to be queried, it is possible to formulate semantic queries. Full text access increases recall of biological data types from 45% to 95%. Extraction of particular biological facts, such as gene-gene interactions, can be accelerated significantly by ontologies, with Textpresso automatically performing nearly as well as expert curators to identify sentences; in searches for two uniquely named genes and an interaction term, the ontology confers a 3-fold increase of search efficiency. Textpresso currently focuses on Caenorhabditis elegans literature, with 3,800 full text articles and 16,000 abstracts. The lexicon of the ontology contains 14,500 entries, each of which includes all versions of a specific word or phrase, and it includes all categories of the Gene Ontology database. Textpresso is a useful curation tool, as well as search engine for researchers, and can readily be extended to other organism-specific corpora of text. Textpresso can be accessed at http://www.textpresso.org or via WormBase at http://www.wormbase.org. PMID:15383839
Jones, Andrew R.; Siepen, Jennifer A.; Hubbard, Simon J.; Paton, Norman W.
2010-01-01
Tandem mass spectrometry, run in combination with liquid chromatography (LC-MS/MS), can generate large numbers of peptide and protein identifications, for which a variety of database search engines are available. Distinguishing correct identifications from false positives is far from trivial because all data sets are noisy, and tend to be too large for manual inspection, therefore probabilistic methods must be employed to balance the trade-off between sensitivity and specificity. Decoy databases are becoming widely used to place statistical confidence in results sets, allowing the false discovery rate (FDR) to be estimated. It has previously been demonstrated that different MS search engines produce different peptide identification sets, and as such, employing more than one search engine could result in an increased number of peptides being identified. However, such efforts are hindered by the lack of a single scoring framework employed by all search engines. We have developed a search engine independent scoring framework based on FDR which allows peptide identifications from different search engines to be combined, called the FDRScore. We observe that peptide identifications made by three search engines are infrequently false positives, and identifications made by only a single search engine, even with a strong score from the source search engine, are significantly more likely to be false positives. We have developed a second score based on the FDR within peptide identifications grouped according to the set of search engines that have made the identification, called the combined FDRScore. We demonstrate by searching large publicly available data sets that the combined FDRScore can differentiate between between correct and incorrect peptide identifications with high accuracy, allowing on average 35% more peptide identifications to be made at a fixed FDR than using a single search engine. PMID:19253293
Exploration of Web Users' Search Interests through Automatic Subject Categorization of Query Terms.
ERIC Educational Resources Information Center
Pu, Hsiao-tieh; Yang, Chyan; Chuang, Shui-Lung
2001-01-01
Proposes a mechanism that carefully integrates human and machine efforts to explore Web users' search interests. The approach consists of a four-step process: extraction of core terms; construction of subject taxonomy; automatic subject categorization of query terms; and observation of users' search interests. Research findings are proved valuable…
On search guide phrase compilation for recommending home medical products.
Luo, Gang
2010-01-01
To help people find desired home medical products (HMPs), we developed an intelligent personal health record (iPHR) system that can automatically recommend HMPs based on users' health issues. Using nursing knowledge, we pre-compile a set of "search guide" phrases that provides semantic translation from words describing health issues to their underlying medical meanings. Then iPHR automatically generates queries from those phrases and uses them and a search engine to retrieve HMPs. To avoid missing relevant HMPs during retrieval, the compiled search guide phrases need to be comprehensive. Such compilation is a challenging task because nursing knowledge updates frequently and contains numerous details scattered in many sources. This paper presents a semi-automatic tool facilitating such compilation. Our idea is to formulate the phrase compilation task as a multi-label classification problem. For each newly obtained search guide phrase, we first use nursing knowledge and information retrieval techniques to identify a small set of potentially relevant classes with corresponding hints. Then a nurse makes the final decision on assigning this phrase to proper classes based on those hints. We demonstrate the effectiveness of our techniques by compiling search guide phrases from an occupational therapy textbook.
Measuring the Interestingness of Articles in a Limited User Environment
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pon, R; Cardenas, A; Buttler, David
Search engines, such as Google, assign scores to news articles based on their relevance to a query. However, not all relevant articles for the query may be interesting to a user. For example, if the article is old or yields little new information, the article would be uninteresting. Relevance scores do not take into account what makes an article interesting, which would vary from user to user. Although methods such as collaborative filtering have been shown to be effective in recommendation systems, in a limited user environment, there are not enough users that would make collaborative filtering effective. A generalmore » framework, called iScore, is presented for defining and measuring the ‘‘interestingness of articles, incorporating user-feedback. iScore addresses the various aspects of what makes an article interesting, such as topic relevance, uniqueness, freshness, source reputation, and writing style. It employs various methods, such as multiple topic tracking, online parameter selection, language models, clustering, sentiment analysis, and phrase extraction to measure these features. Due to varying reasons that users hold about why an article is interesting, an online feature selection method in naι¨ve Bayes is also used to improve recommendation results. iScore can outperform traditional IR techniques by as much as 50.7%. iScore and its components are evaluated in the news recommendation task using three datasets from Yahoo! News, actual users, and Digg.« less
Improving visual search in instruction manuals using pictograms.
Kovačević, Dorotea; Brozović, Maja; Možina, Klementina
2016-11-01
Instruction manuals provide important messages about the proper use of a product. They should communicate in such a way that they facilitate users' searches for specific information. Despite the increasing research interest in visual search, there is a lack of empirical knowledge concerning the role of pictograms in search performance during the browsing of a manual's pages. This study investigates how the inclusion of pictograms improves the search for the target information. Furthermore, it examines whether this search process is influenced by the visual similarity between the pictograms and the searched for information. On the basis of eye-tracking measurements, as objective indicators of the participants' visual attention, it was found that pictograms can be a useful element of search strategy. Another interesting finding was that boldface highlighting is a more effective method for improving user experience in information seeking, rather than the similarity between the pictorial and adjacent textual information. Implications for designing effective user manuals are discussed. Practitioner Summary: Users often view instruction manuals with the aim of finding specific information. We used eye-tracking technology to examine different manual pages in order to improve the user's visual search for target information. The results indicate that the use of pictograms and bold highlighting of relevant information facilitate the search process.
ERIC Educational Resources Information Center
El Guemmat, Kamal; Ouahabi, Sara
2018-01-01
The objective of this article is to analyze the searching and indexing techniques of educational search engines' implementation while treating future challenges. Educational search engines could greatly help in the effectiveness of e-learning if used correctly. However, these engines have several gaps which influence the performance of e-learning…
Drexel at TREC 2014 Federated Web Search Track
2014-11-01
of its input RS results. 1. INTRODUCTION Federated Web Search is the task of searching multiple search engines simultaneously and combining their...or distributed properly[5]. The goal of RS is then, for a given query, to select only the most promising search engines from all those available. Most...result pages of 149 search engines . 4000 queries are used in building the sample set. As a part of the Vertical Selection task, search engines are
A Search Relevance Algorithm for Weather Effects Products
2006-12-29
accessed) are often search engines [4] [5]. This suggests that people are navigating the internet by searching and not through the traditional...geographic location. Unlike traditional search engines a Federated Search Engine does not scour all the data available and return matches. Instead...gold standard in search engines . However, its ranking system is based, largely, on a measure of interconnectedness. A page that is referenced more
Search Pathways: Modeling GeoData Search Behavior to Support Usable Application Development
NASA Astrophysics Data System (ADS)
Yarmey, L.; Rosati, A.; Tressel, S.
2014-12-01
Recent technical advances have enabled development of new scientific data discovery systems. Metadata brokering, linked data, and other mechanisms allow users to discover scientific data of interes across growing volumes of heterogeneous content. Matching this complex content with existing discovery technologies, people looking for scientific data are presented with an ever-growing array of features to sort, filter, subset, and scan through search returns to help them find what they are looking for. This paper examines the applicability of available technologies in connecting searchers with the data of interest. What metrics can be used to track success given shifting baselines of content and technology? How well do existing technologies map to steps in user search patterns? Taking a user-driven development approach, the team behind the Arctic Data Explorer interdisciplinary data discovery application invested heavily in usability testing and user search behavior analysis. Building on earlier library community search behavior work, models were developed to better define the diverse set of thought processes and steps users took to find data of interest, here called 'search pathways'. This research builds a deeper understanding of the user community that seeks to reuse scientific data. This approach ensures that development decisions are driven by clearly articulated user needs instead of ad hoc technology trends. Initial results from this research will be presented along with lessons learned for other discovery platform development and future directions for informatics research into search pathways.
Earthdata Search: How Usability Drives Innovation To Enable A Broad User Base
NASA Astrophysics Data System (ADS)
Reese, M.; Siarto, J.; Lynnes, C.; Shum, D.
2017-12-01
Earthdata Search (https://search.earthdata.nasa.gov) is a modern web application allowing users to search, discover, visualize, refine, and access NASA Earth Observation data using a wide array of service offerings. Its goal is to ease the technical burden on data users by providing a high-quality application that makes it simple to interact with NASA Earth observation data, freeing them to spend more effort on innovative endeavors. This talk would detail how we put end users first in our design and development process, focusing on usability and letting usability needs drive requirements for the underlying technology. Just a few examples of how this plays out practically, Earthdata Search teams with a lightning fast metadata repository, allowing it to be an extremely responsive UI that updates as the user changes criteria not only at the dataset level, but also at the file level. This results in a better exploration experience as the time penalty is greatly reduced. Also, since Earthdata Search uses metadata from over 35,000 datasets that are managed by different data providers, metadata standards, quality and consistency will vary. We found that this was negatively impacting users' search and exploration experience. We have resolved this problem with the introduction of "humanizers", which is a community-driven process to both "smooth out" metadata values and provide non-jargonistic representations of some content within the Earthdata Search UI. This is helpful for both the experience data scientist and our users that are brand new to the discipline.
McDowell, Graeme S. V.; Taylor, Graeme P.; Fai, Stephen; Bennett, Steffany A. L.
2014-01-01
The capacity to predict and visualize all theoretically possible glycerophospholipid molecular identities present in lipidomic datasets is currently limited. To address this issue, we expanded the search-engine and compositional databases of the online Visualization and Phospholipid Identification (VaLID) bioinformatic tool to include the glycerophosphoinositol superfamily. VaLID v1.0.0 originally allowed exact and average mass libraries of 736,584 individual species from eight phospholipid classes: glycerophosphates, glyceropyrophosphates, glycerophosphocholines, glycerophosphoethanolamines, glycerophosphoglycerols, glycerophosphoglycerophosphates, glycerophosphoserines, and cytidine 5′-diphosphate 1,2-diacyl-sn-glycerols to be searched for any mass to charge value (with adjustable tolerance levels) under a variety of mass spectrometry conditions. Here, we describe an update that now includes all possible glycerophosphoinositols, glycerophosphoinositol monophosphates, glycerophosphoinositol bisphosphates, and glycerophosphoinositol trisphosphates. This update expands the total number of lipid species represented in the VaLID v2.0.0 database to 1,473,168 phospholipids. Each phospholipid can be generated in skeletal representation. A subset of species curated by the Canadian Institutes of Health Research Training Program in Neurodegenerative Lipidomics (CTPNL) team is provided as an array of high-resolution structures. VaLID is freely available and responds to all users through the CTPNL resources web site. PMID:24701584
McDowell, Graeme S V; Blanchard, Alexandre P; Taylor, Graeme P; Figeys, Daniel; Fai, Stephen; Bennett, Steffany A L
2014-01-01
The capacity to predict and visualize all theoretically possible glycerophospholipid molecular identities present in lipidomic datasets is currently limited. To address this issue, we expanded the search-engine and compositional databases of the online Visualization and Phospholipid Identification (VaLID) bioinformatic tool to include the glycerophosphoinositol superfamily. VaLID v1.0.0 originally allowed exact and average mass libraries of 736,584 individual species from eight phospholipid classes: glycerophosphates, glyceropyrophosphates, glycerophosphocholines, glycerophosphoethanolamines, glycerophosphoglycerols, glycerophosphoglycerophosphates, glycerophosphoserines, and cytidine 5'-diphosphate 1,2-diacyl-sn-glycerols to be searched for any mass to charge value (with adjustable tolerance levels) under a variety of mass spectrometry conditions. Here, we describe an update that now includes all possible glycerophosphoinositols, glycerophosphoinositol monophosphates, glycerophosphoinositol bisphosphates, and glycerophosphoinositol trisphosphates. This update expands the total number of lipid species represented in the VaLID v2.0.0 database to 1,473,168 phospholipids. Each phospholipid can be generated in skeletal representation. A subset of species curated by the Canadian Institutes of Health Research Training Program in Neurodegenerative Lipidomics (CTPNL) team is provided as an array of high-resolution structures. VaLID is freely available and responds to all users through the CTPNL resources web site.
The Custom Search allows users to search for and generate customized data downloads of pollutant loadings information. Users can select varying levels of detail for outputs: annual, monitoring period, and facility level.
Integrated nuclear data utilisation system for innovative reactors.
Yamano, N; Hasegawa, A; Kato, K; Igashira, M
2005-01-01
A five-year research and development project on an integrated nuclear data utilisation system was initiated in 2002, for developing innovative nuclear energy systems such as accelerator-driven systems. The integrated nuclear data utilisation system will be constructed as a modular code system, which consists of two sub-systems: the nuclear data search and plotting sub-system, and the nuclear data processing and utilisation sub-system. The system will be operated with a graphical user interface in order to enable easy utilisation through the Internet by both nuclear design engineers and nuclear data evaluators. This paper presents an overview of the integrated nuclear data utilisation system, describes the development of a prototype system to examine the operability of the user interface and discusses specifications of the two sub-systems.
Liu, Yifeng; Liang, Yongjie; Wishart, David
2015-07-01
PolySearch2 (http://polysearch.ca) is an online text-mining system for identifying relationships between biomedical entities such as human diseases, genes, SNPs, proteins, drugs, metabolites, toxins, metabolic pathways, organs, tissues, subcellular organelles, positive health effects, negative health effects, drug actions, Gene Ontology terms, MeSH terms, ICD-10 medical codes, biological taxonomies and chemical taxonomies. PolySearch2 supports a generalized 'Given X, find all associated Ys' query, where X and Y can be selected from the aforementioned biomedical entities. An example query might be: 'Find all diseases associated with Bisphenol A'. To find its answers, PolySearch2 searches for associations against comprehensive collections of free-text collections, including local versions of MEDLINE abstracts, PubMed Central full-text articles, Wikipedia full-text articles and US Patent application abstracts. PolySearch2 also searches 14 widely used, text-rich biological databases such as UniProt, DrugBank and Human Metabolome Database to improve its accuracy and coverage. PolySearch2 maintains an extensive thesaurus of biological terms and exploits the latest search engine technology to rapidly retrieve relevant articles and databases records. PolySearch2 also generates, ranks and annotates associative candidates and present results with relevancy statistics and highlighted key sentences to facilitate user interpretation. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Liu, Yifeng; Liang, Yongjie; Wishart, David
2015-01-01
PolySearch2 (http://polysearch.ca) is an online text-mining system for identifying relationships between biomedical entities such as human diseases, genes, SNPs, proteins, drugs, metabolites, toxins, metabolic pathways, organs, tissues, subcellular organelles, positive health effects, negative health effects, drug actions, Gene Ontology terms, MeSH terms, ICD-10 medical codes, biological taxonomies and chemical taxonomies. PolySearch2 supports a generalized ‘Given X, find all associated Ys’ query, where X and Y can be selected from the aforementioned biomedical entities. An example query might be: ‘Find all diseases associated with Bisphenol A’. To find its answers, PolySearch2 searches for associations against comprehensive collections of free-text collections, including local versions of MEDLINE abstracts, PubMed Central full-text articles, Wikipedia full-text articles and US Patent application abstracts. PolySearch2 also searches 14 widely used, text-rich biological databases such as UniProt, DrugBank and Human Metabolome Database to improve its accuracy and coverage. PolySearch2 maintains an extensive thesaurus of biological terms and exploits the latest search engine technology to rapidly retrieve relevant articles and databases records. PolySearch2 also generates, ranks and annotates associative candidates and present results with relevancy statistics and highlighted key sentences to facilitate user interpretation. PMID:25925572
A Detailed Analysis of End-User Search Behaviors.
ERIC Educational Resources Information Center
Wildemuth, Barbara M.; And Others
1991-01-01
Discussion of search strategy formulation focuses on a study at the University of North Carolina at Chapel Hill that analyzed how medical students developed and revised search strategies for microbiology database searches. Implications for future research on search behavior, for system interface design, and for end user training are suggested. (16…
Current Searching Methodology and Retrieval Issues: An Assessment
2008-03-01
searching that are used by search engines are discussed. They are: full text searching, i.e., the searching of unstructured data, and metadata searching...also found among search engines ; however, it is the popularity of full text searching that has changed the road map to information access. The...other hand, information seekers’ willingness, or lack of, to learn the multiple search engines ’ capabilities may diminish their search results
Analysis of Users' Searches of CD-ROM Databases in the National and University Library in Zagreb.
ERIC Educational Resources Information Center
Jokic, Maja
1997-01-01
Investigates the search behavior of CD-ROM database users in Zagreb (Croatia) libraries: one group needed a minimum of technical assistance, and the other was completely independent. Highlights include the use of questionnaires and transaction log analysis and the need for end-user education. The questionnaire and definitions of search process…
The Searching Behavior of Remote Users: A Study of One Online Public Access Catalog (OPAC).
ERIC Educational Resources Information Center
Kalin, Sally W.
1991-01-01
Describes a study that was conducted to determine whether the searching behavior of remote users of LIAS (Library Information Access System), Pennsylvania State University's online public access catalog (OPAC), differed from those using the OPAC within the library. Differences in search strategies and in user satisfaction are discussed. (eight…