database search engine: Topics by Science.gov

Sample records for database search engine

Adjacency and Proximity Searching in the Science Citation Index and Google

DTIC Science & Technology

2005-01-01

major database search engines , including commercial S&T database search engines (e.g., Science Citation Index (SCI), Engineering Compendex (EC...PubMed, OVID), Federal agency award database search engines (e.g., NSF, NIH, DOE, EPA, as accessed in Federal R&D Project Summaries), Web search Engines (e.g...searching. Some database search engines allow strict constrained co- occurrence searching as a user option (e.g., OVID, EC), while others do not (e.g., SCI
Database Search Engines: Paradigms, Challenges and Solutions.

PubMed

Verheggen, Kenneth; Martens, Lennart; Berven, Frode S; Barsnes, Harald; Vaudel, Marc

2016-01-01

The first step in identifying proteins from mass spectrometry based shotgun proteomics data is to infer peptides from tandem mass spectra, a task generally achieved using database search engines. In this chapter, the basic principles of database search engines are introduced with a focus on open source software, and the use of database search engines is demonstrated using the freely available SearchGUI interface. This chapter also discusses how to tackle general issues related to sequence database searching and shows how to minimize their impact.
An approach in building a chemical compound search engine in oracle database.

PubMed

Wang, H; Volarath, P; Harrison, R

2005-01-01

A searching or identifying of chemical compounds is an important process in drug design and in chemistry research. An efficient search engine involves a close coupling of the search algorithm and database implementation. The database must process chemical structures, which demands the approaches to represent, store, and retrieve structures in a database system. In this paper, a general database framework for working as a chemical compound search engine in Oracle database is described. The framework is devoted to eliminate data type constrains for potential search algorithms, which is a crucial step toward building a domain specific query language on top of SQL. A search engine implementation based on the database framework is also demonstrated. The convenience of the implementation emphasizes the efficiency and simplicity of the framework.
BioCarian: search engine for exploratory searches in heterogeneous biological databases.

PubMed

Zaki, Nazar; Tennakoon, Chandana

2017-10-02

There are a large number of biological databases publicly available for scientists in the web. Also, there are many private databases generated in the course of research projects. These databases are in a wide variety of formats. Web standards have evolved in the recent times and semantic web technologies are now available to interconnect diverse and heterogeneous sources of data. Therefore, integration and querying of biological databases can be facilitated by techniques used in semantic web. Heterogeneous databases can be converted into Resource Description Format (RDF) and queried using SPARQL language. Searching for exact queries in these databases is trivial. However, exploratory searches need customized solutions, especially when multiple databases are involved. This process is cumbersome and time consuming for those without a sufficient background in computer science. In this context, a search engine facilitating exploratory searches of databases would be of great help to the scientific community. We present BioCarian, an efficient and user-friendly search engine for performing exploratory searches on biological databases. The search engine is an interface for SPARQL queries over RDF databases. We note that many of the databases can be converted to tabular form. We first convert the tabular databases to RDF. The search engine provides a graphical interface based on facets to explore the converted databases. The facet interface is more advanced than conventional facets. It allows complex queries to be constructed, and have additional features like ranking of facet values based on several criteria, visually indicating the relevance of a facet value and presenting the most important facet values when a large number of choices are available. For the advanced users, SPARQL queries can be run directly on the databases. Using this feature, users will be able to incorporate federated searches of SPARQL endpoints. We used the search engine to do an exploratory search on previously published viral integration data and were able to deduce the main conclusions of the original publication. BioCarian is accessible via http://www.biocarian.com . We have developed a search engine to explore RDF databases that can be used by both novice and advanced users.
Comet: an open-source MS/MS sequence database search tool.

PubMed

Eng, Jimmy K; Jahan, Tahmina A; Hoopmann, Michael R

2013-01-01

Proteomics research routinely involves identifying peptides and proteins via MS/MS sequence database search. Thus the database search engine is an integral tool in many proteomics research groups. Here, we introduce the Comet search engine to the existing landscape of commercial and open-source database search tools. Comet is open source, freely available, and based on one of the original sequence database search tools that has been widely used for many years. © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Anatomy and evolution of database search engines-a central component of mass spectrometry based proteomic workflows.

PubMed

Verheggen, Kenneth; Raeder, Helge; Berven, Frode S; Martens, Lennart; Barsnes, Harald; Vaudel, Marc

2017-09-13

Sequence database search engines are bioinformatics algorithms that identify peptides from tandem mass spectra using a reference protein sequence database. Two decades of development, notably driven by advances in mass spectrometry, have provided scientists with more than 30 published search engines, each with its own properties. In this review, we present the common paradigm behind the different implementations, and its limitations for modern mass spectrometry datasets. We also detail how the search engines attempt to alleviate these limitations, and provide an overview of the different software frameworks available to the researcher. Finally, we highlight alternative approaches for the identification of proteomic mass spectrometry datasets, either as a replacement for, or as a complement to, sequence database search engines. © 2017 Wiley Periodicals, Inc.
Algorithms for database-dependent search of MS/MS data.

PubMed

Matthiesen, Rune

2013-01-01

The frequent used bottom-up strategy for identification of proteins and their associated modifications generate nowadays typically thousands of MS/MS spectra that normally are matched automatically against a protein sequence database. Search engines that take as input MS/MS spectra and a protein sequence database are referred as database-dependent search engines. Many programs both commercial and freely available exist for database-dependent search of MS/MS spectra and most of the programs have excellent user documentation. The aim here is therefore to outline the algorithm strategy behind different search engines rather than providing software user manuals. The process of database-dependent search can be divided into search strategy, peptide scoring, protein scoring, and finally protein inference. Most efforts in the literature have been put in to comparing results from different software rather than discussing the underlining algorithms. Such practical comparisons can be cluttered by suboptimal implementation and the observed differences are frequently caused by software parameters settings which have not been set proper to allow even comparison. In other words an algorithmic idea can still be worth considering even if the software implementation has been demonstrated to be suboptimal. The aim in this chapter is therefore to split the algorithms for database-dependent searching of MS/MS data into the above steps so that the different algorithmic ideas become more transparent and comparable. Most search engines provide good implementations of the first three data analysis steps mentioned above, whereas the final step of protein inference are much less developed for most search engines and is in many cases performed by an external software. The final part of this chapter illustrates how protein inference is built into the VEMS search engine and discusses a stand-alone program SIR for protein inference that can import a Mascot search result.
A World Wide Web (WWW) server database engine for an organelle database, MitoDat.

PubMed

Lemkin, P F; Chipperfield, M; Merril, C; Zullo, S

1996-03-01

We describe a simple database search engine "dbEngine" which may be used to quickly create a searchable database on a World Wide Web (WWW) server. Data may be prepared from spreadsheet programs (such as Excel, etc.) or from tables exported from relationship database systems. This Common Gateway Interface (CGI-BIN) program is used with a WWW server such as available commercially, or from National Center for Supercomputer Algorithms (NCSA) or CERN. Its capabilities include: (i) searching records by combinations of terms connected with ANDs or ORs; (ii) returning search results as hypertext links to other WWW database servers; (iii) mapping lists of literature reference identifiers to the full references; (iv) creating bidirectional hypertext links between pictures and the database. DbEngine has been used to support the MitoDat database (Mendelian and non-Mendelian inheritance associated with the Mitochondrion) on the WWW.
Development of a One-Stop Data Search and Discovery Engine using Ontologies for Semantic Mappings (HydroSeek)

NASA Astrophysics Data System (ADS)

Piasecki, M.; Beran, B.

2007-12-01

Search engines have changed the way we see the Internet. The ability to find the information by just typing in keywords was a big contribution to the overall web experience. While the conventional search engine methodology worked well for textual documents, locating scientific data remains a problem since they are stored in databases not readily accessible by search engine bots. Considering different temporal, spatial and thematic coverage of different databases, especially for interdisciplinary research it is typically necessary to work with multiple data sources. These sources can be federal agencies which generally offer national coverage or regional sources which cover a smaller area with higher detail. However for a given geographic area of interest there often exists more than one database with relevant data. Thus being able to query multiple databases simultaneously is a desirable feature that would be tremendously useful for scientists. Development of such a search engine requires dealing with various heterogeneity issues. In scientific databases, systems often impose controlled vocabularies which ensure that they are generally homogeneous within themselves but are semantically heterogeneous when moving between different databases. This defines the boundaries of possible semantic related problems making it easier to solve than with the conventional search engines that deal with free text. We have developed a search engine that enables querying multiple data sources simultaneously and returns data in a standardized output despite the aforementioned heterogeneity issues between the underlying systems. This application relies mainly on metadata catalogs or indexing databases, ontologies and webservices with virtual globe and AJAX technologies for the graphical user interface. Users can trigger a search of dozens of different parameters over hundreds of thousands of stations from multiple agencies by providing a keyword, a spatial extent, i.e. a bounding box, and a temporal bracket. As part of this development we have also added an environment that allows users to do some of the semantic tagging, i.e. the linkage of a variable name (which can be anything they desire) to defined concepts in the ontology structure which in turn provides the backbone of the search engine.
Custom Search Engines: Tools & Tips

ERIC Educational Resources Information Center

Notess, Greg R.

2008-01-01

Few have the resources to build a Google or Yahoo! from scratch. Yet anyone can build a search engine based on a subset of the large search engines' databases. Use Google Custom Search Engine or Yahoo! Search Builder or any of the other similar programs to create a vertical search engine targeting sites of interest to users. The basic steps to…
Decision making in family medicine: randomized trial of the effects of the InfoClinique and Trip database search engines.

PubMed

Labrecque, Michel; Ratté, Stéphane; Frémont, Pierre; Cauchon, Michel; Ouellet, Jérôme; Hogg, William; McGowan, Jessie; Gagnon, Marie-Pierre; Njoya, Merlin; Légaré, France

2013-10-01

To compare the ability of users of 2 medical search engines, InfoClinique and the Trip database, to provide correct answers to clinical questions and to explore the perceived effects of the tools on the clinical decision-making process. Randomized trial. Three family medicine units of the family medicine program of the Faculty of Medicine at Laval University in Quebec city, Que. Fifteen second-year family medicine residents. Residents generated 30 structured questions about therapy or preventive treatment (2 questions per resident) based on clinical encounters. Using an Internet platform designed for the trial, each resident answered 20 of these questions (their own 2, plus 18 of the questions formulated by other residents, selected randomly) before and after searching for information with 1 of the 2 search engines. For each question, 5 residents were randomly assigned to begin their search with InfoClinique and 5 with the Trip database. The ability of residents to provide correct answers to clinical questions using the search engines, as determined by third-party evaluation. After answering each question, participants completed a questionnaire to assess their perception of the engine's effect on the decision-making process in clinical practice. Of 300 possible pairs of answers (1 answer before and 1 after the initial search), 254 (85%) were produced by 14 residents. Of these, 132 (52%) and 122 (48%) pairs of answers concerned questions that had been assigned an initial search with InfoClinique and the Trip database, respectively. Both engines produced an important and similar absolute increase in the proportion of correct answers after searching (26% to 62% for InfoClinique, for an increase of 36%; 24% to 63% for the Trip database, for an increase of 39%; P = .68). For all 30 clinical questions, at least 1 resident produced the correct answer after searching with either search engine. The mean (SD) time of the initial search for each question was 23.5 (7.6) minutes with InfoClinique and 22.3 (7.8) minutes with the Trip database (P = .30). Participants' perceptions of each engine's effect on the decision-making process were very positive and similar for both search engines. Family medicine residents' ability to provide correct answers to clinical questions increased dramatically and similarly with the use of both InfoClinique and the Trip database. These tools have strong potential to increase the quality of medical care.
The LAILAPS search engine: a feature model for relevance ranking in life science databases.

PubMed

Lange, Matthias; Spies, Karl; Colmsee, Christian; Flemming, Steffen; Klapperstück, Matthias; Scholz, Uwe

2010-03-25

Efficient and effective information retrieval in life sciences is one of the most pressing challenge in bioinformatics. The incredible growth of life science databases to a vast network of interconnected information systems is to the same extent a big challenge and a great chance for life science research. The knowledge found in the Web, in particular in life-science databases, are a valuable major resource. In order to bring it to the scientist desktop, it is essential to have well performing search engines. Thereby, not the response time nor the number of results is important. The most crucial factor for millions of query results is the relevance ranking. In this paper, we present a feature model for relevance ranking in life science databases and its implementation in the LAILAPS search engine. Motivated by the observation of user behavior during their inspection of search engine result, we condensed a set of 9 relevance discriminating features. These features are intuitively used by scientists, who briefly screen database entries for potential relevance. The features are both sufficient to estimate the potential relevance, and efficiently quantifiable. The derivation of a relevance prediction function that computes the relevance from this features constitutes a regression problem. To solve this problem, we used artificial neural networks that have been trained with a reference set of relevant database entries for 19 protein queries. Supporting a flexible text index and a simple data import format, this concepts are implemented in the LAILAPS search engine. It can easily be used both as search engine for comprehensive integrated life science databases and for small in-house project databases. LAILAPS is publicly available for SWISSPROT data at http://lailaps.ipk-gatersleben.de.
DRUMS: a human disease related unique gene mutation search engine.

PubMed

Li, Zuofeng; Liu, Xingnan; Wen, Jingran; Xu, Ye; Zhao, Xin; Li, Xuan; Liu, Lei; Zhang, Xiaoyan

2011-10-01

With the completion of the human genome project and the development of new methods for gene variant detection, the integration of mutation data and its phenotypic consequences has become more important than ever. Among all available resources, locus-specific databases (LSDBs) curate one or more specific genes' mutation data along with high-quality phenotypes. Although some genotype-phenotype data from LSDB have been integrated into central databases little effort has been made to integrate all these data by a search engine approach. In this work, we have developed disease related unique gene mutation search engine (DRUMS), a search engine for human disease related unique gene mutation as a convenient tool for biologists or physicians to retrieve gene variant and related phenotype information. Gene variant and phenotype information were stored in a gene-centred relational database. Moreover, the relationships between mutations and diseases were indexed by the uniform resource identifier from LSDB, or another central database. By querying DRUMS, users can access the most popular mutation databases under one interface. DRUMS could be treated as a domain specific search engine. By using web crawling, indexing, and searching technologies, it provides a competitively efficient interface for searching and retrieving mutation data and their relationships to diseases. The present system is freely accessible at http://www.scbit.org/glif/new/drums/index.html. © 2011 Wiley-Liss, Inc.
Digging Deeper: The Deep Web.

ERIC Educational Resources Information Center

Turner, Laura

2001-01-01

Focuses on the Deep Web, defined as Web content in searchable databases of the type that can be found only by direct query. Discusses the problems of indexing; inability to find information not indexed in the search engine's database; and metasearch engines. Describes 10 sites created to access online databases or directly search them. Lists ways…
Improving sensitivity in proteome studies by analysis of false discovery rates for multiple search engines.

PubMed

Jones, Andrew R; Siepen, Jennifer A; Hubbard, Simon J; Paton, Norman W

2009-03-01

LC-MS experiments can generate large quantities of data, for which a variety of database search engines are available to make peptide and protein identifications. Decoy databases are becoming widely used to place statistical confidence in result sets, allowing the false discovery rate (FDR) to be estimated. Different search engines produce different identification sets so employing more than one search engine could result in an increased number of peptides (and proteins) being identified, if an appropriate mechanism for combining data can be defined. We have developed a search engine independent score, based on FDR, which allows peptide identifications from different search engines to be combined, called the FDR Score. The results demonstrate that the observed FDR is significantly different when analysing the set of identifications made by all three search engines, by each pair of search engines or by a single search engine. Our algorithm assigns identifications to groups according to the set of search engines that have made the identification, and re-assigns the score (combined FDR Score). The combined FDR Score can differentiate between correct and incorrect peptide identifications with high accuracy, allowing on average 35% more peptide identifications to be made at a fixed FDR than using a single search engine.
An ontology-based search engine for protein-protein interactions

PubMed Central

2010-01-01

Background Keyword matching or ID matching is the most common searching method in a large database of protein-protein interactions. They are purely syntactic methods, and retrieve the records in the database that contain a keyword or ID specified in a query. Such syntactic search methods often retrieve too few search results or no results despite many potential matches present in the database. Results We have developed a new method for representing protein-protein interactions and the Gene Ontology (GO) using modified Gödel numbers. This representation is hidden from users but enables a search engine using the representation to efficiently search protein-protein interactions in a biologically meaningful way. Given a query protein with optional search conditions expressed in one or more GO terms, the search engine finds all the interaction partners of the query protein by unique prime factorization of the modified Gödel numbers representing the query protein and the search conditions. Conclusion Representing the biological relations of proteins and their GO annotations by modified Gödel numbers makes a search engine efficiently find all protein-protein interactions by prime factorization of the numbers. Keyword matching or ID matching search methods often miss the interactions involving a protein that has no explicit annotations matching the search condition, but our search engine retrieves such interactions as well if they satisfy the search condition with a more specific term in the ontology. PMID:20122195
An ontology-based search engine for protein-protein interactions.

PubMed

Park, Byungkyu; Han, Kyungsook

2010-01-18

Keyword matching or ID matching is the most common searching method in a large database of protein-protein interactions. They are purely syntactic methods, and retrieve the records in the database that contain a keyword or ID specified in a query. Such syntactic search methods often retrieve too few search results or no results despite many potential matches present in the database. We have developed a new method for representing protein-protein interactions and the Gene Ontology (GO) using modified Gödel numbers. This representation is hidden from users but enables a search engine using the representation to efficiently search protein-protein interactions in a biologically meaningful way. Given a query protein with optional search conditions expressed in one or more GO terms, the search engine finds all the interaction partners of the query protein by unique prime factorization of the modified Gödel numbers representing the query protein and the search conditions. Representing the biological relations of proteins and their GO annotations by modified Gödel numbers makes a search engine efficiently find all protein-protein interactions by prime factorization of the numbers. Keyword matching or ID matching search methods often miss the interactions involving a protein that has no explicit annotations matching the search condition, but our search engine retrieves such interactions as well if they satisfy the search condition with a more specific term in the ontology.
Using Internet search engines to estimate word frequency.

PubMed

Blair, Irene V; Urland, Geoffrey R; Ma, Jennifer E

2002-05-01

The present research investigated Internet search engines as a rapid, cost-effective alternative for estimating word frequencies. Frequency estimates for 382 words were obtained and compared across four methods: (1) Internet search engines, (2) the Kucera and Francis (1967) analysis of a traditional linguistic corpus, (3) the CELEX English linguistic database (Baayen, Piepenbrock, & Gulikers, 1995), and (4) participant ratings of familiarity. The results showed that Internet search engines produced frequency estimates that were highly consistent with those reported by Kucera and Francis and those calculated from CELEX, highly consistent across search engines, and very reliable over a 6-month period of time. Additional results suggested that Internet search engines are an excellent option when traditional word frequency analyses do not contain the necessary data (e.g., estimates for forenames and slang). In contrast, participants' familiarity judgments did not correspond well with the more objective estimates of word frequency. Researchers are advised to use search engines with large databases (e.g., AltaVista) to ensure the greatest representativeness of the frequency estimates.
Multi-source and ontology-based retrieval engine for maize mutant phenotypes

PubMed Central

Green, Jason M.; Harnsomburana, Jaturon; Schaeffer, Mary L.; Lawrence, Carolyn J.; Shyu, Chi-Ren

2011-01-01

Model Organism Databases, including the various plant genome databases, collect and enable access to massive amounts of heterogeneous information, including sequence data, gene product information, images of mutant phenotypes, etc, as well as textual descriptions of many of these entities. While a variety of basic browsing and search capabilities are available to allow researchers to query and peruse the names and attributes of phenotypic data, next-generation search mechanisms that allow querying and ranking of text descriptions are much less common. In addition, the plant community needs an innovative way to leverage the existing links in these databases to search groups of text descriptions simultaneously. Furthermore, though much time and effort have been afforded to the development of plant-related ontologies, the knowledge embedded in these ontologies remains largely unused in available plant search mechanisms. Addressing these issues, we have developed a unique search engine for mutant phenotypes from MaizeGDB. This advanced search mechanism integrates various text description sources in MaizeGDB to aid a user in retrieving desired mutant phenotype information. Currently, descriptions of mutant phenotypes, loci and gene products are utilized collectively for each search, though expansion of the search mechanism to include other sources is straightforward. The retrieval engine, to our knowledge, is the first engine to exploit the content and structure of available domain ontologies, currently the Plant and Gene Ontologies, to expand and enrich retrieval results in major plant genomic databases. Database URL: http:www.PhenomicsWorld.org/QBTA.php PMID:21558151
An Improved Forensic Science Information Search.

PubMed

Teitelbaum, J

2015-01-01

Although thousands of search engines and databases are available online, finding answers to specific forensic science questions can be a challenge even to experienced Internet users. Because there is no central repository for forensic science information, and because of the sheer number of disciplines under the forensic science umbrella, forensic scientists are often unable to locate material that is relevant to their needs. The author contends that using six publicly accessible search engines and databases can produce high-quality search results. The six resources are Google, PubMed, Google Scholar, Google Books, WorldCat, and the National Criminal Justice Reference Service. Carefully selected keywords and keyword combinations, designating a keyword phrase so that the search engine will search on the phrase and not individual keywords, and prompting search engines to retrieve PDF files are among the techniques discussed. Copyright © 2015 Central Police University.

Adaptation of Decoy Fusion Strategy for Existing Multi-Stage Search Workflows

NASA Astrophysics Data System (ADS)

Ivanov, Mark V.; Levitsky, Lev I.; Gorshkov, Mikhail V.

2016-09-01

A number of proteomic database search engines implement multi-stage strategies aiming at increasing the sensitivity of proteome analysis. These approaches often employ a subset of the original database for the secondary stage of analysis. However, if target-decoy approach (TDA) is used for false discovery rate (FDR) estimation, the multi-stage strategies may violate the underlying assumption of TDA that false matches are distributed uniformly across the target and decoy databases. This violation occurs if the numbers of target and decoy proteins selected for the second search are not equal. Here, we propose a method of decoy database generation based on the previously reported decoy fusion strategy. This method allows unbiased TDA-based FDR estimation in multi-stage searches and can be easily integrated into existing workflows utilizing popular search engines and post-search algorithms.
MSblender: A probabilistic approach for integrating peptide identifications from multiple database search engines.

PubMed

Kwon, Taejoon; Choi, Hyungwon; Vogel, Christine; Nesvizhskii, Alexey I; Marcotte, Edward M

2011-07-01

Shotgun proteomics using mass spectrometry is a powerful method for protein identification but suffers limited sensitivity in complex samples. Integrating peptide identifications from multiple database search engines is a promising strategy to increase the number of peptide identifications and reduce the volume of unassigned tandem mass spectra. Existing methods pool statistical significance scores such as p-values or posterior probabilities of peptide-spectrum matches (PSMs) from multiple search engines after high scoring peptides have been assigned to spectra, but these methods lack reliable control of identification error rates as data are integrated from different search engines. We developed a statistically coherent method for integrative analysis, termed MSblender. MSblender converts raw search scores from search engines into a probability score for every possible PSM and properly accounts for the correlation between search scores. The method reliably estimates false discovery rates and identifies more PSMs than any single search engine at the same false discovery rate. Increased identifications increment spectral counts for most proteins and allow quantification of proteins that would not have been quantified by individual search engines. We also demonstrate that enhanced quantification contributes to improve sensitivity in differential expression analyses.
MSblender: a probabilistic approach for integrating peptide identifications from multiple database search engines

PubMed Central

Kwon, Taejoon; Choi, Hyungwon; Vogel, Christine; Nesvizhskii, Alexey I.; Marcotte, Edward M.

2011-01-01

Shotgun proteomics using mass spectrometry is a powerful method for protein identification but suffers limited sensitivity in complex samples. Integrating peptide identifications from multiple database search engines is a promising strategy to increase the number of peptide identifications and reduce the volume of unassigned tandem mass spectra. Existing methods pool statistical significance scores such as p-values or posterior probabilities of peptide-spectrum matches (PSMs) from multiple search engines after high scoring peptides have been assigned to spectra, but these methods lack reliable control of identification error rates as data are integrated from different search engines. We developed a statistically coherent method for integrative analysis, termed MSblender. MSblender converts raw search scores from search engines into a probability score for all possible PSMs and properly accounts for the correlation between search scores. The method reliably estimates false discovery rates and identifies more PSMs than any single search engine at the same false discovery rate. Increased identifications increment spectral counts for all detected proteins and allow quantification of proteins that would not have been quantified by individual search engines. We also demonstrate that enhanced quantification contributes to improve sensitivity in differential expression analyses. PMID:21488652
Ocean Drilling Program: Science Operator Search Engine

Science.gov Websites

and products Drilling services and tools Online Janus database Search the ODP/TAMU web site ODP's main -USIO site, plus IODP, ODP, and DSDP Publications, together or separately. ODP | Search | Database
Tags Extarction from Spatial Documents in Search Engines

NASA Astrophysics Data System (ADS)

Borhaninejad, S.; Hakimpour, F.; Hamzei, E.

2015-12-01

Nowadays the selective access to information on the Web is provided by search engines, but in the cases which the data includes spatial information the search task becomes more complex and search engines require special capabilities. The purpose of this study is to extract the information which lies in spatial documents. To that end, we implement and evaluate information extraction from GML documents and a retrieval method in an integrated approach. Our proposed system consists of three components: crawler, database and user interface. In crawler component, GML documents are discovered and their text is parsed for information extraction; storage. The database component is responsible for indexing of information which is collected by crawlers. Finally the user interface component provides the interaction between system and user. We have implemented this system as a pilot system on an Application Server as a simulation of Web. Our system as a spatial search engine provided searching capability throughout the GML documents and thus an important step to improve the efficiency of search engines has been taken.
NASA Image eXchange (NIX)

NASA Technical Reports Server (NTRS)

vonOfenheim. William H. C.; Heimerl, N. Lynn; Binkley, Robert L.; Curry, Marty A.; Slater, Richard T.; Nolan, Gerald J.; Griswold, T. Britt; Kovach, Robert D.; Corbin, Barney H.; Hewitt, Raymond W.

1998-01-01

This paper discusses the technical aspects of and the project background for the NASA Image exchange (NIX). NIX, which provides a single entry point to search selected image databases at the NASA Centers, is a meta-search engine (i.e., a search engine that communicates with other search engines). It uses these distributed digital image databases to access photographs, animations, and their associated descriptive information (meta-data). NIX is available for use at the following URL: http://nix.nasa.gov./NIX, which was sponsored by NASAs Scientific and Technical Information (STI) Program, currently serves images from seven NASA Centers. Plans are under way to link image databases from three additional NASA Centers. images and their associated meta-data, which are accessible by NIX, reside at the originating Centers, and NIX utilizes a virtual central site that communicates with each of these sites. Incorporated into the virtual central site are several protocols to support searches from a diverse collection of database engines. The searches are performed in parallel to ensure optimization of response times. To augment the search capability, browse functionality with pre-defined categories has been built into NIX, thereby ensuring dissemination of 'best-of-breed' imagery. As a final recourse, NIX offers access to a help desk via an on-line form to help locate images and information either within the scope of NIX or from available external sources.
An open-source, mobile-friendly search engine for public medical knowledge.

PubMed

Samwald, Matthias; Hanbury, Allan

2014-01-01

The World Wide Web has become an important source of information for medical practitioners. To complement the capabilities of currently available web search engines we developed FindMeEvidence, an open-source, mobile-friendly medical search engine. In a preliminary evaluation, the quality of results from FindMeEvidence proved to be competitive with those from TRIP Database, an established, closed-source search engine for evidence-based medicine.
Improving sensitivity in proteome studies by analysis of false discovery rates for multiple search engines

PubMed Central

Jones, Andrew R.; Siepen, Jennifer A.; Hubbard, Simon J.; Paton, Norman W.

2010-01-01

Tandem mass spectrometry, run in combination with liquid chromatography (LC-MS/MS), can generate large numbers of peptide and protein identifications, for which a variety of database search engines are available. Distinguishing correct identifications from false positives is far from trivial because all data sets are noisy, and tend to be too large for manual inspection, therefore probabilistic methods must be employed to balance the trade-off between sensitivity and specificity. Decoy databases are becoming widely used to place statistical confidence in results sets, allowing the false discovery rate (FDR) to be estimated. It has previously been demonstrated that different MS search engines produce different peptide identification sets, and as such, employing more than one search engine could result in an increased number of peptides being identified. However, such efforts are hindered by the lack of a single scoring framework employed by all search engines. We have developed a search engine independent scoring framework based on FDR which allows peptide identifications from different search engines to be combined, called the FDRScore. We observe that peptide identifications made by three search engines are infrequently false positives, and identifications made by only a single search engine, even with a strong score from the source search engine, are significantly more likely to be false positives. We have developed a second score based on the FDR within peptide identifications grouped according to the set of search engines that have made the identification, called the combined FDRScore. We demonstrate by searching large publicly available data sets that the combined FDRScore can differentiate between between correct and incorrect peptide identifications with high accuracy, allowing on average 35% more peptide identifications to be made at a fixed FDR than using a single search engine. PMID:19253293
Decision making in family medicine

PubMed Central

Labrecque, Michel; Ratté, Stéphane; Frémont, Pierre; Cauchon, Michel; Ouellet, Jérôme; Hogg, William; McGowan, Jessie; Gagnon, Marie-Pierre; Njoya, Merlin; Légaré, France

2013-01-01

Abstract Objective To compare the ability of users of 2 medical search engines, InfoClinique and the Trip database, to provide correct answers to clinical questions and to explore the perceived effects of the tools on the clinical decision-making process. Design Randomized trial. Setting Three family medicine units of the family medicine program of the Faculty of Medicine at Laval University in Quebec city, Que. Participants Fifteen second-year family medicine residents. Intervention Residents generated 30 structured questions about therapy or preventive treatment (2 questions per resident) based on clinical encounters. Using an Internet platform designed for the trial, each resident answered 20 of these questions (their own 2, plus 18 of the questions formulated by other residents, selected randomly) before and after searching for information with 1 of the 2 search engines. For each question, 5 residents were randomly assigned to begin their search with InfoClinique and 5 with the Trip database. Main outcome measures The ability of residents to provide correct answers to clinical questions using the search engines, as determined by third-party evaluation. After answering each question, participants completed a questionnaire to assess their perception of the engine’s effect on the decision-making process in clinical practice. Results Of 300 possible pairs of answers (1 answer before and 1 after the initial search), 254 (85%) were produced by 14 residents. Of these, 132 (52%) and 122 (48%) pairs of answers concerned questions that had been assigned an initial search with InfoClinique and the Trip database, respectively. Both engines produced an important and similar absolute increase in the proportion of correct answers after searching (26% to 62% for InfoClinique, for an increase of 36%; 24% to 63% for the Trip database, for an increase of 39%; P = .68). For all 30 clinical questions, at least 1 resident produced the correct answer after searching with either search engine. The mean (SD) time of the initial search for each question was 23.5 (7.6) minutes with InfoClinique and 22.3 (7.8) minutes with the Trip database (P = .30). Participants’ perceptions of each engine’s effect on the decision-making process were very positive and similar for both search engines. Conclusion Family medicine residents’ ability to provide correct answers to clinical questions increased dramatically and similarly with the use of both InfoClinique and the Trip database. These tools have strong potential to increase the quality of medical care. PMID:24130286
Agreement between Medline searches using the Medline-CD-Rom and Internet Pubmed, BioMedNet, Medscape and Gateway search-engines.

PubMed

Caro-Rojas, Rosa Angela; Eslava-Schmalbach, Javier H

2005-01-01

To compare the information obtained from the Medline database using Internet commercial search engines with that obtained from a compact disc (Medline-CD). An agreement study was carried out based on 101 clinical scenarios provided by specialists in internal medicine, pharmacy, gynaecology-obstetrics, surgery and paediatrics. 175 search strategies were employed using the connector AND plus text within quotation marks. The search was limited to 1991-1999. Internet search-engines were selected by common criteria. Identical search strategies were independently applied to and masked from Internet search engines, as well as the Medline-CD. 3,488 articles were obtained using 129 search strategies. Agreement with the Medline-CD was 54% for PubMed, 57% for Gateway, 54% for Medscape and 65% for BioMedNet. The highest agreement rate for a given speciality (paediatrics) was 78.1% for BioMedNet, having greater -/- than +/+ agreement. Even though free access to Medline has encouraged the boom and growth of evidence-based medicine, these results must be considered within the context of which search engine was selected for doing the searches. The Internet search engines studied showed a poor agreement with the Medline-CD, the rate of agreement differing according to speciality, thus significantly affecting searches and their reproducibility. Software designed for conducting Medline database searches, including the Medline-CD, must be standardised and validated.
Electronic Biomedical Literature Search for Budding Researcher

PubMed Central

Thakre, Subhash B.; Thakre S, Sushama S.; Thakre, Amol D.

2013-01-01

Search for specific and well defined literature related to subject of interest is the foremost step in research. When we are familiar with topic or subject then we can frame appropriate research question. Appropriate research question is the basis for study objectives and hypothesis. The Internet provides a quick access to an overabundance of the medical literature, in the form of primary, secondary and tertiary literature. It is accessible through journals, databases, dictionaries, textbooks, indexes, and e-journals, thereby allowing access to more varied, individualised, and systematic educational opportunities. Web search engine is a tool designed to search for information on the World Wide Web, which may be in the form of web pages, images, information, and other types of files. Search engines for internet-based search of medical literature include Google, Google scholar, Scirus, Yahoo search engine, etc., and databases include MEDLINE, PubMed, MEDLARS, etc. Several web-libraries (National library Medicine, Cochrane, Web of Science, Medical matrix, Emory libraries) have been developed as meta-sites, providing useful links to health resources globally. A researcher must keep in mind the strengths and limitations of a particular search engine/database while searching for a particular type of data. Knowledge about types of literature, levels of evidence, and detail about features of search engine as available, user interface, ease of access, reputable content, and period of time covered allow their optimal use and maximal utility in the field of medicine. Literature search is a dynamic and interactive process; there is no one way to conduct a search and there are many variables involved. It is suggested that a systematic search of literature that uses available electronic resource effectively, is more likely to produce quality research. PMID:24179937
Electronic biomedical literature search for budding researcher.

PubMed

Thakre, Subhash B; Thakre S, Sushama S; Thakre, Amol D

2013-09-01

Search for specific and well defined literature related to subject of interest is the foremost step in research. When we are familiar with topic or subject then we can frame appropriate research question. Appropriate research question is the basis for study objectives and hypothesis. The Internet provides a quick access to an overabundance of the medical literature, in the form of primary, secondary and tertiary literature. It is accessible through journals, databases, dictionaries, textbooks, indexes, and e-journals, thereby allowing access to more varied, individualised, and systematic educational opportunities. Web search engine is a tool designed to search for information on the World Wide Web, which may be in the form of web pages, images, information, and other types of files. Search engines for internet-based search of medical literature include Google, Google scholar, Scirus, Yahoo search engine, etc., and databases include MEDLINE, PubMed, MEDLARS, etc. Several web-libraries (National library Medicine, Cochrane, Web of Science, Medical matrix, Emory libraries) have been developed as meta-sites, providing useful links to health resources globally. A researcher must keep in mind the strengths and limitations of a particular search engine/database while searching for a particular type of data. Knowledge about types of literature, levels of evidence, and detail about features of search engine as available, user interface, ease of access, reputable content, and period of time covered allow their optimal use and maximal utility in the field of medicine. Literature search is a dynamic and interactive process; there is no one way to conduct a search and there are many variables involved. It is suggested that a systematic search of literature that uses available electronic resource effectively, is more likely to produce quality research.
Sagace: A web-based search engine for biomedical databases in Japan

PubMed Central

2012-01-01

Background In the big data era, biomedical research continues to generate a large amount of data, and the generated information is often stored in a database and made publicly available. Although combining data from multiple databases should accelerate further studies, the current number of life sciences databases is too large to grasp features and contents of each database. Findings We have developed Sagace, a web-based search engine that enables users to retrieve information from a range of biological databases (such as gene expression profiles and proteomics data) and biological resource banks (such as mouse models of disease and cell lines). With Sagace, users can search more than 300 databases in Japan. Sagace offers features tailored to biomedical research, including manually tuned ranking, a faceted navigation to refine search results, and rich snippets constructed with retrieved metadata for each database entry. Conclusions Sagace will be valuable for experts who are involved in biomedical research and drug development in both academia and industry. Sagace is freely available at http://sagace.nibio.go.jp/en/. PMID:23110816
OrChem - An open source chemistry search engine for Oracle(R).

PubMed

Rijnbeek, Mark; Steinbeck, Christoph

2009-10-22

Registration, indexing and searching of chemical structures in relational databases is one of the core areas of cheminformatics. However, little detail has been published on the inner workings of search engines and their development has been mostly closed-source. We decided to develop an open source chemistry extension for Oracle, the de facto database platform in the commercial world. Here we present OrChem, an extension for the Oracle 11G database that adds registration and indexing of chemical structures to support fast substructure and similarity searching. The cheminformatics functionality is provided by the Chemistry Development Kit. OrChem provides similarity searching with response times in the order of seconds for databases with millions of compounds, depending on a given similarity cut-off. For substructure searching, it can make use of multiple processor cores on today's powerful database servers to provide fast response times in equally large data sets. OrChem is free software and can be redistributed and/or modified under the terms of the GNU Lesser General Public License as published by the Free Software Foundation. All software is available via http://orchem.sourceforge.net.
Hydra: a scalable proteomic search engine which utilizes the Hadoop distributed computing framework

PubMed Central

2012-01-01

Background For shotgun mass spectrometry based proteomics the most computationally expensive step is in matching the spectra against an increasingly large database of sequences and their post-translational modifications with known masses. Each mass spectrometer can generate data at an astonishingly high rate, and the scope of what is searched for is continually increasing. Therefore solutions for improving our ability to perform these searches are needed. Results We present a sequence database search engine that is specifically designed to run efficiently on the Hadoop MapReduce distributed computing framework. The search engine implements the K-score algorithm, generating comparable output for the same input files as the original implementation. The scalability of the system is shown, and the architecture required for the development of such distributed processing is discussed. Conclusion The software is scalable in its ability to handle a large peptide database, numerous modifications and large numbers of spectra. Performance scales with the number of processors in the cluster, allowing throughput to expand with the available resources. PMID:23216909
Hydra: a scalable proteomic search engine which utilizes the Hadoop distributed computing framework.

PubMed

Lewis, Steven; Csordas, Attila; Killcoyne, Sarah; Hermjakob, Henning; Hoopmann, Michael R; Moritz, Robert L; Deutsch, Eric W; Boyle, John

2012-12-05

For shotgun mass spectrometry based proteomics the most computationally expensive step is in matching the spectra against an increasingly large database of sequences and their post-translational modifications with known masses. Each mass spectrometer can generate data at an astonishingly high rate, and the scope of what is searched for is continually increasing. Therefore solutions for improving our ability to perform these searches are needed. We present a sequence database search engine that is specifically designed to run efficiently on the Hadoop MapReduce distributed computing framework. The search engine implements the K-score algorithm, generating comparable output for the same input files as the original implementation. The scalability of the system is shown, and the architecture required for the development of such distributed processing is discussed. The software is scalable in its ability to handle a large peptide database, numerous modifications and large numbers of spectra. Performance scales with the number of processors in the cluster, allowing throughput to expand with the available resources.
A Taxonomic Search Engine: Federating taxonomic databases using web services

PubMed Central

Page, Roderic DM

2005-01-01

Background The taxonomic name of an organism is a key link between different databases that store information on that organism. However, in the absence of a single, comprehensive database of organism names, individual databases lack an easy means of checking the correctness of a name. Furthermore, the same organism may have more than one name, and the same name may apply to more than one organism. Results The Taxonomic Search Engine (TSE) is a web application written in PHP that queries multiple taxonomic databases (ITIS, Index Fungorum, IPNI, NCBI, and uBIO) and summarises the results in a consistent format. It supports "drill-down" queries to retrieve a specific record. The TSE can optionally suggest alternative spellings the user can try. It also acts as a Life Science Identifier (LSID) authority for the source taxonomic databases, providing globally unique identifiers (and associated metadata) for each name. Conclusion The Taxonomic Search Engine is available at and provides a simple demonstration of the potential of the federated approach to providing access to taxonomic names. PMID:15757517
EasyKSORD: A Platform of Keyword Search Over Relational Databases

NASA Astrophysics Data System (ADS)

Peng, Zhaohui; Li, Jing; Wang, Shan

Keyword Search Over Relational Databases (KSORD) enables casual users to use keyword queries (a set of keywords) to search relational databases just like searching the Web, without any knowledge of the database schema or any need of writing SQL queries. Based on our previous work, we design and implement a novel KSORD platform named EasyKSORD for users and system administrators to use and manage different KSORD systems in a novel and simple manner. EasyKSORD supports advanced queries, efficient data-graph-based search engines, multiform result presentations, and system logging and analysis. Through EasyKSORD, users can search relational databases easily and read search results conveniently, and system administrators can easily monitor and analyze the operations of KSORD and manage KSORD systems much better.
MEGGASENSE - The Metagenome/Genome Annotated Sequence Natural Language Search Engine: A Platform for  the Construction of Sequence Data Warehouses.

PubMed

Gacesa, Ranko; Zucko, Jurica; Petursdottir, Solveig K; Gudmundsdottir, Elisabet Eik; Fridjonsson, Olafur H; Diminic, Janko; Long, Paul F; Cullum, John; Hranueli, Daslav; Hreggvidsson, Gudmundur O; Starcevic, Antonio

2017-06-01

The MEGGASENSE platform constructs relational databases of DNA or protein sequences. The default functional analysis uses 14 106 hidden Markov model (HMM) profiles based on sequences in the KEGG database. The Solr search engine allows sophisticated queries and a BLAST search function is also incorporated. These standard capabilities were used to generate the SCATT database from the predicted proteome of Streptomyces cattleya . The implementation of a specialised metagenome database (AMYLOMICS) for bioprospecting of carbohydrate-modifying enzymes is described. In addition to standard assembly of reads, a novel 'functional' assembly was developed, in which screening of reads with the HMM profiles occurs before the assembly. The AMYLOMICS database incorporates additional HMM profiles for carbohydrate-modifying enzymes and it is illustrated how the combination of HMM and BLAST analyses helps identify interesting genes. A variety of different proteome and metagenome databases have been generated by MEGGASENSE.
A Full-Text-Based Search Engine for Finding Highly Matched Documents Across Multiple Categories

NASA Technical Reports Server (NTRS)

Nguyen, Hung D.; Steele, Gynelle C.

2016-01-01

This report demonstrates the full-text-based search engine that works on any Web-based mobile application. The engine has the capability to search databases across multiple categories based on a user's queries and identify the most relevant or similar. The search results presented here were found using an Android (Google Co.) mobile device; however, it is also compatible with other mobile phones.

Subject Specific Databases: A Powerful Research Tool

ERIC Educational Resources Information Center

Young, Terrence E., Jr.

2004-01-01

Subject specific databases, or vortals (vertical portals), are databases that provide highly detailed research information on a particular topic. They are the smallest, most focused search tools on the Internet and, in recent years, they've been on the rise. Currently, more of the so-called "mainstream" search engines, subject directories, and…
Multimedia explorer: image database, image proxy-server and search-engine.

PubMed Central

Frankewitsch, T.; Prokosch, U.

1999-01-01

Multimedia plays a major role in medicine. Databases containing images, movies or other types of multimedia objects are increasing in number, especially on the WWW. However, no good retrieval mechanism or search engine currently exists to efficiently track down such multimedia sources in the vast of information provided by the WWW. Secondly, the tools for searching databases are usually not adapted to the properties of images. HTML pages do not allow complex searches. Therefore establishing a more comfortable retrieval involves the use of a higher programming level like JAVA. With this platform independent language it is possible to create extensions to commonly used web browsers. These applets offer a graphical user interface for high level navigation. We implemented a database using JAVA objects as the primary storage container which are then stored by a JAVA controlled ORACLE8 database. Navigation depends on a structured vocabulary enhanced by a semantic network. With this approach multimedia objects can be encapsulated within a logical module for quick data retrieval. PMID:10566463
Multimedia explorer: image database, image proxy-server and search-engine.

PubMed

Frankewitsch, T; Prokosch, U

1999-01-01

Multimedia plays a major role in medicine. Databases containing images, movies or other types of multimedia objects are increasing in number, especially on the WWW. However, no good retrieval mechanism or search engine currently exists to efficiently track down such multimedia sources in the vast of information provided by the WWW. Secondly, the tools for searching databases are usually not adapted to the properties of images. HTML pages do not allow complex searches. Therefore establishing a more comfortable retrieval involves the use of a higher programming level like JAVA. With this platform independent language it is possible to create extensions to commonly used web browsers. These applets offer a graphical user interface for high level navigation. We implemented a database using JAVA objects as the primary storage container which are then stored by a JAVA controlled ORACLE8 database. Navigation depends on a structured vocabulary enhanced by a semantic network. With this approach multimedia objects can be encapsulated within a logical module for quick data retrieval.
Searches Conducted for Engineers.

ERIC Educational Resources Information Center

Lorenz, Patricia

This paper reports an industrial information specialist's experience in performing online searches for engineers and surveys the databases used. Engineers seeking assistance fall into three categories: (1) those who recognize the value of online retrieval; (2) referrals by colleagues; and (3) those who do not seek help. As more successful searches…
System for Performing Single Query Searches of Heterogeneous and Dispersed Databases

NASA Technical Reports Server (NTRS)

Maluf, David A. (Inventor); Okimura, Takeshi (Inventor); Gurram, Mohana M. (Inventor); Tran, Vu Hoang (Inventor); Knight, Christopher D. (Inventor); Trinh, Anh Ngoc (Inventor)

2017-01-01

The present invention is a distributed computer system of heterogeneous databases joined in an information grid and configured with an Application Programming Interface hardware which includes a search engine component for performing user-structured queries on multiple heterogeneous databases in real time. This invention reduces overhead associated with the impedance mismatch that commonly occurs in heterogeneous database queries.
OrChem - An open source chemistry search engine for Oracle®

PubMed Central

2009-01-01

Background Registration, indexing and searching of chemical structures in relational databases is one of the core areas of cheminformatics. However, little detail has been published on the inner workings of search engines and their development has been mostly closed-source. We decided to develop an open source chemistry extension for Oracle, the de facto database platform in the commercial world. Results Here we present OrChem, an extension for the Oracle 11G database that adds registration and indexing of chemical structures to support fast substructure and similarity searching. The cheminformatics functionality is provided by the Chemistry Development Kit. OrChem provides similarity searching with response times in the order of seconds for databases with millions of compounds, depending on a given similarity cut-off. For substructure searching, it can make use of multiple processor cores on today's powerful database servers to provide fast response times in equally large data sets. Availability OrChem is free software and can be redistributed and/or modified under the terms of the GNU Lesser General Public License as published by the Free Software Foundation. All software is available via http://orchem.sourceforge.net. PMID:20298521
DOE Office of Scientific and Technical Information (OSTI.GOV)

None Available

To make the web work better for science, OSTI has developed state-of-the-art technologies and services including a deep web search capability. The deep web includes content in searchable databases available to web users but not accessible by popular search engines, such as Google. This video provides an introduction to the deep web search engine.
A Search Engine Features Comparison.

ERIC Educational Resources Information Center

Vorndran, Gerald

Until recently, the World Wide Web (WWW) public access search engines have not included many of the advanced commands, options, and features commonly available with the for-profit online database user interfaces, such as DIALOG. This study evaluates the features and characteristics common to both types of search interfaces, examines the Web search…
Phenotip - a web-based instrument to help diagnosing fetal syndromes antenatally.

PubMed

Porat, Shay; de Rham, Maud; Giamboni, Davide; Van Mieghem, Tim; Baud, David

2014-12-10

Prenatal ultrasound can often reliably distinguish fetal anatomic anomalies, particularly in the hands of an experienced ultrasonographer. Given the large number of existing syndromes and the significant overlap in prenatal findings, antenatal differentiation for syndrome diagnosis is difficult. We constructed a hierarchic tree of 1140 sonographic markers and submarkers, organized per organ system. Subsequently, a database of prenatally diagnosable syndromes was built. An internet-based search engine was then designed to search the syndrome database based on a single or multiple sonographic markers. Future developments will include a database with magnetic resonance imaging findings as well as further refinements in the search engine to allow prioritization based on incidence of syndromes and markers.
IntegromeDB: an integrated system and biological search engine.

PubMed

Baitaluk, Michael; Kozhenkov, Sergey; Dubinina, Yulia; Ponomarenko, Julia

2012-01-19

With the growth of biological data in volume and heterogeneity, web search engines become key tools for researchers. However, general-purpose search engines are not specialized for the search of biological data. Here, we present an approach at developing a biological web search engine based on the Semantic Web technologies and demonstrate its implementation for retrieving gene- and protein-centered knowledge. The engine is available at http://www.integromedb.org. The IntegromeDB search engine allows scanning data on gene regulation, gene expression, protein-protein interactions, pathways, metagenomics, mutations, diseases, and other gene- and protein-related data that are automatically retrieved from publicly available databases and web pages using biological ontologies. To perfect the resource design and usability, we welcome and encourage community feedback.
Visibiome: an efficient microbiome search engine based on a scalable, distributed architecture.

PubMed

Azman, Syafiq Kamarul; Anwar, Muhammad Zohaib; Henschel, Andreas

2017-07-24

Given the current influx of 16S rRNA profiles of microbiota samples, it is conceivable that large amounts of them eventually are available for search, comparison and contextualization with respect to novel samples. This process facilitates the identification of similar compositional features in microbiota elsewhere and therefore can help to understand driving factors for microbial community assembly. We present Visibiome, a microbiome search engine that can perform exhaustive, phylogeny based similarity search and contextualization of user-provided samples against a comprehensive dataset of 16S rRNA profiles environments, while tackling several computational challenges. In order to scale to high demands, we developed a distributed system that combines web framework technology, task queueing and scheduling, cloud computing and a dedicated database server. To further ensure speed and efficiency, we have deployed Nearest Neighbor search algorithms, capable of sublinear searches in high-dimensional metric spaces in combination with an optimized Earth Mover Distance based implementation of weighted UniFrac. The search also incorporates pairwise (adaptive) rarefaction and optionally, 16S rRNA copy number correction. The result of a query microbiome sample is the contextualization against a comprehensive database of microbiome samples from a diverse range of environments, visualized through a rich set of interactive figures and diagrams, including barchart-based compositional comparisons and ranking of the closest matches in the database. Visibiome is a convenient, scalable and efficient framework to search microbiomes against a comprehensive database of environmental samples. The search engine leverages a popular but computationally expensive, phylogeny based distance metric, while providing numerous advantages over the current state of the art tool.
Deep Web video

ScienceCinema

None Available

2018-02-06

To make the web work better for science, OSTI has developed state-of-the-art technologies and services including a deep web search capability. The deep web includes content in searchable databases available to web users but not accessible by popular search engines, such as Google. This video provides an introduction to the deep web search engine.
Andromeda: a peptide search engine integrated into the MaxQuant environment.

PubMed

Cox, Jürgen; Neuhauser, Nadin; Michalski, Annette; Scheltema, Richard A; Olsen, Jesper V; Mann, Matthias

2011-04-01

A key step in mass spectrometry (MS)-based proteomics is the identification of peptides in sequence databases by their fragmentation spectra. Here we describe Andromeda, a novel peptide search engine using a probabilistic scoring model. On proteome data, Andromeda performs as well as Mascot, a widely used commercial search engine, as judged by sensitivity and specificity analysis based on target decoy searches. Furthermore, it can handle data with arbitrarily high fragment mass accuracy, is able to assign and score complex patterns of post-translational modifications, such as highly phosphorylated peptides, and accommodates extremely large databases. The algorithms of Andromeda are provided. Andromeda can function independently or as an integrated search engine of the widely used MaxQuant computational proteomics platform and both are freely available at www.maxquant.org. The combination enables analysis of large data sets in a simple analysis workflow on a desktop computer. For searching individual spectra Andromeda is also accessible via a web server. We demonstrate the flexibility of the system by implementing the capability to identify cofragmented peptides, significantly improving the total number of identified peptides.
Identification of volatile and semivolatile compounds in chemical ionization GC-MS using a mass-to-structure (MTS) Search Engine with integral isotope pattern ranking.

PubMed

Liao, Wenta; Draper, William M

2013-02-21

The mass-to-structure or MTS Search Engine is an Access 2010 database containing theoretical molecular mass information for 19,438 compounds assembled from common sources such as the Merck Index, pesticide and pharmaceutical compilations, and chemical catalogues. This database, which contains no experimental mass spectral data, was developed as an aid to identification of compounds in atmospheric pressure ionization (API)-LC-MS. This paper describes a powerful upgrade to this database, a fully integrated utility for filtering or ranking candidates based on isotope ratios and patterns. The new MTS Search Engine is applied here to the identification of volatile and semivolatile compounds including pesticides, nitrosoamines and other pollutants. Methane and isobutane chemical ionization (CI) GC-MS spectra were obtained from unit mass resolution mass spectrometers to determine MH(+) masses and isotope ratios. Isotopes were measured accurately with errors of <4% and <6%, respectively, for A + 1 and A + 2 peaks. Deconvolution of interfering isotope clusters (e.g., M(+) and [M - H](+)) was required for accurate determination of the A + 1 isotope in halogenated compounds. Integrating the isotope data greatly improved the speed and accuracy of the database identifications. The database accurately identified unknowns from isobutane CI spectra in 100% of cases where as many as 40 candidates satisfied the mass tolerance. The paper describes the development and basic operation of the new MTS Search Engine and details performance testing with over 50 model compounds.
Practical and Efficient Searching in Proteomics: A Cross Engine Comparison

PubMed Central

Paulo, Joao A.

2014-01-01

Background Analysis of large datasets produced by mass spectrometry-based proteomics relies on database search algorithms to sequence peptides and identify proteins. Several such scoring methods are available, each based on different statistical foundations and thereby not producing identical results. Here, the aim is to compare peptide and protein identifications using multiple search engines and examine the additional proteins gained by increasing the number of technical replicate analyses. Methods A HeLa whole cell lysate was analyzed on an Orbitrap mass spectrometer for 10 technical replicates. The data were combined and searched using Mascot, SEQUEST, and Andromeda. Comparisons were made of peptide and protein identifications among the search engines. In addition, searches using each engine were performed with incrementing number of technical replicates. Results The number and identity of peptides and proteins differed across search engines. For all three search engines, the differences in proteins identifications were greater than the differences in peptide identifications indicating that the major source of the disparity may be at the protein inference grouping level. The data also revealed that analysis of 2 technical replicates can increase protein identifications by up to 10-15%, while a third replicate results in an additional 4-5%. Conclusions The data emphasize two practical methods of increasing the robustness of mass spectrometry data analysis. The data show that 1) using multiple search engines can expand the number of identified proteins (union) and validate protein identifications (intersection), and 2) analysis of 2 or 3 technical replicates can substantially expand protein identifications. Moreover, information can be extracted from a dataset by performing database searching with different engines and performing technical repeats, which requires no additional sample preparation and effectively utilizes research time and effort. PMID:25346847
Practical and Efficient Searching in Proteomics: A Cross Engine Comparison.

PubMed

Paulo, Joao A

2013-10-01

Analysis of large datasets produced by mass spectrometry-based proteomics relies on database search algorithms to sequence peptides and identify proteins. Several such scoring methods are available, each based on different statistical foundations and thereby not producing identical results. Here, the aim is to compare peptide and protein identifications using multiple search engines and examine the additional proteins gained by increasing the number of technical replicate analyses. A HeLa whole cell lysate was analyzed on an Orbitrap mass spectrometer for 10 technical replicates. The data were combined and searched using Mascot, SEQUEST, and Andromeda. Comparisons were made of peptide and protein identifications among the search engines. In addition, searches using each engine were performed with incrementing number of technical replicates. The number and identity of peptides and proteins differed across search engines. For all three search engines, the differences in proteins identifications were greater than the differences in peptide identifications indicating that the major source of the disparity may be at the protein inference grouping level. The data also revealed that analysis of 2 technical replicates can increase protein identifications by up to 10-15%, while a third replicate results in an additional 4-5%. The data emphasize two practical methods of increasing the robustness of mass spectrometry data analysis. The data show that 1) using multiple search engines can expand the number of identified proteins (union) and validate protein identifications (intersection), and 2) analysis of 2 or 3 technical replicates can substantially expand protein identifications. Moreover, information can be extracted from a dataset by performing database searching with different engines and performing technical repeats, which requires no additional sample preparation and effectively utilizes research time and effort.
IntegromeDB: an integrated system and biological search engine

PubMed Central

2012-01-01

Background With the growth of biological data in volume and heterogeneity, web search engines become key tools for researchers. However, general-purpose search engines are not specialized for the search of biological data. Description Here, we present an approach at developing a biological web search engine based on the Semantic Web technologies and demonstrate its implementation for retrieving gene- and protein-centered knowledge. The engine is available at http://www.integromedb.org. Conclusions The IntegromeDB search engine allows scanning data on gene regulation, gene expression, protein-protein interactions, pathways, metagenomics, mutations, diseases, and other gene- and protein-related data that are automatically retrieved from publicly available databases and web pages using biological ontologies. To perfect the resource design and usability, we welcome and encourage community feedback. PMID:22260095
A comparison of two search methods for determining the scope of systematic reviews and health technology assessments.

PubMed

Forsetlund, Louise; Kirkehei, Ingvild; Harboe, Ingrid; Odgaard-Jensen, Jan

2012-01-01

This study aims to compare two different search methods for determining the scope of a requested systematic review or health technology assessment. The first method (called the Direct Search Method) included performing direct searches in the Cochrane Database of Systematic Reviews (CDSR), Database of Abstracts of Reviews of Effects (DARE) and the Health Technology Assessments (HTA). Using the comparison method (called the NHS Search Engine) we performed searches by means of the search engine of the British National Health Service, NHS Evidence. We used an adapted cross-over design with a random allocation of fifty-five requests for systematic reviews. The main analyses were based on repeated measurements adjusted for the order in which the searches were conducted. The Direct Search Method generated on average fewer hits (48 percent [95 percent confidence interval {CI} 6 percent to 72 percent], had a higher precision (0.22 [95 percent CI, 0.13 to 0.30]) and more unique hits than when searching by means of the NHS Search Engine (50 percent [95 percent CI, 7 percent to 110 percent]). On the other hand, the Direct Search Method took longer (14.58 minutes [95 percent CI, 7.20 to 21.97]) and was perceived as somewhat less user-friendly than the NHS Search Engine (-0.60 [95 percent CI, -1.11 to -0.09]). Although the Direct Search Method had some drawbacks such as being more time-consuming and less user-friendly, it generated more unique hits than the NHS Search Engine, retrieved on average fewer references and fewer irrelevant results.
The MAO NASU Plate Archive Database. Current Status and Perspectives

NASA Astrophysics Data System (ADS)

Pakuliak, L. K.; Sergeeva, T. P.

2006-04-01

The preliminary online version of the database of the MAO NASU plate archive is constructed on the basis of the relational database management system MySQL and permits an easy supplement of database with new collections of astronegatives, provides a high flexibility in constructing SQL-queries for data search optimization, PHP Basic Authorization protected access to administrative interface and wide range of search parameters. The current status of the database will be reported and the brief description of the search engine and means of the database integrity support will be given. Methods and means of the data verification and tasks for the further development will be discussed.
Finding Business Information on the "Invisible Web": Search Utilities vs. Conventional Search Engines.

ERIC Educational Resources Information Center

Darrah, Brenda

Researchers for small businesses, which may have no access to expensive databases or market research reports, must often rely on information found on the Internet, which can be difficult to find. Although current conventional Internet search engines are now able to index over on billion documents, there are many more documents existing in…

Reprint of "pFind-Alioth: A novel unrestricted database search algorithm to improve the interpretation of high-resolution MS/MS data".

PubMed

Chi, Hao; He, Kun; Yang, Bing; Chen, Zhen; Sun, Rui-Xiang; Fan, Sheng-Bo; Zhang, Kun; Liu, Chao; Yuan, Zuo-Fei; Wang, Quan-Hui; Liu, Si-Qi; Dong, Meng-Qiu; He, Si-Min

2015-11-03

Database search is the dominant approach in high-throughput proteomic analysis. However, the interpretation rate of MS/MS spectra is very low in such a restricted mode, which is mainly due to unexpected modifications and irregular digestion types. In this study, we developed a new algorithm called Alioth, to be integrated into the search engine of pFind, for fast and accurate unrestricted database search on high-resolution MS/MS data. An ion index is constructed for both peptide precursors and fragment ions, by which arbitrary digestions and a single site of any modifications and mutations can be searched efficiently. A new re-ranking algorithm is used to distinguish the correct peptide-spectrum matches from random ones. The algorithm is tested on several HCD datasets and the interpretation rate of MS/MS spectra using Alioth is as high as 60%-80%. Peptides from semi- and non-specific digestions, as well as those with unexpected modifications or mutations, can be effectively identified using Alioth and confidently validated using other search engines. The average processing speed of Alioth is 5-10 times faster than some other unrestricted search engines and is comparable to or even faster than the restricted search algorithms tested.This article is part of a Special Issue entitled: Computational Proteomics. Copyright © 2015 Elsevier B.V. All rights reserved.
More Databases Searched by a Business Generalist--Part 2: A Veritable Cornucopia of Sources.

ERIC Educational Resources Information Center

Meredith, Meri

1986-01-01

This second installment describes databases irregularly searched in the Business Information Center, Cummins Engine Company (Columbus, Indiana). Highlights include typical research topics (happenings among similar manufacturers); government topics (Department of Defense contracts); market and industry topics; corporate intelligence; and personnel,…
Web-based Electronic Sharing and RE-allocation of Assets

DOE Office of Scientific and Technical Information (OSTI.GOV)

Leverett, Dave; Miller, Robert A.; Berlin, Gary J.

2002-09-09

The Electronic Asses Sharing Program is a web-based application that provides the capability for complex-wide sharing and reallocation of assets that are excess, under utilized, or un-utilized. through a web-based fron-end and supporting has database with a search engine, users can search for assets that they need, search for assets needed by others, enter assets they need, and enter assets they have available for reallocation. In addition, entire listings of available assets and needed assets can be viewed. The application is written in Java, the hash database and search engine are in Object-oriented Java Database Management (OJDBM). The application willmore » be hosted on an SRS-managed server outside the Firewall and access will be controlled via a protected realm. An example of the application can be viewed at the followinig (temporary) URL: http://idgdev.srs.gov/servlet/srs.weshare.WeShare« less
Search strategies on the Internet: general and specific.

PubMed

Bottrill, Krys

2004-06-01

Some of the most up-to-date information on scientific activity is to be found on the Internet; for example, on the websites of academic and other research institutions and in databases of currently funded research studies provided on the websites of funding bodies. Such information can be valuable in suggesting new approaches and techniques that could be applicable in a Three Rs context. However, the Internet is a chaotic medium, not subject to the meticulous classification and organisation of classical information resources. At the same time, Internet search engines do not match the sophistication of search systems used by database hosts. Also, although some offer relatively advanced features, user awareness of these tends to be low. Furthermore, much of the information on the Internet is not accessible to conventional search engines, giving rise to the concept of the "Invisible Web". General strategies and techniques for Internet searching are presented, together with a comparative survey of selected search engines. The question of how the Invisible Web can be accessed is discussed, as well as how to keep up-to-date with Internet content and improve searching skills.
GEMINI: a computationally-efficient search engine for large gene expression datasets.

PubMed

DeFreitas, Timothy; Saddiki, Hachem; Flaherty, Patrick

2016-02-24

Low-cost DNA sequencing allows organizations to accumulate massive amounts of genomic data and use that data to answer a diverse range of research questions. Presently, users must search for relevant genomic data using a keyword, accession number of meta-data tag. However, in this search paradigm the form of the query - a text-based string - is mismatched with the form of the target - a genomic profile. To improve access to massive genomic data resources, we have developed a fast search engine, GEMINI, that uses a genomic profile as a query to search for similar genomic profiles. GEMINI implements a nearest-neighbor search algorithm using a vantage-point tree to store a database of n profiles and in certain circumstances achieves an [Formula: see text] expected query time in the limit. We tested GEMINI on breast and ovarian cancer gene expression data from The Cancer Genome Atlas project and show that it achieves a query time that scales as the logarithm of the number of records in practice on genomic data. In a database with 10(5) samples, GEMINI identifies the nearest neighbor in 0.05 sec compared to a brute force search time of 0.6 sec. GEMINI is a fast search engine that uses a query genomic profile to search for similar profiles in a very large genomic database. It enables users to identify similar profiles independent of sample label, data origin or other meta-data information.
Cloud parallel processing of tandem mass spectrometry based proteomics data.

PubMed

Mohammed, Yassene; Mostovenko, Ekaterina; Henneman, Alex A; Marissen, Rob J; Deelder, André M; Palmblad, Magnus

2012-10-05

Data analysis in mass spectrometry based proteomics struggles to keep pace with the advances in instrumentation and the increasing rate of data acquisition. Analyzing this data involves multiple steps requiring diverse software, using different algorithms and data formats. Speed and performance of the mass spectral search engines are continuously improving, although not necessarily as needed to face the challenges of acquired big data. Improving and parallelizing the search algorithms is one possibility; data decomposition presents another, simpler strategy for introducing parallelism. We describe a general method for parallelizing identification of tandem mass spectra using data decomposition that keeps the search engine intact and wraps the parallelization around it. We introduce two algorithms for decomposing mzXML files and recomposing resulting pepXML files. This makes the approach applicable to different search engines, including those relying on sequence databases and those searching spectral libraries. We use cloud computing to deliver the computational power and scientific workflow engines to interface and automate the different processing steps. We show how to leverage these technologies to achieve faster data analysis in proteomics and present three scientific workflows for parallel database as well as spectral library search using our data decomposition programs, X!Tandem and SpectraST.
Noesis: Ontology based Scoped Search Engine and Resource Aggregator for Atmospheric Science

NASA Astrophysics Data System (ADS)

Ramachandran, R.; Movva, S.; Li, X.; Cherukuri, P.; Graves, S.

2006-12-01

The goal for search engines is to return results that are both accurate and complete. The search engines should find only what you really want and find everything you really want. Search engines (even meta search engines) lack semantics. The basis for search is simply based on string matching between the user's query term and the resource database and the semantics associated with the search string is not captured. For example, if an atmospheric scientist is searching for "pressure" related web resources, most search engines return inaccurate results such as web resources related to blood pressure. In this presentation Noesis, which is a meta-search engine and a resource aggregator that uses domain ontologies to provide scoped search capabilities will be described. Noesis uses domain ontologies to help the user scope the search query to ensure that the search results are both accurate and complete. The domain ontologies guide the user to refine their search query and thereby reduce the user's burden of experimenting with different search strings. Semantics are captured by refining the query terms to cover synonyms, specializations, generalizations and related concepts. Noesis also serves as a resource aggregator. It categorizes the search results from different online resources such as education materials, publications, datasets, web search engines that might be of interest to the user.
Astronomical databases of Nikolaev Observatory

NASA Astrophysics Data System (ADS)

Protsyuk, Y.; Mazhaev, A.

2008-07-01

Several astronomical databases were created at Nikolaev Observatory during the last years. The databases are built by using MySQL search engine and PHP scripts. They are available on NAO web-site http://www.mao.nikolaev.ua.
The Brazilian Portuguese Lexicon: An Instrument for Psycholinguistic Research

PubMed Central

Estivalet, Gustavo L.; Meunier, Fanny

2015-01-01

In this article, we present the Brazilian Portuguese Lexicon, a new word-based corpus for psycholinguistic and computational linguistic research in Brazilian Portuguese. We describe the corpus development, the specific characteristics on the internet site and database for user access. We also perform distributional analyses of the corpus and comparisons to other current databases. Our main objective was to provide a large, reliable, and useful word-based corpus with a dynamic, easy-to-use, and intuitive interface with free internet access for word and word-criteria searches. We used the Núcleo Interinstitucional de Linguística Computacional’s corpus as the basic data source and developed the Brazilian Portuguese Lexicon by deriving and adding metalinguistic and psycholinguistic information about Brazilian Portuguese words. We obtained a final corpus with more than 30 million word tokens, 215 thousand word types and 25 categories of information about each word. This corpus was made available on the internet via a free-access site with two search engines: a simple search and a complex search. The simple engine basically searches for a list of words, while the complex engine accepts all types of criteria in the corpus categories. The output result presents all entries found in the corpus with the criteria specified in the input search and can be downloaded as a.csv file. We created a module in the results that delivers basic statistics about each search. The Brazilian Portuguese Lexicon also provides a pseudoword engine and specific tools for linguistic and statistical analysis. Therefore, the Brazilian Portuguese Lexicon is a convenient instrument for stimulus search, selection, control, and manipulation in psycholinguistic experiments, as also it is a powerful database for computational linguistics research and language modeling related to lexicon distribution, functioning, and behavior. PMID:26630138
The Brazilian Portuguese Lexicon: An Instrument for Psycholinguistic Research.

PubMed

Estivalet, Gustavo L; Meunier, Fanny

2015-01-01

In this article, we present the Brazilian Portuguese Lexicon, a new word-based corpus for psycholinguistic and computational linguistic research in Brazilian Portuguese. We describe the corpus development, the specific characteristics on the internet site and database for user access. We also perform distributional analyses of the corpus and comparisons to other current databases. Our main objective was to provide a large, reliable, and useful word-based corpus with a dynamic, easy-to-use, and intuitive interface with free internet access for word and word-criteria searches. We used the Núcleo Interinstitucional de Linguística Computacional's corpus as the basic data source and developed the Brazilian Portuguese Lexicon by deriving and adding metalinguistic and psycholinguistic information about Brazilian Portuguese words. We obtained a final corpus with more than 30 million word tokens, 215 thousand word types and 25 categories of information about each word. This corpus was made available on the internet via a free-access site with two search engines: a simple search and a complex search. The simple engine basically searches for a list of words, while the complex engine accepts all types of criteria in the corpus categories. The output result presents all entries found in the corpus with the criteria specified in the input search and can be downloaded as a.csv file. We created a module in the results that delivers basic statistics about each search. The Brazilian Portuguese Lexicon also provides a pseudoword engine and specific tools for linguistic and statistical analysis. Therefore, the Brazilian Portuguese Lexicon is a convenient instrument for stimulus search, selection, control, and manipulation in psycholinguistic experiments, as also it is a powerful database for computational linguistics research and language modeling related to lexicon distribution, functioning, and behavior.
Evaluating Open-Source Full-Text Search Engines for Matching ICD-10 Codes.

PubMed

Jurcău, Daniel-Alexandru; Stoicu-Tivadar, Vasile

2016-01-01

This research presents the results of evaluating multiple free, open-source engines on matching ICD-10 diagnostic codes via full-text searches. The study investigates what it takes to get an accurate match when searching for a specific diagnostic code. For each code the evaluation starts by extracting the words that make up its text and continues with building full-text search queries from the combinations of these words. The queries are then run against all the ICD-10 codes until a match indicates the code in question as a match with the highest relative score. This method identifies the minimum number of words that must be provided in order for the search engines choose the desired entry. The engines analyzed include a popular Java-based full-text search engine, a lightweight engine written in JavaScript which can even execute on the user's browser, and two popular open-source relational database management systems.
Using the Turning Research Into Practice (TRIP) database: how do clinicians really search?*

PubMed Central

Meats, Emma; Brassey, Jon; Heneghan, Carl; Glasziou, Paul

2007-01-01

Objectives: Clinicians and patients are increasingly accessing information through Internet searches. This study aimed to examine clinicians' current search behavior when using the Turning Research Into Practice (TRIP) database to examine search engine use and the ways it might be improved. Methods: A Web log analysis was undertaken of the TRIP database—a meta-search engine covering 150 health resources including MEDLINE, The Cochrane Library, and a variety of guidelines. The connectors for terms used in searches were studied, and observations were made of 9 users' search behavior when working with the TRIP database. Results: Of 620,735 searches, most used a single term, and 12% (n = 75,947) used a Boolean operator: 11% (n = 69,006) used “AND” and 0.8% (n = 4,941) used “OR.” Of the elements of a well-structured clinical question (population, intervention, comparator, and outcome), the population was most commonly used, while fewer searches included the intervention. Comparator and outcome were rarely used. Participants in the observational study were interested in learning how to formulate better searches. Conclusions: Web log analysis showed most searches used a single term and no Boolean operators. Observational study revealed users were interested in conducting efficient searches but did not always know how. Therefore, either better training or better search interfaces are required to assist users and enable more effective searching. PMID:17443248
Automatic sorting of toxicological information into the IUCLID (International Uniform Chemical Information Database) endpoint-categories making use of the semantic search engine Go3R.

PubMed

Sauer, Ursula G; Wächter, Thomas; Hareng, Lars; Wareing, Britta; Langsch, Angelika; Zschunke, Matthias; Alvers, Michael R; Landsiedel, Robert

2014-06-01

The knowledge-based search engine Go3R, www.Go3R.org, has been developed to assist scientists from industry and regulatory authorities in collecting comprehensive toxicological information with a special focus on identifying available alternatives to animal testing. The semantic search paradigm of Go3R makes use of expert knowledge on 3Rs methods and regulatory toxicology, laid down in the ontology, a network of concepts, terms, and synonyms, to recognize the contents of documents. Search results are automatically sorted into a dynamic table of contents presented alongside the list of documents retrieved. This table of contents allows the user to quickly filter the set of documents by topics of interest. Documents containing hazard information are automatically assigned to a user interface following the endpoint-specific IUCLID5 categorization scheme required, e.g. for REACH registration dossiers. For this purpose, complex endpoint-specific search queries were compiled and integrated into the search engine (based upon a gold standard of 310 references that had been assigned manually to the different endpoint categories). Go3R sorts 87% of the references concordantly into the respective IUCLID5 categories. Currently, Go3R searches in the 22 million documents available in the PubMed and TOXNET databases. However, it can be customized to search in other databases including in-house databanks. Copyright © 2013 Elsevier Ltd. All rights reserved.
Multitasking Information Seeking and Searching Processes.

ERIC Educational Resources Information Center

Spink, Amanda; Ozmutlu, H. Cenk; Ozmutlu, Seda

2002-01-01

Presents findings from four studies of the prevalence of multitasking information seeking and searching by Web (via the Excite search engine), information retrieval system (mediated online database searching), and academic library users. Highlights include human information coordinating behavior (HICB); and implications for models of information…
New Capabilities in the Astrophysics Multispectral Archive Search Engine

NASA Astrophysics Data System (ADS)

Cheung, C. Y.; Kelley, S.; Roussopoulos, N.

The Astrophysics Multispectral Archive Search Engine (AMASE) uses object-oriented database techniques to provide a uniform multi-mission and multi-spectral interface to search for data in the distributed archives. We describe our experience of porting AMASE from Illustra object-relational DBMS to the Informix Universal Data Server. New capabilities and utilities have been developed, including a spatial datablade that supports Nearest Neighbor queries.
Querying archetype-based EHRs by search ontology-based XPath engineering.

PubMed

Kropf, Stefan; Uciteli, Alexandr; Schierle, Katrin; Krücken, Peter; Denecke, Kerstin; Herre, Heinrich

2018-05-11

Legacy data and new structured data can be stored in a standardized format as XML-based EHRs on XML databases. Querying documents on these databases is crucial for answering research questions. Instead of using free text searches, that lead to false positive results, the precision can be increased by constraining the search to certain parts of documents. A search ontology-based specification of queries on XML documents defines search concepts and relates them to parts in the XML document structure. Such query specification method is practically introduced and evaluated by applying concrete research questions formulated in natural language on a data collection for information retrieval purposes. The search is performed by search ontology-based XPath engineering that reuses ontologies and XML-related W3C standards. The key result is that the specification of research questions can be supported by the usage of search ontology-based XPath engineering. A deeper recognition of entities and a semantic understanding of the content is necessary for a further improvement of precision and recall. Key limitation is that the application of the introduced process requires skills in ontology and software development. In future, the time consuming ontology development could be overcome by implementing a new clinical role: the clinical ontologist. The introduced Search Ontology XML extension connects Search Terms to certain parts in XML documents and enables an ontology-based definition of queries. Search ontology-based XPath engineering can support research question answering by the specification of complex XPath expressions without deep syntax knowledge about XPaths.
HBVPathDB: a database of HBV infection-related molecular interaction network.

PubMed

Zhang, Yi; Bo, Xiao-Chen; Yang, Jing; Wang, Sheng-Qi

2005-03-21

To describe molecules or genes interaction between hepatitis B viruses (HBV) and host, for understanding how virus' and host's genes and molecules are networked to form a biological system and for perceiving mechanism of HBV infection. The knowledge of HBV infection-related reactions was organized into various kinds of pathways with carefully drawn graphs in HBVPathDB. Pathway information is stored with relational database management system (DBMS), which is currently the most efficient way to manage large amounts of data and query is implemented with powerful Structured Query Language (SQL). The search engine is written using Personal Home Page (PHP) with SQL embedded and web retrieval interface is developed for searching with Hypertext Markup Language (HTML). We present the first version of HBVPathDB, which is a HBV infection-related molecular interaction network database composed of 306 pathways with 1 050 molecules involved. With carefully drawn graphs, pathway information stored in HBVPathDB can be browsed in an intuitive way. We develop an easy-to-use interface for flexible accesses to the details of database. Convenient software is implemented to query and browse the pathway information of HBVPathDB. Four search page layout options-category search, gene search, description search, unitized search-are supported by the search engine of the database. The database is freely available at http://www.bio-inf.net/HBVPathDB/HBV/. The conventional perspective HBVPathDB have already contained a considerable amount of pathway information with HBV infection related, which is suitable for in-depth analysis of molecular interaction network of virus and host. HBVPathDB integrates pathway data-sets with convenient software for query, browsing, visualization, that provides users more opportunity to identify regulatory key molecules as potential drug targets and to explore the possible mechanism of HBV infection based on gene expression datasets.
A comprehensive and scalable database search system for metaproteomics.

PubMed

Chatterjee, Sandip; Stupp, Gregory S; Park, Sung Kyu Robin; Ducom, Jean-Christophe; Yates, John R; Su, Andrew I; Wolan, Dennis W

2016-08-16

Mass spectrometry-based shotgun proteomics experiments rely on accurate matching of experimental spectra against a database of protein sequences. Existing computational analysis methods are limited in the size of their sequence databases, which severely restricts the proteomic sequencing depth and functional analysis of highly complex samples. The growing amount of public high-throughput sequencing data will only exacerbate this problem. We designed a broadly applicable metaproteomic analysis method (ComPIL) that addresses protein database size limitations. Our approach to overcome this significant limitation in metaproteomics was to design a scalable set of sequence databases assembled for optimal library querying speeds. ComPIL was integrated with a modified version of the search engine ProLuCID (termed "Blazmass") to permit rapid matching of experimental spectra. Proof-of-principle analysis of human HEK293 lysate with a ComPIL database derived from high-quality genomic libraries was able to detect nearly all of the same peptides as a search with a human database (~500x fewer peptides in the database), with a small reduction in sensitivity. We were also able to detect proteins from the adenovirus used to immortalize these cells. We applied our method to a set of healthy human gut microbiome proteomic samples and showed a substantial increase in the number of identified peptides and proteins compared to previous metaproteomic analyses, while retaining a high degree of protein identification accuracy and allowing for a more in-depth characterization of the functional landscape of the samples. The combination of ComPIL with Blazmass allows proteomic searches to be performed with database sizes much larger than previously possible. These large database searches can be applied to complex meta-samples with unknown composition or proteomic samples where unexpected proteins may be identified. The protein database, proteomic search engine, and the proteomic data files for the 5 microbiome samples characterized and discussed herein are open source and available for use and additional analysis.
Social Work Literature Searching: Current Issues with Databases and Online Search Engines

ERIC Educational Resources Information Center

McGinn, Tony; Taylor, Brian; McColgan, Mary; McQuilkan, Janice

2016-01-01

Objectives: To compare the performance of a range of search facilities; and to illustrate the execution of a comprehensive literature search for qualitative evidence in social work. Context: Developments in literature search methods and comparisons of search facilities help facilitate access to the best available evidence for social workers.…
Evaluation of Proteomic Search Engines for the Analysis of Histone Modifications

PubMed Central

2015-01-01

Identification of histone post-translational modifications (PTMs) is challenging for proteomics search engines. Including many histone PTMs in one search increases the number of candidate peptides dramatically, leading to low search speed and fewer identified spectra. To evaluate database search engines on identifying histone PTMs, we present a method in which one kind of modification is searched each time, for example, unmodified, individually modified, and multimodified, each search result is filtered with false discovery rate less than 1%, and the identifications of multiple search engines are combined to obtain confident results. We apply this method for eight search engines on histone data sets. We find that two search engines, pFind and Mascot, identify most of the confident results at a reasonable speed, so we recommend using them to identify histone modifications. During the evaluation, we also find some important aspects for the analysis of histone modifications. Our evaluation of different search engines on identifying histone modifications will hopefully help those who are hoping to enter the histone proteomics field. The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium with the data set identifier PXD001118. PMID:25167464

Evaluation of proteomic search engines for the analysis of histone modifications.

PubMed

Yuan, Zuo-Fei; Lin, Shu; Molden, Rosalynn C; Garcia, Benjamin A

2014-10-03

Identification of histone post-translational modifications (PTMs) is challenging for proteomics search engines. Including many histone PTMs in one search increases the number of candidate peptides dramatically, leading to low search speed and fewer identified spectra. To evaluate database search engines on identifying histone PTMs, we present a method in which one kind of modification is searched each time, for example, unmodified, individually modified, and multimodified, each search result is filtered with false discovery rate less than 1%, and the identifications of multiple search engines are combined to obtain confident results. We apply this method for eight search engines on histone data sets. We find that two search engines, pFind and Mascot, identify most of the confident results at a reasonable speed, so we recommend using them to identify histone modifications. During the evaluation, we also find some important aspects for the analysis of histone modifications. Our evaluation of different search engines on identifying histone modifications will hopefully help those who are hoping to enter the histone proteomics field. The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium with the data set identifier PXD001118.
Complementary Value of Databases for Discovery of Scholarly Literature: A User Survey of Online Searching for Publications in Art History

ERIC Educational Resources Information Center

Nemeth, Erik

2010-01-01

Discovery of academic literature through Web search engines challenges the traditional role of specialized research databases. Creation of literature outside academic presses and peer-reviewed publications expands the content for scholarly research within a particular field. The resulting body of literature raises the question of whether scholars…
100 Colleges Sign Up with Google to Speed Access to Library Resources

ERIC Educational Resources Information Center

Young, Jeffrey R.

2005-01-01

More than 100 colleges and universities have arranged to give people using the Google Scholar search engine on their campuses more-direct access to library materials. Google Scholar is a free tool that searches scholarly materials on the Web and in academic databases. The new arrangements essentially let Google know which online databases the…
Search Engines for Tomorrow's Scholars

ERIC Educational Resources Information Center

Fagan, Jody Condit

2011-01-01

Today's scholars face an outstanding array of choices when choosing search tools: Google Scholar, discipline-specific abstracts and index databases, library discovery tools, and more recently, Microsoft's re-launch of their academic search tool, now dubbed Microsoft Academic Search. What are these tools' strengths for the emerging needs of…
(Meta)Search like Google

ERIC Educational Resources Information Center

Rochkind, Jonathan

2007-01-01

The ability to search and receive results in more than one database through a single interface--or metasearch--is something many users want. Google Scholar--the search engine of specifically scholarly content--and library metasearch products like Ex Libris's MetaLib, Serials Solution's Central Search, WebFeat, and products based on MuseGlobal used…
GenderMedDB: an interactive database of sex and gender-specific medical literature.

PubMed

Oertelt-Prigione, Sabine; Gohlke, Björn-Oliver; Dunkel, Mathias; Preissner, Robert; Regitz-Zagrosek, Vera

2014-01-01

Searches for sex and gender-specific publications are complicated by the absence of a specific algorithm within search engines and by the lack of adequate archives to collect the retrieved results. We previously addressed this issue by initiating the first systematic archive of medical literature containing sex and/or gender-specific analyses. This initial collection has now been greatly enlarged and re-organized as a free user-friendly database with multiple functions: GenderMedDB (http://gendermeddb.charite.de). GenderMedDB retrieves the included publications from the PubMed database. Manuscripts containing sex and/or gender-specific analysis are continuously screened and the relevant findings organized systematically into disciplines and diseases. Publications are furthermore classified by research type, subject and participant numbers. More than 11,000 abstracts are currently included in the database, after screening more than 40,000 publications. The main functions of the database include searches by publication data or content analysis based on pre-defined classifications. In addition, registrants are enabled to upload relevant publications, access descriptive publication statistics and interact in an open user forum. Overall, GenderMedDB offers the advantages of a discipline-specific search engine as well as the functions of a participative tool for the gender medicine community.
P185-M Protein Identification and Validation of Results in Workflows that Integrate over Various Instruments, Datasets, Search Engines

PubMed Central

Hufnagel, P.; Glandorf, J.; Körting, G.; Jabs, W.; Schweiger-Hufnagel, U.; Hahner, S.; Lubeck, M.; Suckau, D.

2007-01-01

Analysis of complex proteomes often results in long protein lists, but falls short in measuring the validity of identification and quantification results on a greater number of proteins. Biological and technical replicates are mandatory, as is the combination of the MS data from various workflows (gels, 1D-LC, 2D-LC), instruments (TOF/TOF, trap, qTOF or FTMS), and search engines. We describe a database-driven study that combines two workflows, two mass spectrometers, and four search engines with protein identification following a decoy database strategy. The sample was a tryptically digested lysate (10,000 cells) of a human colorectal cancer cell line. Data from two LC-MALDI-TOF/TOF runs and a 2D-LC-ESI-trap run using capillary and nano-LC columns were submitted to the proteomics software platform ProteinScape. The combined MALDI data and the ESI data were searched using Mascot (Matrix Science), Phenyx (GeneBio), ProteinSolver (Bruker and Protagen), and Sequest (Thermo) against a decoy database generated from IPI-human in order to obtain one protein list across all workflows and search engines at a defined maximum false-positive rate of 5%. ProteinScape combined the data to one LC-MALDI and one LC-ESI dataset. The initial separate searches from the two combined datasets generated eight independent peptide lists. These were compiled into an integrated protein list using the ProteinExtractor algorithm. An initial evaluation of the generated data led to the identification of approximately 1200 proteins. Result integration on a peptide level allowed discrimination of protein isoforms that would not have been possible with a mere combination of protein lists.
An Overview of the Literature: Research in P-12 Engineering Education

ERIC Educational Resources Information Center

Mendoza Díaz, Noemi V.; Cox, Monica F.

2012-01-01

This paper presents an extensive overview of preschool to 12th grade (P-12) engineering education literature published between 2001 and 2011. Searches were conducted through education and engineering library engines and databases as well as queries in established publications in engineering education. More than 50 publications were found,…
Probabilistic consensus scoring improves tandem mass spectrometry peptide identification.

PubMed

Nahnsen, Sven; Bertsch, Andreas; Rahnenführer, Jörg; Nordheim, Alfred; Kohlbacher, Oliver

2011-08-05

Database search is a standard technique for identifying peptides from their tandem mass spectra. To increase the number of correctly identified peptides, we suggest a probabilistic framework that allows the combination of scores from different search engines into a joint consensus score. Central to the approach is a novel method to estimate scores for peptides not found by an individual search engine. This approach allows the estimation of p-values for each candidate peptide and their combination across all search engines. The consensus approach works better than any single search engine across all different instrument types considered in this study. Improvements vary strongly from platform to platform and from search engine to search engine. Compared to the industry standard MASCOT, our approach can identify up to 60% more peptides. The software for consensus predictions is implemented in C++ as part of OpenMS, a software framework for mass spectrometry. The source code is available in the current development version of OpenMS and can easily be used as a command line application or via a graphical pipeline designer TOPPAS.
Detection of alternative splice variants at the proteome level in Aspergillus flavus.

PubMed

Chang, Kung-Yen; Georgianna, D Ryan; Heber, Steffen; Payne, Gary A; Muddiman, David C

2010-03-05

Identification of proteins from proteolytic peptides or intact proteins plays an essential role in proteomics. Researchers use search engines to match the acquired peptide sequences to the target proteins. However, search engines depend on protein databases to provide candidates for consideration. Alternative splicing (AS), the mechanism where the exon of pre-mRNAs can be spliced and rearranged to generate distinct mRNA and therefore protein variants, enable higher eukaryotic organisms, with only a limited number of genes, to have the requisite complexity and diversity at the proteome level. Multiple alternative isoforms from one gene often share common segments of sequences. However, many protein databases only include a limited number of isoforms to keep minimal redundancy. As a result, the database search might not identify a target protein even with high quality tandem MS data and accurate intact precursor ion mass. We computationally predicted an exhaustive list of putative isoforms of Aspergillus flavus proteins from 20 371 expressed sequence tags to investigate whether an alternative splicing protein database can assign a greater proportion of mass spectrometry data. The newly constructed AS database provided 9807 new alternatively spliced variants in addition to 12 832 previously annotated proteins. The searches of the existing tandem MS spectra data set using the AS database identified 29 new proteins encoded by 26 genes. Nine fungal genes appeared to have multiple protein isoforms. In addition to the discovery of splice variants, AS database also showed potential to improve genome annotation. In summary, the introduction of an alternative splicing database helps identify more proteins and unveils more information about a proteome.
Real-time earthquake monitoring using a search engine method.

PubMed

Zhang, Jie; Zhang, Haijiang; Chen, Enhong; Zheng, Yi; Kuang, Wenhuan; Zhang, Xiong

2014-12-04

When an earthquake occurs, seismologists want to use recorded seismograms to infer its location, magnitude and source-focal mechanism as quickly as possible. If such information could be determined immediately, timely evacuations and emergency actions could be undertaken to mitigate earthquake damage. Current advanced methods can report the initial location and magnitude of an earthquake within a few seconds, but estimating the source-focal mechanism may require minutes to hours. Here we present an earthquake search engine, similar to a web search engine, that we developed by applying a computer fast search method to a large seismogram database to find waveforms that best fit the input data. Our method is several thousand times faster than an exact search. For an Mw 5.9 earthquake on 8 March 2012 in Xinjiang, China, the search engine can infer the earthquake's parameters in <1 s after receiving the long-period surface wave data.
Real-time earthquake monitoring using a search engine method

PubMed Central

Zhang, Jie; Zhang, Haijiang; Chen, Enhong; Zheng, Yi; Kuang, Wenhuan; Zhang, Xiong

2014-01-01

When an earthquake occurs, seismologists want to use recorded seismograms to infer its location, magnitude and source-focal mechanism as quickly as possible. If such information could be determined immediately, timely evacuations and emergency actions could be undertaken to mitigate earthquake damage. Current advanced methods can report the initial location and magnitude of an earthquake within a few seconds, but estimating the source-focal mechanism may require minutes to hours. Here we present an earthquake search engine, similar to a web search engine, that we developed by applying a computer fast search method to a large seismogram database to find waveforms that best fit the input data. Our method is several thousand times faster than an exact search. For an Mw 5.9 earthquake on 8 March 2012 in Xinjiang, China, the search engine can infer the earthquake’s parameters in <1 s after receiving the long-period surface wave data. PMID:25472861
Electronic Reference Library: Silverplatter's Database Networking Solution.

ERIC Educational Resources Information Center

Millea, Megan

Silverplatter's Electronic Reference Library (ERL) provides wide area network access to its databases using TCP/IP communications and client-server architecture. ERL has two main components: The ERL clients (retrieval interface) and the ERL server (search engines). ERL clients provide patrons with seamless access to multiple databases on multiple…
[International bibliographic databases--Current Contents on disk and in FTP format (Internet): presentation and guide].

PubMed

Bloch-Mouillet, E

1999-01-01

This paper aims to provide technical and practical advice about finding references using Current Contents on disk (Macintosh or PC) or via the Internet (FTP). Seven editions are published each week. They are all organized in the same way and have the same search engine. The Life Sciences edition, extensively used in medical research, is presented here in detail, as an example. This methodological note explains, in French, how to use this reference database. It is designed to be a practical guide for browsing and searching the database, and particularly for creating search profiles adapted to the needs of researchers.
STEM Education Related Dissertation Abstracts: A Bounded Qualitative Meta-Study

ERIC Educational Resources Information Center

Banning, James; Folkestad, James E.

2012-01-01

This article utilizes a bounded qualitative meta-study framework to examine the 101 dissertation abstracts found by searching the ProQuest Dissertation and Theses[TM] digital database for dissertations abstracts from 1990 through 2010 using the search terms education, science, technology, engineer, and STEM/SMET. Professional search librarians…
Global search tool for the Advanced Photon Source Integrated Relational Model of Installed Systems (IRMIS) database.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Quock, D. E. R.; Cianciarulo, M. B.; APS Engineering Support Division

2007-01-01

The Integrated Relational Model of Installed Systems (IRMIS) is a relational database tool that has been implemented at the Advanced Photon Source to maintain an updated account of approximately 600 control system software applications, 400,000 process variables, and 30,000 control system hardware components. To effectively display this large amount of control system information to operators and engineers, IRMIS was initially built with nine Web-based viewers: Applications Organizing Index, IOC, PLC, Component Type, Installed Components, Network, Controls Spares, Process Variables, and Cables. However, since each viewer is designed to provide details from only one major category of the control system, themore » necessity for a one-stop global search tool for the entire database became apparent. The user requirements for extremely fast database search time and ease of navigation through search results led to the choice of Asynchronous JavaScript and XML (AJAX) technology in the implementation of the IRMIS global search tool. Unique features of the global search tool include a two-tier level of displayed search results, and a database data integrity validation and reporting mechanism.« less
MICA: desktop software for comprehensive searching of DNA databases

PubMed Central

Stokes, William A; Glick, Benjamin S

2006-01-01

Background Molecular biologists work with DNA databases that often include entire genomes. A common requirement is to search a DNA database to find exact matches for a nondegenerate or partially degenerate query. The software programs available for such purposes are normally designed to run on remote servers, but an appealing alternative is to work with DNA databases stored on local computers. We describe a desktop software program termed MICA (K-Mer Indexing with Compact Arrays) that allows large DNA databases to be searched efficiently using very little memory. Results MICA rapidly indexes a DNA database. On a Macintosh G5 computer, the complete human genome could be indexed in about 5 minutes. The indexing algorithm recognizes all 15 characters of the DNA alphabet and fully captures the information in any DNA sequence, yet for a typical sequence of length L, the index occupies only about 2L bytes. The index can be searched to return a complete list of exact matches for a nondegenerate or partially degenerate query of any length. A typical search of a long DNA sequence involves reading only a small fraction of the index into memory. As a result, searches are fast even when the available RAM is limited. Conclusion MICA is suitable as a search engine for desktop DNA analysis software. PMID:17018144
PIA: An Intuitive Protein Inference Engine with a Web-Based User Interface.

PubMed

Uszkoreit, Julian; Maerkens, Alexandra; Perez-Riverol, Yasset; Meyer, Helmut E; Marcus, Katrin; Stephan, Christian; Kohlbacher, Oliver; Eisenacher, Martin

2015-07-02

Protein inference connects the peptide spectrum matches (PSMs) obtained from database search engines back to proteins, which are typically at the heart of most proteomics studies. Different search engines yield different PSMs and thus different protein lists. Analysis of results from one or multiple search engines is often hampered by different data exchange formats and lack of convenient and intuitive user interfaces. We present PIA, a flexible software suite for combining PSMs from different search engine runs and turning these into consistent results. PIA can be integrated into proteomics data analysis workflows in several ways. A user-friendly graphical user interface can be run either locally or (e.g., for larger core facilities) from a central server. For automated data processing, stand-alone tools are available. PIA implements several established protein inference algorithms and can combine results from different search engines seamlessly. On several benchmark data sets, we show that PIA can identify a larger number of proteins at the same protein FDR when compared to that using inference based on a single search engine. PIA supports the majority of established search engines and data in the mzIdentML standard format. It is implemented in Java and freely available at https://github.com/mpc-bioinformatics/pia.
[Scientometrics and bibliometrics of biomedical engineering periodicals and papers].

PubMed

Zhao, Ping; Xu, Ping; Li, Bingyan; Wang, Zhengrong

2003-09-01

This investigation was made to reveal the current status, research trend and research level of biomedical engineering in Chinese mainland by means of scientometrics and to assess the quality of the four domestic publications by bibliometrics. We identified all articles of four related publications by searching Chinese and foreign databases from 1997 to 2001. All articles collected or cited by these databases were searched and statistically analyzed for finding out the relevant distributions, including databases, years, authors, institutions, subject headings and subheadings. The source of sustentation funds and the related articles were analyzed too. The results showed that two journals were cited by two foreign databases and five Chinese databases simultaneously. The output of Journal of Biomedical Engineering was the highest. Its quantity of original papers cited by EI, CA and the totality of papers sponsored by funds were higher than those of the others, but the quantity and percentage per year of biomedical articles cited by EI were decreased in all. Inland core authors and institutions had come into being in the field of biomedical engineering. Their research topics were mainly concentrated on ten subject headings which included biocompatible materials, computer-assisted signal processing, electrocardiography, computer-assisted image processing, biomechanics, algorithms, electroencephalography, automatic data processing, mechanical stress, hemodynamics, mathematical computing, microcomputers, theoretical models, etc. The main subheadings were concentrated on instrumentation, physiopathology, diagnosis, therapy, ultrasonography, physiology, analysis, surgery, pathology, method, etc.
Tandem Mass Spectrum Sequencing: An Alternative to Database Search Engines in Shotgun Proteomics.

PubMed

Muth, Thilo; Rapp, Erdmann; Berven, Frode S; Barsnes, Harald; Vaudel, Marc

2016-01-01

Protein identification via database searches has become the gold standard in mass spectrometry based shotgun proteomics. However, as the quality of tandem mass spectra improves, direct mass spectrum sequencing gains interest as a database-independent alternative. In this chapter, the general principle of this so-called de novo sequencing is introduced along with pitfalls and challenges of the technique. The main tools available are presented with a focus on user friendly open source software which can be directly applied in everyday proteomic workflows.

Privacy Perspectives for Online Searchers: Confidentiality with Confidence?

ERIC Educational Resources Information Center

Duberman, Josh; Beaudet, Michael

2000-01-01

Presents issues and questions involved in online privacy from the information professional's perspective. Topics include consumer concerns; query confidentiality; securing computers from intrusion; electronic mail; search engines; patents and intellectual property searches; government's role; Internet service providers; database mining; user…
Pepitome: evaluating improved spectral library search for identification complementarity and quality assessment

PubMed Central

Dasari, Surendra; Chambers, Matthew C.; Martinez, Misti A.; Carpenter, Kristin L.; Ham, Amy-Joan L.; Vega-Montoto, Lorenzo J.; Tabb, David L.

2012-01-01

Spectral libraries have emerged as a viable alternative to protein sequence databases for peptide identification. These libraries contain previously detected peptide sequences and their corresponding tandem mass spectra (MS/MS). Search engines can then identify peptides by comparing experimental MS/MS scans to those in the library. Many of these algorithms employ the dot product score for measuring the quality of a spectrum-spectrum match (SSM). This scoring system does not offer a clear statistical interpretation and ignores fragment ion m/z discrepancies in the scoring. We developed a new spectral library search engine, Pepitome, which employs statistical systems for scoring SSMs. Pepitome outperformed the leading library search tool, SpectraST, when analyzing data sets acquired on three different mass spectrometry platforms. We characterized the reliability of spectral library searches by confirming shotgun proteomics identifications through RNA-Seq data. Applying spectral library and database searches on the same sample revealed their complementary nature. Pepitome identifications enabled the automation of quality analysis and quality control (QA/QC) for shotgun proteomics data acquisition pipelines. PMID:22217208
LymPHOS 2.0: an update of a phosphosite database of primary human T cells

PubMed Central

Nguyen, Tien Dung; Vidal-Cortes, Oriol; Gallardo, Oscar; Abian, Joaquin; Carrascal, Montserrat

2015-01-01

LymPHOS is a web-oriented database containing peptide and protein sequences and spectrometric information on the phosphoproteome of primary human T-Lymphocytes. Current release 2.0 contains 15 566 phosphorylation sites from 8273 unique phosphopeptides and 4937 proteins, which correspond to a 45-fold increase over the original database description. It now includes quantitative data on phosphorylation changes after time-dependent treatment with activators of the TCR-mediated signal transduction pathway. Sequence data quality has also been improved with the use of multiple search engines for database searching. LymPHOS can be publicly accessed at http://www.lymphos.org. Database URL: http://www.lymphos.org. PMID:26708986
The Fragment Network: A Chemistry Recommendation Engine Built Using a Graph Database.

PubMed

Hall, Richard J; Murray, Christopher W; Verdonk, Marcel L

2017-07-27

The hit validation stage of a fragment-based drug discovery campaign involves probing the SAR around one or more fragment hits. This often requires a search for similar compounds in a corporate collection or from commercial suppliers. The Fragment Network is a graph database that allows a user to efficiently search chemical space around a compound of interest. The result set is chemically intuitive, naturally grouped by substitution pattern and meaningfully sorted according to the number of observations of each transformation in medicinal chemistry databases. This paper describes the algorithms used to construct and search the Fragment Network and provides examples of how it may be used in a drug discovery context.
RNA FRABASE 2.0: an advanced web-accessible database with the capacity to search the three-dimensional fragments within RNA structures

PubMed Central

2010-01-01

Background Recent discoveries concerning novel functions of RNA, such as RNA interference, have contributed towards the growing importance of the field. In this respect, a deeper knowledge of complex three-dimensional RNA structures is essential to understand their new biological functions. A number of bioinformatic tools have been proposed to explore two major structural databases (PDB, NDB) in order to analyze various aspects of RNA tertiary structures. One of these tools is RNA FRABASE 1.0, the first web-accessible database with an engine for automatic search of 3D fragments within PDB-derived RNA structures. This search is based upon the user-defined RNA secondary structure pattern. In this paper, we present and discuss RNA FRABASE 2.0. This second version of the system represents a major extension of this tool in terms of providing new data and a wide spectrum of novel functionalities. An intuitionally operated web server platform enables very fast user-tailored search of three-dimensional RNA fragments, their multi-parameter conformational analysis and visualization. Description RNA FRABASE 2.0 has stored information on 1565 PDB-deposited RNA structures, including all NMR models. The RNA FRABASE 2.0 search engine algorithms operate on the database of the RNA sequences and the new library of RNA secondary structures, coded in the dot-bracket format extended to hold multi-stranded structures and to cover residues whose coordinates are missing in the PDB files. The library of RNA secondary structures (and their graphics) is made available. A high level of efficiency of the 3D search has been achieved by introducing novel tools to formulate advanced searching patterns and to screen highly populated tertiary structure elements. RNA FRABASE 2.0 also stores data and conformational parameters in order to provide "on the spot" structural filters to explore the three-dimensional RNA structures. An instant visualization of the 3D RNA structures is provided. RNA FRABASE 2.0 is freely available at http://rnafrabase.cs.put.poznan.pl. Conclusions RNA FRABASE 2.0 provides a novel database and powerful search engine which is equipped with new data and functionalities that are unavailable elsewhere. Our intention is that this advanced version of the RNA FRABASE will be of interest to all researchers working in the RNA field. PMID:20459631
RNA FRABASE 2.0: an advanced web-accessible database with the capacity to search the three-dimensional fragments within RNA structures.

PubMed

Popenda, Mariusz; Szachniuk, Marta; Blazewicz, Marek; Wasik, Szymon; Burke, Edmund K; Blazewicz, Jacek; Adamiak, Ryszard W

2010-05-06

Recent discoveries concerning novel functions of RNA, such as RNA interference, have contributed towards the growing importance of the field. In this respect, a deeper knowledge of complex three-dimensional RNA structures is essential to understand their new biological functions. A number of bioinformatic tools have been proposed to explore two major structural databases (PDB, NDB) in order to analyze various aspects of RNA tertiary structures. One of these tools is RNA FRABASE 1.0, the first web-accessible database with an engine for automatic search of 3D fragments within PDB-derived RNA structures. This search is based upon the user-defined RNA secondary structure pattern. In this paper, we present and discuss RNA FRABASE 2.0. This second version of the system represents a major extension of this tool in terms of providing new data and a wide spectrum of novel functionalities. An intuitionally operated web server platform enables very fast user-tailored search of three-dimensional RNA fragments, their multi-parameter conformational analysis and visualization. RNA FRABASE 2.0 has stored information on 1565 PDB-deposited RNA structures, including all NMR models. The RNA FRABASE 2.0 search engine algorithms operate on the database of the RNA sequences and the new library of RNA secondary structures, coded in the dot-bracket format extended to hold multi-stranded structures and to cover residues whose coordinates are missing in the PDB files. The library of RNA secondary structures (and their graphics) is made available. A high level of efficiency of the 3D search has been achieved by introducing novel tools to formulate advanced searching patterns and to screen highly populated tertiary structure elements. RNA FRABASE 2.0 also stores data and conformational parameters in order to provide "on the spot" structural filters to explore the three-dimensional RNA structures. An instant visualization of the 3D RNA structures is provided. RNA FRABASE 2.0 is freely available at http://rnafrabase.cs.put.poznan.pl. RNA FRABASE 2.0 provides a novel database and powerful search engine which is equipped with new data and functionalities that are unavailable elsewhere. Our intention is that this advanced version of the RNA FRABASE will be of interest to all researchers working in the RNA field.
Evaluation of an open source tool for indexing and searching enterprise radiology and pathology reports

NASA Astrophysics Data System (ADS)

Kim, Woojin; Boonn, William

2010-03-01

Data mining of existing radiology and pathology reports within an enterprise health system can be used for clinical decision support, research, education, as well as operational analyses. In our health system, the database of radiology and pathology reports exceeds 13 million entries combined. We are building a web-based tool to allow search and data analysis of these combined databases using freely available and open source tools. This presentation will compare performance of an open source full-text indexing tool to MySQL's full-text indexing and searching and describe implementation procedures to incorporate these capabilities into a radiology-pathology search engine.
A collection of open source applications for mass spectrometry data mining.

PubMed

Gallardo, Óscar; Ovelleiro, David; Gay, Marina; Carrascal, Montserrat; Abian, Joaquin

2014-10-01

We present several bioinformatics applications for the identification and quantification of phosphoproteome components by MS. These applications include a front-end graphical user interface that combines several Thermo RAW formats to MASCOT™ Generic Format extractors (EasierMgf), two graphical user interfaces for search engines OMSSA and SEQUEST (OmssaGui and SequestGui), and three applications, one for the management of databases in FASTA format (FastaTools), another for the integration of search results from up to three search engines (Integrator), and another one for the visualization of mass spectra and their corresponding database search results (JsonVisor). These applications were developed to solve some of the common problems found in proteomic and phosphoproteomic data analysis and were integrated in the workflow for data processing and feeding on our LymPHOS database. Applications were designed modularly and can be used standalone. These tools are written in Perl and Python programming languages and are supported on Windows platforms. They are all released under an Open Source Software license and can be freely downloaded from our software repository hosted at GoogleCode. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
MassSieve: Panning MS/MS peptide data for proteins

PubMed Central

Slotta, Douglas J.; McFarland, Melinda A.; Markey, Sanford P.

2010-01-01

We present MassSieve, a Java-based platform for visualization and parsimony analysis of single and comparative LC-MS/MS database search engine results. The success of mass spectrometric peptide sequence assignment algorithms has led to the need for a tool to merge and evaluate the increasing data set sizes that result from LC-MS/MS-based shotgun proteomic experiments. MassSieve supports reports from multiple search engines with differing search characteristics, which can increase peptide sequence coverage and/or identify conflicting or ambiguous spectral assignments. PMID:20564260
The Wannabee Culture: Why No-One Does What They Used To.

ERIC Educational Resources Information Center

Dixon, Anne

1998-01-01

Electronic publishing has been an agent for change in not just how one publishes but in what one publishes. Describes HyperCite, a joint project with the Institution of Electrical Engineers (IEE) to create INSPEC database. Highlights include the database; the research phase (cross database searching and new interface); and what and how much was…
Impact of Commercial Search Engines and International Databases on Engineering Teaching and Research

ERIC Educational Resources Information Center

Chanson, Hubert

2007-01-01

For the last three decades, the engineering higher education and professional environments have been completely transformed by the "electronic/digital information revolution" that has included the introduction of personal computer, the development of email and world wide web, and broadband Internet connections at home. Herein the writer compares…
Automating Information Discovery Within the Invisible Web

NASA Astrophysics Data System (ADS)

Sweeney, Edwina; Curran, Kevin; Xie, Ermai

A Web crawler or spider crawls through the Web looking for pages to index, and when it locates a new page it passes the page on to an indexer. The indexer identifies links, keywords, and other content and stores these within its database. This database is searched by entering keywords through an interface and suitable Web pages are returned in a results page in the form of hyperlinks accompanied by short descriptions. The Web, however, is increasingly moving away from being a collection of documents to a multidimensional repository for sounds, images, audio, and other formats. This is leading to a situation where certain parts of the Web are invisible or hidden. The term known as the "Deep Web" has emerged to refer to the mass of information that can be accessed via the Web but cannot be indexed by conventional search engines. The concept of the Deep Web makes searches quite complex for search engines. Google states that the claim that conventional search engines cannot find such documents as PDFs, Word, PowerPoint, Excel, or any non-HTML page is not fully accurate and steps have been taken to address this problem by implementing procedures to search items such as academic publications, news, blogs, videos, books, and real-time information. However, Google still only provides access to a fraction of the Deep Web. This chapter explores the Deep Web and the current tools available in accessing it.
Make Mine a Metasearcher, Please!

ERIC Educational Resources Information Center

Repman, Judi; Carlson, Randal D.

2000-01-01

Describes metasearch tools and explains their value in helping library media centers improve students' Web searches. Discusses Boolean queries and the emphasis on speed at the expense of comprehensiveness; and compares four metasearch tools, including the number of search engines consulted, user control, and databases included. (LRW)
Knowledge-based personalized search engine for the Web-based Human Musculoskeletal System Resources (HMSR) in biomechanics.

PubMed

Dao, Tien Tuan; Hoang, Tuan Nha; Ta, Xuan Hien; Tho, Marie Christine Ho Ba

2013-02-01

Human musculoskeletal system resources of the human body are valuable for the learning and medical purposes. Internet-based information from conventional search engines such as Google or Yahoo cannot response to the need of useful, accurate, reliable and good-quality human musculoskeletal resources related to medical processes, pathological knowledge and practical expertise. In this present work, an advanced knowledge-based personalized search engine was developed. Our search engine was based on a client-server multi-layer multi-agent architecture and the principle of semantic web services to acquire dynamically accurate and reliable HMSR information by a semantic processing and visualization approach. A security-enhanced mechanism was applied to protect the medical information. A multi-agent crawler was implemented to develop a content-based database of HMSR information. A new semantic-based PageRank score with related mathematical formulas were also defined and implemented. As the results, semantic web service descriptions were presented in OWL, WSDL and OWL-S formats. Operational scenarios with related web-based interfaces for personal computers and mobile devices were presented and analyzed. Functional comparison between our knowledge-based search engine, a conventional search engine and a semantic search engine showed the originality and the robustness of our knowledge-based personalized search engine. In fact, our knowledge-based personalized search engine allows different users such as orthopedic patient and experts or healthcare system managers or medical students to access remotely into useful, accurate, reliable and good-quality HMSR information for their learning and medical purposes. Copyright © 2012 Elsevier Inc. All rights reserved.
Novel LOVD databases for hereditary breast cancer and colorectal cancer genes in the Chinese population.

PubMed

Pan, Min; Cong, Peikuan; Wang, Yue; Lin, Changsong; Yuan, Ying; Dong, Jian; Banerjee, Santasree; Zhang, Tao; Chen, Yanling; Zhang, Ting; Chen, Mingqing; Hu, Peter; Zheng, Shu; Zhang, Jin; Qi, Ming

2011-12-01

The Human Variome Project (HVP) is an international consortium of clinicians, geneticists, and researchers from over 30 countries, aiming to facilitate the establishment and maintenance of standards, systems, and infrastructure for the worldwide collection and sharing of all genetic variations effecting human disease. The HVP-China Node will build new and supplement existing databases of genetic diseases. As the first effort, we have created a novel variant database of BRCA1 and BRCA2, mismatch repair genes (MMR), and APC genes for breast cancer, Lynch syndrome, and familial adenomatous polyposis (FAP), respectively, in the Chinese population using the Leiden Open Variation Database (LOVD) format. We searched PubMed and some Chinese search engines to collect all the variants of these genes in the Chinese population that have already been detected and reported. There are some differences in the gene variants between the Chinese population and that of other ethnicities. The database is available online at http://www.genomed.org/LOVD/. Our database will appear to users who survey other LOVD databases (e.g., by Google search, or by NCBI GeneTests search). Remote submissions are accepted, and the information is updated monthly. © 2011 Wiley Periodicals, Inc.
Oracle Database 10g: a platform for BLAST search and Regular Expression pattern matching in life sciences.

PubMed

Stephens, Susie M; Chen, Jake Y; Davidson, Marcel G; Thomas, Shiby; Trute, Barry M

2005-01-01

As database management systems expand their array of analytical functionality, they become powerful research engines for biomedical data analysis and drug discovery. Databases can hold most of the data types commonly required in life sciences and consequently can be used as flexible platforms for the implementation of knowledgebases. Performing data analysis in the database simplifies data management by minimizing the movement of data from disks to memory, allowing pre-filtering and post-processing of datasets, and enabling data to remain in a secure, highly available environment. This article describes the Oracle Database 10g implementation of BLAST and Regular Expression Searches and provides case studies of their usage in bioinformatics. http://www.oracle.com/technology/software/index.html.
Modification Site Localization in Peptides.

PubMed

Chalkley, Robert J

2016-01-01

There are a large number of search engines designed to take mass spectrometry fragmentation spectra and match them to peptides from proteins in a database. These peptides could be unmodified, but they could also bear modifications that were added biologically or during sample preparation. As a measure of reliability for the peptide identification, software normally calculates how likely a given quality of match could have been achieved at random, most commonly through the use of target-decoy database searching (Elias and Gygi, Nat Methods 4(3): 207-214, 2007). Matching the correct peptide but with the wrong modification localization is not a random match, so results with this error will normally still be assessed as reliable identifications by the search engine. Hence, an extra step is required to determine site localization reliability, and the software approaches to measure this are the subject of this part of the chapter.
PubMed searches: overview and strategies for clinicians.

PubMed

Lindsey, Wesley T; Olin, Bernie R

2013-04-01

PubMed is a biomedical and life sciences database maintained by a division of the National Library of Medicine known as the National Center for Biotechnology Information (NCBI). It is a large resource with more than 5600 journals indexed and greater than 22 million total citations. Searches conducted in PubMed provide references that are more specific for the intended topic compared with other popular search engines. Effective PubMed searches allow the clinician to remain current on the latest clinical trials, systematic reviews, and practice guidelines. PubMed continues to evolve by allowing users to create a customized experience through the My NCBI portal, new arrangements and options in search filters, and supporting scholarly projects through exportation of citations to reference managing software. Prepackaged search options available in the Clinical Queries feature also allow users to efficiently search for clinical literature. PubMed also provides information regarding the source journals themselves through the Journals in NCBI Databases link. This article provides an overview of the PubMed database's structure and features as well as strategies for conducting an effective search.
Knowledge Based Engineering for Spatial Database Management and Use

NASA Technical Reports Server (NTRS)

Peuquet, D. (Principal Investigator)

1984-01-01

The use of artificial intelligence techniques that are applicable to Geographic Information Systems (GIS) are examined. Questions involving the performance and modification to the database structure, the definition of spectra in quadtree structures and their use in search heuristics, extension of the knowledge base, and learning algorithm concepts are investigated.
Nuclear science abstracts (NSA) database 1948--1974 (on the Internet)

DOE Office of Scientific and Technical Information (OSTI.GOV)

NONE

Nuclear Science Abstracts (NSA) is a comprehensive abstract and index collection of the International Nuclear Science and Technology literature for the period 1948 through 1976. Included are scientific and technical reports of the US Atomic Energy Commission, US Energy Research and Development Administration and its contractors, other agencies, universities, and industrial and research organizations. Coverage of the literature since 1976 is provided by Energy Science and Technology Database. Approximately 25% of the records in the file contain abstracts. These are from the following volumes of the print Nuclear Science Abstracts: Volumes 12--18, Volume 29, and Volume 33. The database containsmore » over 900,000 bibliographic records. All aspects of nuclear science and technology are covered, including: Biomedical Sciences; Metals, Ceramics, and Other Materials; Chemistry; Nuclear Materials and Waste Management; Environmental and Earth Sciences; Particle Accelerators; Engineering; Physics; Fusion Energy; Radiation Effects; Instrumentation; Reactor Technology; Isotope and Radiation Source Technology. The database includes all records contained in Volume 1 (1948) through Volume 33 (1976) of the printed version of Nuclear Science Abstracts (NSA). This worldwide coverage includes books, conference proceedings, papers, patents, dissertations, engineering drawings, and journal literature. This database is now available for searching through the GOV. Research Center (GRC) service. GRC is a single online web-based search service to well known Government databases. Featuring powerful search and retrieval software, GRC is an important research tool. The GRC web site is at http://grc.ntis.gov.« less

OntoMate: a text-mining tool aiding curation at the Rat Genome Database

PubMed Central

Liu, Weisong; Laulederkind, Stanley J. F.; Hayman, G. Thomas; Wang, Shur-Jen; Nigam, Rajni; Smith, Jennifer R.; De Pons, Jeff; Dwinell, Melinda R.; Shimoyama, Mary

2015-01-01

The Rat Genome Database (RGD) is the premier repository of rat genomic, genetic and physiologic data. Converting data from free text in the scientific literature to a structured format is one of the main tasks of all model organism databases. RGD spends considerable effort manually curating gene, Quantitative Trait Locus (QTL) and strain information. The rapidly growing volume of biomedical literature and the active research in the biological natural language processing (bioNLP) community have given RGD the impetus to adopt text-mining tools to improve curation efficiency. Recently, RGD has initiated a project to use OntoMate, an ontology-driven, concept-based literature search engine developed at RGD, as a replacement for the PubMed (http://www.ncbi.nlm.nih.gov/pubmed) search engine in the gene curation workflow. OntoMate tags abstracts with gene names, gene mutations, organism name and most of the 16 ontologies/vocabularies used at RGD. All terms/ entities tagged to an abstract are listed with the abstract in the search results. All listed terms are linked both to data entry boxes and a term browser in the curation tool. OntoMate also provides user-activated filters for species, date and other parameters relevant to the literature search. Using the system for literature search and import has streamlined the process compared to using PubMed. The system was built with a scalable and open architecture, including features specifically designed to accelerate the RGD gene curation process. With the use of bioNLP tools, RGD has added more automation to its curation workflow. Database URL: http://rgd.mcw.edu PMID:25619558
LIVIVO - the Vertical Search Engine for Life Sciences.

PubMed

Müller, Bernd; Poley, Christoph; Pössel, Jana; Hagelstein, Alexandra; Gübitz, Thomas

2017-01-01

The explosive growth of literature and data in the life sciences challenges researchers to keep track of current advancements in their disciplines. Novel approaches in the life science like the One Health paradigm require integrated methodologies in order to link and connect heterogeneous information from databases and literature resources. Current publications in the life sciences are increasingly characterized by the employment of trans-disciplinary methodologies comprising molecular and cell biology, genetics, genomic, epigenomic, transcriptional and proteomic high throughput technologies with data from humans, plants, and animals. The literature search engine LIVIVO empowers retrieval functionality by incorporating various literature resources from medicine, health, environment, agriculture and nutrition. LIVIVO is developed in-house by ZB MED - Information Centre for Life Sciences. It provides a user-friendly and usability-tested search interface with a corpus of 55 Million citations derived from 50 databases. Standardized application programming interfaces are available for data export and high throughput retrieval. The search functions allow for semantic retrieval with filtering options based on life science entities. The service oriented architecture of LIVIVO uses four different implementation layers to deliver search services. A Knowledge Environment is developed by ZB MED to deal with the heterogeneity of data as an integrative approach to model, store, and link semantic concepts within literature resources and databases. Future work will focus on the exploitation of life science ontologies and on the employment of NLP technologies in order to improve query expansion, filters in faceted search, and concept based relevancy rankings in LIVIVO.
Determination of geographic variance in stroke prevalence using Internet search engine analytics.

PubMed

Walcott, Brian P; Nahed, Brian V; Kahle, Kristopher T; Redjal, Navid; Coumans, Jean-Valery

2011-06-01

Previous methods to determine stroke prevalence, such as nationwide surveys, are labor-intensive endeavors. Recent advances in search engine query analytics have led to a new metric for disease surveillance to evaluate symptomatic phenomenon, such as influenza. The authors hypothesized that the use of search engine query data can determine the prevalence of stroke. The Google Insights for Search database was accessed to analyze anonymized search engine query data. The authors' search strategy utilized common search queries used when attempting either to identify the signs and symptoms of a stroke or to perform stroke education. The search logic was as follows: (stroke signs + stroke symptoms + mini stroke--heat) from January 1, 2005, to December 31, 2010. The relative number of searches performed (the interest level) for this search logic was established for all 50 states and the District of Columbia. A Pearson product-moment correlation coefficient was calculated from the statespecific stroke prevalence data previously reported. Web search engine interest level was available for all 50 states and the District of Columbia over the time period for January 1, 2005-December 31, 2010. The interest level was highest in Alabama and Tennessee (100 and 96, respectively) and lowest in California and Virginia (58 and 53, respectively). The Pearson correlation coefficient (r) was calculated to be 0.47 (p = 0.0005, 2-tailed). Search engine query data analysis allows for the determination of relative stroke prevalence. Further investigation will reveal the reliability of this metric to determine temporal pattern analysis and prevalence in this and other symptomatic diseases.
ExplorEnz: a MySQL database of the IUBMB enzyme nomenclature.

PubMed

McDonald, Andrew G; Boyce, Sinéad; Moss, Gerard P; Dixon, Henry B F; Tipton, Keith F

2007-07-27

We describe the database ExplorEnz, which is the primary repository for EC numbers and enzyme data that are being curated on behalf of the IUBMB. The enzyme nomenclature is incorporated into many other resources, including the ExPASy-ENZYME, BRENDA and KEGG bioinformatics databases. The data, which are stored in a MySQL database, preserve the formatting of chemical and enzyme names. A simple, easy to use, web-based query interface is provided, along with an advanced search engine for more complex queries. The database is publicly available at http://www.enzyme-database.org. The data are available for download as SQL and XML files via FTP. ExplorEnz has powerful and flexible search capabilities and provides the scientific community with the most up-to-date version of the IUBMB Enzyme List.
Oracle Database 10g: a platform for BLAST search and Regular Expression pattern matching in life sciences

PubMed Central

Stephens, Susie M.; Chen, Jake Y.; Davidson, Marcel G.; Thomas, Shiby; Trute, Barry M.

2005-01-01

As database management systems expand their array of analytical functionality, they become powerful research engines for biomedical data analysis and drug discovery. Databases can hold most of the data types commonly required in life sciences and consequently can be used as flexible platforms for the implementation of knowledgebases. Performing data analysis in the database simplifies data management by minimizing the movement of data from disks to memory, allowing pre-filtering and post-processing of datasets, and enabling data to remain in a secure, highly available environment. This article describes the Oracle Database 10g implementation of BLAST and Regular Expression Searches and provides case studies of their usage in bioinformatics. http://www.oracle.com/technology/software/index.html PMID:15608287
Crescendo: A Protein Sequence Database Search Engine for Tandem Mass Spectra.

PubMed

Wang, Jianqi; Zhang, Yajie; Yu, Yonghao

2015-07-01

A search engine that discovers more peptides reliably is essential to the progress of the computational proteomics. We propose two new scoring functions (L- and P-scores), which aim to capture similar characteristics of a peptide-spectrum match (PSM) as Sequest and Comet do. Crescendo, introduced here, is a software program that implements these two scores for peptide identification. We applied Crescendo to test datasets and compared its performance with widely used search engines, including Mascot, Sequest, and Comet. The results indicate that Crescendo identifies a similar or larger number of peptides at various predefined false discovery rates (FDR). Importantly, it also provides a better separation between the true and decoy PSMs, warranting the future development of a companion post-processing filtering algorithm.
Southwell's Relaxation Search in Computer Aided Advising: An Intelligent Information System.

ERIC Educational Resources Information Center

Song, Xueshu

1992-01-01

Describes the development and validation of a microcomputer software system that enhances undergraduate students' interests in becoming engineering graduate students. The development of a database with information on engineering graduate programs is discussed, and a model that matches individual and institutional needs using Southwell's Relaxation…
Searching the scientific literature: implications for quantitative and qualitative reviews.

PubMed

Wu, Yelena P; Aylward, Brandon S; Roberts, Michael C; Evans, Spencer C

2012-08-01

Literature reviews are an essential step in the research process and are included in all empirical and review articles. Electronic databases are commonly used to gather this literature. However, several factors can affect the extent to which relevant articles are retrieved, influencing future research and conclusions drawn. The current project examined articles obtained by comparable search strategies in two electronic archives using an exemplar search to illustrate factors that authors should consider when designing their own search strategies. Specifically, literature searches were conducted in PsycINFO and PubMed targeting review articles on two exemplar disorders (bipolar disorder and attention deficit/hyperactivity disorder) and issues of classification and/or differential diagnosis. Articles were coded for relevance and characteristics of article content. The two search engines yielded significantly different proportions of relevant articles overall and by disorder. Keywords differed across search engines for the relevant articles identified. Based on these results, it is recommended that when gathering literature for review papers, multiple search engines should be used, and search syntax and strategies be tailored to the unique capabilities of particular engines. For meta-analyses and systematic reviews, authors may consider reporting the extent to which different archives or sources yielded relevant articles for their particular review. Copyright © 2012 Elsevier Ltd. All rights reserved.
Mining Hidden Gems Beneath the Surface: A Look At the Invisible Web.

ERIC Educational Resources Information Center

Carlson, Randal D.; Repman, Judi

2002-01-01

Describes resources for researchers called the Invisible Web that are hidden from the usual search engines and other tools and contrasts them with those resources available on the surface Web. Identifies specialized search tools, databases, and strategies that can be used to locate credible in-depth information. (Author/LRW)
An Investigation of Graduate Student Knowledge and Usage of Open-Access Journals

ERIC Educational Resources Information Center

Beard, Regina M.

2016-01-01

Graduate students lament the need to achieve the proficiency necessary to competently search multiple databases for their research assignments, regularly eschewing these sources in favor of Google Scholar or some other search engine. The author conducted an anonymous survey investigating graduate student knowledge or awareness of the open-access…
Systematically Identifying Relevant Research: Case Study on Child Protection Social Workers' Resilience

ERIC Educational Resources Information Center

McFadden, Paula; Taylor, Brian J.; Campbell, Anne; McQuilkin, Janice

2012-01-01

Context: The development of a consolidated knowledge base for social work requires rigorous approaches to identifying relevant research. Method: The quality of 10 databases and a web search engine were appraised by systematically searching for research articles on resilience and burnout in child protection social workers. Results: Applied Social…
Online World Conference and Expo: A Zillion Things at Once.

ERIC Educational Resources Information Center

Chuck, Lysbeth B.

1997-01-01

Presents the keynote speakers of the Online World 1997 conference, as well as HotBot and other search engines, the CyberClinic tracks (Practical Searching, Resource Management, Trends and Technology, Corporate Electronic Publishing, Content Reviews, and Roundtable Discussions), Web-based communities, and an exhibited database of over 12,000…
False discovery rates in spectral identification.

PubMed

Jeong, Kyowon; Kim, Sangtae; Bandeira, Nuno

2012-01-01

Automated database search engines are one of the fundamental engines of high-throughput proteomics enabling daily identifications of hundreds of thousands of peptides and proteins from tandem mass (MS/MS) spectrometry data. Nevertheless, this automation also makes it humanly impossible to manually validate the vast lists of resulting identifications from such high-throughput searches. This challenge is usually addressed by using a Target-Decoy Approach (TDA) to impose an empirical False Discovery Rate (FDR) at a pre-determined threshold x% with the expectation that at most x% of the returned identifications would be false positives. But despite the fundamental importance of FDR estimates in ensuring the utility of large lists of identifications, there is surprisingly little consensus on exactly how TDA should be applied to minimize the chances of biased FDR estimates. In fact, since less rigorous TDA/FDR estimates tend to result in more identifications (at higher 'true' FDR), there is often little incentive to enforce strict TDA/FDR procedures in studies where the major metric of success is the size of the list of identifications and there are no follow up studies imposing hard cost constraints on the number of reported false positives. Here we address the problem of the accuracy of TDA estimates of empirical FDR. Using MS/MS spectra from samples where we were able to define a factual FDR estimator of 'true' FDR we evaluate several popular variants of the TDA procedure in a variety of database search contexts. We show that the fraction of false identifications can sometimes be over 10× higher than reported and may be unavoidably high for certain types of searches. In addition, we further report that the two-pass search strategy seems the most promising database search strategy. While unavoidably constrained by the particulars of any specific evaluation dataset, our observations support a series of recommendations towards maximizing the number of resulting identifications while controlling database searches with robust and reproducible TDA estimation of empirical FDR.
High-performance hardware implementation of a parallel database search engine for real-time peptide mass fingerprinting

PubMed Central

Bogdán, István A.; Rivers, Jenny; Beynon, Robert J.; Coca, Daniel

2008-01-01

Motivation: Peptide mass fingerprinting (PMF) is a method for protein identification in which a protein is fragmented by a defined cleavage protocol (usually proteolysis with trypsin), and the masses of these products constitute a ‘fingerprint’ that can be searched against theoretical fingerprints of all known proteins. In the first stage of PMF, the raw mass spectrometric data are processed to generate a peptide mass list. In the second stage this protein fingerprint is used to search a database of known proteins for the best protein match. Although current software solutions can typically deliver a match in a relatively short time, a system that can find a match in real time could change the way in which PMF is deployed and presented. In a paper published earlier we presented a hardware design of a raw mass spectra processor that, when implemented in Field Programmable Gate Array (FPGA) hardware, achieves almost 170-fold speed gain relative to a conventional software implementation running on a dual processor server. In this article we present a complementary hardware realization of a parallel database search engine that, when running on a Xilinx Virtex 2 FPGA at 100 MHz, delivers 1800-fold speed-up compared with an equivalent C software routine, running on a 3.06 GHz Xeon workstation. The inherent scalability of the design means that processing speed can be multiplied by deploying the design on multiple FPGAs. The database search processor and the mass spectra processor, running on a reconfigurable computing platform, provide a complete real-time PMF protein identification solution. Contact: d.coca@sheffield.ac.uk PMID:18453553
ProtaBank: A repository for protein design and engineering data.

PubMed

Wang, Connie Y; Chang, Paul M; Ary, Marie L; Allen, Benjamin D; Chica, Roberto A; Mayo, Stephen L; Olafson, Barry D

2018-03-25

We present ProtaBank, a repository for storing, querying, analyzing, and sharing protein design and engineering data in an actively maintained and updated database. ProtaBank provides a format to describe and compare all types of protein mutational data, spanning a wide range of properties and techniques. It features a user-friendly web interface and programming layer that streamlines data deposition and allows for batch input and queries. The database schema design incorporates a standard format for reporting protein sequences and experimental data that facilitates comparison of results across different data sets. A suite of analysis and visualization tools are provided to facilitate discovery, to guide future designs, and to benchmark and train new predictive tools and algorithms. ProtaBank will provide a valuable resource to the protein engineering community by storing and safeguarding newly generated data, allowing for fast searching and identification of relevant data from the existing literature, and exploring correlations between disparate data sets. ProtaBank invites researchers to contribute data to the database to make it accessible for search and analysis. ProtaBank is available at https://protabank.org. © 2018 The Authors Protein Science published by Wiley Periodicals, Inc. on behalf of The Protein Society.
ExplorEnz: a MySQL database of the IUBMB enzyme nomenclature

PubMed Central

McDonald, Andrew G; Boyce, Sinéad; Moss, Gerard P; Dixon, Henry BF; Tipton, Keith F

2007-01-01

Background We describe the database ExplorEnz, which is the primary repository for EC numbers and enzyme data that are being curated on behalf of the IUBMB. The enzyme nomenclature is incorporated into many other resources, including the ExPASy-ENZYME, BRENDA and KEGG bioinformatics databases. Description The data, which are stored in a MySQL database, preserve the formatting of chemical and enzyme names. A simple, easy to use, web-based query interface is provided, along with an advanced search engine for more complex queries. The database is publicly available at . The data are available for download as SQL and XML files via FTP. Conclusion ExplorEnz has powerful and flexible search capabilities and provides the scientific community with the most up-to-date version of the IUBMB Enzyme List. PMID:17662133
Difficulties and challenges associated with literature searches in operating room management, complete with recommendations.

PubMed

Wachtel, Ruth E; Dexter, Franklin

2013-12-01

The purpose of this article is to teach operating room managers, financial analysts, and those with a limited knowledge of search engines, including PubMed, how to locate articles they need in the areas of operating room and anesthesia group management. Many physicians are unaware of current literature in their field and evidence-based practices. The most common source of information is colleagues. Many people making management decisions do not read published scientific articles. Databases such as PubMed are available to search for such articles. Other databases, such as citation indices and Google Scholar, can be used to uncover additional articles. Nevertheless, most people who do not know how to use these databases are reluctant to utilize help resources when they do not know how to accomplish a task. Most people are especially reluctant to use on-line help files. Help files and search databases are often difficult to use because they have been designed for users already familiar with the field. The help files and databases have specialized vocabularies unique to the application. MeSH terms in PubMed are not useful alternatives for operating room management, an important limitation, because MeSH is the default when search terms are entered in PubMed. Librarians or those trained in informatics can be valuable assets for searching unusual databases, but they must possess the domain knowledge relative to the subject they are searching. The search methods we review are especially important when the subject area (e.g., anesthesia group management) is so specific that only 1 or 2 articles address the topic of interest. The materials are presented broadly enough that the reader can extrapolate the findings to other areas of clinical and management issues in anesthesiology.
Health search engine with e-document analysis for reliable search results.

PubMed

Gaudinat, Arnaud; Ruch, Patrick; Joubert, Michel; Uziel, Philippe; Strauss, Anne; Thonnet, Michèle; Baud, Robert; Spahni, Stéphane; Weber, Patrick; Bonal, Juan; Boyer, Celia; Fieschi, Marius; Geissbuhler, Antoine

2006-01-01

After a review of the existing practical solution available to the citizen to retrieve eHealth document, the paper describes an original specialized search engine WRAPIN. WRAPIN uses advanced cross lingual information retrieval technologies to check information quality by synthesizing medical concepts, conclusions and references contained in the health literature, to identify accurate, relevant sources. Thanks to MeSH terminology [1] (Medical Subject Headings from the U.S. National Library of Medicine) and advanced approaches such as conclusion extraction from structured document, reformulation of the query, WRAPIN offers to the user a privileged access to navigate through multilingual documents without language or medical prerequisites. The results of an evaluation conducted on the WRAPIN prototype show that results of the WRAPIN search engine are perceived as informative 65% (59% for a general-purpose search engine), reliable and trustworthy 72% (41% for the other engine) by users. But it leaves room for improvement such as the increase of database coverage, the explanation of the original functionalities and an audience adaptability. Thanks to evaluation outcomes, WRAPIN is now in exploitation on the HON web site (http://www.healthonnet.org), free of charge. Intended to the citizen it is a good alternative to general-purpose search engines when the user looks up trustworthy health and medical information or wants to check automatically a doubtful content of a Web page.
Ten Most Searched Databases by a Business Generalist--Part 1 or A Day in the Life of....

ERIC Educational Resources Information Center

Meredith, Meri

1986-01-01

Describes databases frequently used in Business Information Center, Cummins Engine Company (Columbus, Indiana): Dun and Bradstreet Business Information Report System, Newsearch, Dun and Bradstreet Market Identifiers, Trade and Industry Index, PTS PROMT, Bureau of Labor Statistics files, ABI/INFORM, Magazine Index, NEXIS, Dow Jones News/Retrieval.…
The Common Gateway Interface (CGI) for Enhancing Access to Database Servers via the World Wide Web (WWW).

ERIC Educational Resources Information Center

Machovec, George S., Ed.

1995-01-01

Explains the Common Gateway Interface (CGI) protocol as a set of rules for passing information from a Web server to an external program such as a database search engine. Topics include advantages over traditional client/server solutions, limitations, sample library applications, and sources of information from the Internet. (LRW)

Toward public volume database management: a case study of NOVA, the National Online Volumetric Archive

NASA Astrophysics Data System (ADS)

Fletcher, Alex; Yoo, Terry S.

2004-04-01

Public databases today can be constructed with a wide variety of authoring and management structures. The widespread appeal of Internet search engines suggests that public information be made open and available to common search strategies, making accessible information that would otherwise be hidden by the infrastructure and software interfaces of a traditional database management system. We present the construction and organizational details for managing NOVA, the National Online Volumetric Archive. As an archival effort of the Visible Human Project for supporting medical visualization research, archiving 3D multimodal radiological teaching files, and enhancing medical education with volumetric data, our overall database structure is simplified; archives grow by accruing information, but seldom have to modify, delete, or overwrite stored records. NOVA is being constructed and populated so that it is transparent to the Internet; that is, much of its internal structure is mirrored in HTML allowing internet search engines to investigate, catalog, and link directly to the deep relational structure of the collection index. The key organizational concept for NOVA is the Image Content Group (ICG), an indexing strategy for cataloging incoming data as a set structure rather than by keyword management. These groups are managed through a series of XML files and authoring scripts. We cover the motivation for Image Content Groups, their overall construction, authorship, and management in XML, and the pilot results for creating public data repositories using this strategy.
Cancer Internet search activity on a major search engine, United States 2001-2003.

PubMed

Cooper, Crystale Purvis; Mallon, Kenneth P; Leadbetter, Steven; Pollack, Lori A; Peipins, Lucy A

2005-07-01

To locate online health information, Internet users typically use a search engine, such as Yahoo! or Google. We studied Yahoo! search activity related to the 23 most common cancers in the United States. The objective was to test three potential correlates of Yahoo! cancer search activity--estimated cancer incidence, estimated cancer mortality, and the volume of cancer news coverage--and to study the periodicity of and peaks in Yahoo! cancer search activity. Yahoo! cancer search activity was obtained from a proprietary database called the Yahoo! Buzz Index. The American Cancer Society's estimates of cancer incidence and mortality were used. News reports associated with specific cancer types were identified using the LexisNexis "US News" database, which includes more than 400 national and regional newspapers and a variety of newswire services. The Yahoo! search activity associated with specific cancers correlated with their estimated incidence (Spearman rank correlation, rho = 0.50, P = .015), estimated mortality (rho = 0.66, P = .001), and volume of related news coverage (rho = 0.88, P < .001). Yahoo! cancer search activity tended to be higher on weekdays and during national cancer awareness months but lower during summer months; cancer news coverage also tended to follow these trends. Sharp increases in Yahoo! search activity scores from one day to the next appeared to be associated with increases in relevant news coverage. Media coverage appears to play a powerful role in prompting online searches for cancer information. Internet search activity offers an innovative tool for passive surveillance of health information-seeking behavior.
Cancer Internet Search Activity on a Major Search Engine, United States 2001-2003

PubMed Central

Cooper, Crystale Purvis; Mallon, Kenneth P; Leadbetter, Steven; Peipins, Lucy A

2005-01-01

Background To locate online health information, Internet users typically use a search engine, such as Yahoo! or Google. We studied Yahoo! search activity related to the 23 most common cancers in the United States. Objective The objective was to test three potential correlates of Yahoo! cancer search activity—estimated cancer incidence, estimated cancer mortality, and the volume of cancer news coverage—and to study the periodicity of and peaks in Yahoo! cancer search activity. Methods Yahoo! cancer search activity was obtained from a proprietary database called the Yahoo! Buzz Index. The American Cancer Society's estimates of cancer incidence and mortality were used. News reports associated with specific cancer types were identified using the LexisNexis “US News” database, which includes more than 400 national and regional newspapers and a variety of newswire services. Results The Yahoo! search activity associated with specific cancers correlated with their estimated incidence (Spearman rank correlation, ρ = 0.50, P = .015), estimated mortality (ρ = 0.66, P = .001), and volume of related news coverage (ρ = 0.88, P < .001). Yahoo! cancer search activity tended to be higher on weekdays and during national cancer awareness months but lower during summer months; cancer news coverage also tended to follow these trends. Sharp increases in Yahoo! search activity scores from one day to the next appeared to be associated with increases in relevant news coverage. Conclusions Media coverage appears to play a powerful role in prompting online searches for cancer information. Internet search activity offers an innovative tool for passive surveillance of health information–seeking behavior. PMID:15998627
ClinicalKey: a point-of-care search engine.

PubMed

Vardell, Emily

2013-01-01

ClinicalKey is a new point-of-care resource for health care professionals. Through controlled vocabulary, ClinicalKey offers a cross section of resources on diseases and procedures, from journals to e-books and practice guidelines to patient education. A sample search was conducted to demonstrate the features of the database, and a comparison with similar tools is presented.
The Ontological Perspectives of the Semantic Web and the Metadata Harvesting Protocol: Applications of Metadata for Improving Web Search.

ERIC Educational Resources Information Center

Fast, Karl V.; Campbell, D. Grant

2001-01-01

Compares the implied ontological frameworks of the Open Archives Initiative Protocol for Metadata Harvesting and the World Wide Web Consortium's Semantic Web. Discusses current search engine technology, semantic markup, indexing principles of special libraries and online databases, and componentization and the distinction between data and…
Protein-Level Integration Strategy of Multiengine MS Spectra Search Results for Higher Confidence and Sequence Coverage.

PubMed

Zhao, Panpan; Zhong, Jiayong; Liu, Wanting; Zhao, Jing; Zhang, Gong

2017-12-01

Multiple search engines based on various models have been developed to search MS/MS spectra against a reference database, providing different results for the same data set. How to integrate these results efficiently with minimal compromise on false discoveries is an open question due to the lack of an independent, reliable, and highly sensitive standard. We took the advantage of the translating mRNA sequencing (RNC-seq) result as a standard to evaluate the integration strategies of the protein identifications from various search engines. We used seven mainstream search engines (Andromeda, Mascot, OMSSA, X!Tandem, pFind, InsPecT, and ProVerB) to search the same label-free MS data sets of human cell lines Hep3B, MHCCLM3, and MHCC97H from the Chinese C-HPP Consortium for Chromosomes 1, 8, and 20. As expected, the union of seven engines resulted in a boosted false identification, whereas the intersection of seven engines remarkably decreased the identification power. We found that identifications of at least two out of seven engines resulted in maximizing the protein identification power while minimizing the ratio of suspicious/translation-supported identifications (STR), as monitored by our STR index, based on RNC-Seq. Furthermore, this strategy also significantly improves the peptides coverage of the protein amino acid sequence. In summary, we demonstrated a simple strategy to significantly improve the performance for shotgun mass spectrometry by protein-level integrating multiple search engines, maximizing the utilization of the current MS spectra without additional experimental work.
Internal combustion engine fuel controls. (Latest citations from the US Patent database). Published Search

DOE Office of Scientific and Technical Information (OSTI.GOV)

Not Available

1992-12-01

The bibliography contains citations of selected patents concerning fuel control devices and methods for use in internal combustion engines. Patents describe air-fuel ratio control, fuel injection systems, evaporative fuel control, and surge-corrected fuel control. Citations also discuss electronic and feedback control, methods for engine protection, and fuel conservation. (Contains a minimum of 232 citations and includes a subject term index and title list.)
Engineering With Nature Geographic Project Mapping Tool (EWN ProMap)

DTIC Science & Technology

2015-07-01

EWN ProMap database provides numerous case studies for infrastructure projects such as breakwaters, river engineering dikes, and seawalls that have...the EWN Project Mapping Tool (EWN ProMap) is to assist users in their search for case study information that can be valuable for developing EWN ideas...Essential elements of EWN include: (1) using science and engineering to produce operational efficiencies supporting sustainable delivery of
RxnFinder: biochemical reaction search engines using molecular structures, molecular fragments and reaction similarity.

PubMed

Hu, Qian-Nan; Deng, Zhe; Hu, Huanan; Cao, Dong-Sheng; Liang, Yi-Zeng

2011-09-01

Biochemical reactions play a key role to help sustain life and allow cells to grow. RxnFinder was developed to search biochemical reactions from KEGG reaction database using three search criteria: molecular structures, molecular fragments and reaction similarity. RxnFinder is helpful to get reference reactions for biosynthesis and xenobiotics metabolism. RxnFinder is freely available via: http://sdd.whu.edu.cn/rxnfinder. qnhu@whu.edu.cn.
Refining comparative proteomics by spectral counting to account for shared peptides and multiple search engines

PubMed Central

Chen, Yao-Yi; Dasari, Surendra; Ma, Ze-Qiang; Vega-Montoto, Lorenzo J.; Li, Ming

2013-01-01

Spectral counting has become a widely used approach for measuring and comparing protein abundance in label-free shotgun proteomics. However, when analyzing complex samples, the ambiguity of matching between peptides and proteins greatly affects the assessment of peptide and protein inventories, differentiation, and quantification. Meanwhile, the configuration of database searching algorithms that assign peptides to MS/MS spectra may produce different results in comparative proteomic analysis. Here, we present three strategies to improve comparative proteomics through spectral counting. We show that comparing spectral counts for peptide groups rather than for protein groups forestalls problems introduced by shared peptides. We demonstrate the advantage and flexibility of this new method in two datasets. We present four models to combine four popular search engines that lead to significant gains in spectral counting differentiation. Among these models, we demonstrate a powerful vote counting model that scales well for multiple search engines. We also show that semi-tryptic searching outperforms tryptic searching for comparative proteomics. Overall, these techniques considerably improve protein differentiation on the basis of spectral count tables. PMID:22552787
Refining comparative proteomics by spectral counting to account for shared peptides and multiple search engines.

PubMed

Chen, Yao-Yi; Dasari, Surendra; Ma, Ze-Qiang; Vega-Montoto, Lorenzo J; Li, Ming; Tabb, David L

2012-09-01

Spectral counting has become a widely used approach for measuring and comparing protein abundance in label-free shotgun proteomics. However, when analyzing complex samples, the ambiguity of matching between peptides and proteins greatly affects the assessment of peptide and protein inventories, differentiation, and quantification. Meanwhile, the configuration of database searching algorithms that assign peptides to MS/MS spectra may produce different results in comparative proteomic analysis. Here, we present three strategies to improve comparative proteomics through spectral counting. We show that comparing spectral counts for peptide groups rather than for protein groups forestalls problems introduced by shared peptides. We demonstrate the advantage and flexibility of this new method in two datasets. We present four models to combine four popular search engines that lead to significant gains in spectral counting differentiation. Among these models, we demonstrate a powerful vote counting model that scales well for multiple search engines. We also show that semi-tryptic searching outperforms tryptic searching for comparative proteomics. Overall, these techniques considerably improve protein differentiation on the basis of spectral count tables.
The NIDDK Information Network: A Community Portal for Finding Data, Materials, and Tools for Researchers Studying Diabetes, Digestive, and Kidney Diseases

PubMed Central

Whetzel, Patricia L.; Grethe, Jeffrey S.; Banks, Davis E.; Martone, Maryann E.

2015-01-01

The NIDDK Information Network (dkNET; http://dknet.org) was launched to serve the needs of basic and clinical investigators in metabolic, digestive and kidney disease by facilitating access to research resources that advance the mission of the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK). By research resources, we mean the multitude of data, software tools, materials, services, projects and organizations available to researchers in the public domain. Most of these are accessed via web-accessible databases or web portals, each developed, designed and maintained by numerous different projects, organizations and individuals. While many of the large government funded databases, maintained by agencies such as European Bioinformatics Institute and the National Center for Biotechnology Information, are well known to researchers, many more that have been developed by and for the biomedical research community are unknown or underutilized. At least part of the problem is the nature of dynamic databases, which are considered part of the “hidden” web, that is, content that is not easily accessed by search engines. dkNET was created specifically to address the challenge of connecting researchers to research resources via these types of community databases and web portals. dkNET functions as a “search engine for data”, searching across millions of database records contained in hundreds of biomedical databases developed and maintained by independent projects around the world. A primary focus of dkNET are centers and projects specifically created to provide high quality data and resources to NIDDK researchers. Through the novel data ingest process used in dkNET, additional data sources can easily be incorporated, allowing it to scale with the growth of digital data and the needs of the dkNET community. Here, we provide an overview of the dkNET portal and its functions. We show how dkNET can be used to address a variety of use cases that involve searching for research resources. PMID:26393351
Beyond relevance and recall: testing new user-centred measures of database performance.

PubMed

Stokes, Peter; Foster, Allen; Urquhart, Christine

2009-09-01

Measures of the effectiveness of databases have traditionally focused on recall, precision, with some debate on how relevance can be assessed, and by whom. New measures of database performance are required when users are familiar with search engines, and expect full text availability. This research ascertained which of four bibliographic databases (BNI, CINAHL, MEDLINE and EMBASE) could be considered most useful to nursing and midwifery students searching for information for an undergraduate dissertation. Searches on title were performed for dissertation topics supplied by nursing students (n = 9), who made the relevance judgements. Measures of recall and precision were combined with additional factors to provide measures of effectiveness, while efficiency combined measures of novelty and originality and accessibility combined measures for availability and retrievability, based on obtainability. There were significant differences among the databases in precision, originality and availability, but other differences were not significant (Friedman test). Odds ratio tests indicated that BNI, followed by CINAHL were the most effective, CINAHL the most efficient, and BNI the most accessible. The methodology could help library services in purchase decisions as the measure for accessibility, and odds ratio testing helped to differentiate database performance.
Seeking health information on the web: positive hypothesis testing.

PubMed

Kayhan, Varol Onur

2013-04-01

The goal of this study is to investigate positive hypothesis testing among consumers of health information when they search the Web. After demonstrating the extent of positive hypothesis testing using Experiment 1, we conduct Experiment 2 to test the effectiveness of two debiasing techniques. A total of 60 undergraduate students searched a tightly controlled online database developed by the authors to test the validity of a hypothesis. The database had four abstracts that confirmed the hypothesis and three abstracts that disconfirmed it. Findings of Experiment 1 showed that majority of participants (85%) exhibited positive hypothesis testing. In Experiment 2, we found that the recommendation technique was not effective in reducing positive hypothesis testing since none of the participants assigned to this server could retrieve disconfirming evidence. Experiment 2 also showed that the incorporation technique successfully reduced positive hypothesis testing since 75% of the participants could retrieve disconfirming evidence. Positive hypothesis testing on the Web is an understudied topic. More studies are needed to validate the effectiveness of the debiasing techniques discussed in this study and develop new techniques. Search engine developers should consider developing new options for users so that both confirming and disconfirming evidence can be presented in search results as users test hypotheses using search engines. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
The Library's New Relevance: Fostering the First-Year Student's Acquisition, Evaluation, and Integration of Print and Electronic Materials

ERIC Educational Resources Information Center

Hlavaty, Greg; Townsend, Murphy

2010-01-01

Modern composition instructors often use and teach research methods for Internet search engines and electronic databases. It is not their intent to turn back the clock. However, if they can help students connect the world of Internet searches and the university library, they can promote information literacy in its broadest sense by developing…
Buckets: Smart Objects for Digital Libraries

NASA Technical Reports Server (NTRS)

Nelson, Michael L.

2001-01-01

Current discussion of digital libraries (DLs) is often dominated by the merits of the respective storage, search and retrieval functionality of archives, repositories, search engines, search interfaces and database systems. While these technologies are necessary for information management, the information content is more important than the systems used for its storage and retrieval. Digital information should have the same long-term survivability prospects as traditional hardcopy information and should be protected to the extent possible from evolving search engine technologies and vendor vagaries in database management systems. Information content and information retrieval systems should progress on independent paths and make limited assumptions about the status or capabilities of the other. Digital information can achieve independence from archives and DL systems through the use of buckets. Buckets are an aggregative, intelligent construct for publishing in DLs. Buckets allow the decoupling of information content from information storage and retrieval. Buckets exist within the Smart Objects and Dumb Archives model for DLs in that many of the functionalities and responsibilities traditionally associated with archives are pushed down (making the archives dumber) into the buckets (making them smarter). Some of the responsibilities imbued to buckets are the enforcement of their terms and conditions, and maintenance and display of their contents.
Research in Biomaterials and Tissue Engineering: Achievements and perspectives.

PubMed

Ventre, Maurizio; Causa, Filippo; Netti, Paolo A; Pietrabissa, Riccardo

2015-01-01

Research on biomaterials and related subjects has been active in Italy. Starting from the very first examples of biomaterials and biomedical devices, Italian researchers have always provided valuable scientific contributions. This trend has steadily increased. To provide a rough estimate of this, it is sufficient to search PubMed, a free search engine accessing primarily the MEDLINE database of references and abstracts on life sciences and biomedical topics, with the keywords "biomaterials" or "tissue engineering" and sort the results by affiliation. Again, even though this is a crude estimate, the results speak for themselves, as Italy is the third European country, in terms of publications, with an astonishing 3,700 products in the last decade.
Integrated Proteomic Pipeline Using Multiple Search Engines for a Proteogenomic Study with a Controlled Protein False Discovery Rate.

PubMed

Park, Gun Wook; Hwang, Heeyoun; Kim, Kwang Hoe; Lee, Ju Yeon; Lee, Hyun Kyoung; Park, Ji Yeong; Ji, Eun Sun; Park, Sung-Kyu Robin; Yates, John R; Kwon, Kyung-Hoon; Park, Young Mok; Lee, Hyoung-Joo; Paik, Young-Ki; Kim, Jin Young; Yoo, Jong Shin

2016-11-04

In the Chromosome-Centric Human Proteome Project (C-HPP), false-positive identification by peptide spectrum matches (PSMs) after database searches is a major issue for proteogenomic studies using liquid-chromatography and mass-spectrometry-based large proteomic profiling. Here we developed a simple strategy for protein identification, with a controlled false discovery rate (FDR) at the protein level, using an integrated proteomic pipeline (IPP) that consists of four engrailed steps as follows. First, using three different search engines, SEQUEST, MASCOT, and MS-GF+, individual proteomic searches were performed against the neXtProt database. Second, the search results from the PSMs were combined using statistical evaluation tools including DTASelect and Percolator. Third, the peptide search scores were converted into E-scores normalized using an in-house program. Last, ProteinInferencer was used to filter the proteins containing two or more peptides with a controlled FDR of 1.0% at the protein level. Finally, we compared the performance of the IPP to a conventional proteomic pipeline (CPP) for protein identification using a controlled FDR of <1% at the protein level. Using the IPP, a total of 5756 proteins (vs 4453 using the CPP) including 477 alternative splicing variants (vs 182 using the CPP) were identified from human hippocampal tissue. In addition, a total of 10 missing proteins (vs 7 using the CPP) were identified with two or more unique peptides, and their tryptic peptides were validated using MS/MS spectral pattern from a repository database or their corresponding synthetic peptides. This study shows that the IPP effectively improved the identification of proteins, including alternative splicing variants and missing proteins, in human hippocampal tissues for the C-HPP. All RAW files used in this study were deposited in ProteomeXchange (PXD000395).
PolySearch2: a significantly improved text-mining system for discovering associations between human diseases, genes, drugs, metabolites, toxins and more.

PubMed

Liu, Yifeng; Liang, Yongjie; Wishart, David

2015-07-01

PolySearch2 (http://polysearch.ca) is an online text-mining system for identifying relationships between biomedical entities such as human diseases, genes, SNPs, proteins, drugs, metabolites, toxins, metabolic pathways, organs, tissues, subcellular organelles, positive health effects, negative health effects, drug actions, Gene Ontology terms, MeSH terms, ICD-10 medical codes, biological taxonomies and chemical taxonomies. PolySearch2 supports a generalized 'Given X, find all associated Ys' query, where X and Y can be selected from the aforementioned biomedical entities. An example query might be: 'Find all diseases associated with Bisphenol A'. To find its answers, PolySearch2 searches for associations against comprehensive collections of free-text collections, including local versions of MEDLINE abstracts, PubMed Central full-text articles, Wikipedia full-text articles and US Patent application abstracts. PolySearch2 also searches 14 widely used, text-rich biological databases such as UniProt, DrugBank and Human Metabolome Database to improve its accuracy and coverage. PolySearch2 maintains an extensive thesaurus of biological terms and exploits the latest search engine technology to rapidly retrieve relevant articles and databases records. PolySearch2 also generates, ranks and annotates associative candidates and present results with relevancy statistics and highlighted key sentences to facilitate user interpretation. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
PolySearch2: a significantly improved text-mining system for discovering associations between human diseases, genes, drugs, metabolites, toxins and more

PubMed Central

Liu, Yifeng; Liang, Yongjie; Wishart, David

2015-01-01

PolySearch2 (http://polysearch.ca) is an online text-mining system for identifying relationships between biomedical entities such as human diseases, genes, SNPs, proteins, drugs, metabolites, toxins, metabolic pathways, organs, tissues, subcellular organelles, positive health effects, negative health effects, drug actions, Gene Ontology terms, MeSH terms, ICD-10 medical codes, biological taxonomies and chemical taxonomies. PolySearch2 supports a generalized ‘Given X, find all associated Ys’ query, where X and Y can be selected from the aforementioned biomedical entities. An example query might be: ‘Find all diseases associated with Bisphenol A’. To find its answers, PolySearch2 searches for associations against comprehensive collections of free-text collections, including local versions of MEDLINE abstracts, PubMed Central full-text articles, Wikipedia full-text articles and US Patent application abstracts. PolySearch2 also searches 14 widely used, text-rich biological databases such as UniProt, DrugBank and Human Metabolome Database to improve its accuracy and coverage. PolySearch2 maintains an extensive thesaurus of biological terms and exploits the latest search engine technology to rapidly retrieve relevant articles and databases records. PolySearch2 also generates, ranks and annotates associative candidates and present results with relevancy statistics and highlighted key sentences to facilitate user interpretation. PMID:25925572

Methods and apparatus for constructing and implementing a universal extension module for processing objects in a database

NASA Technical Reports Server (NTRS)

Li, Chung-Sheng (Inventor); Smith, John R. (Inventor); Chang, Yuan-Chi (Inventor); Jhingran, Anant D. (Inventor); Padmanabhan, Sriram K. (Inventor); Hsiao, Hui-I (Inventor); Choy, David Mun-Hien (Inventor); Lin, Jy-Jine James (Inventor); Fuh, Gene Y. C. (Inventor); Williams, Robin (Inventor)

2004-01-01

Methods and apparatus for providing a multi-tier object-relational database architecture are disclosed. In one illustrative embodiment of the present invention, a multi-tier database architecture comprises an object-relational database engine as a top tier, one or more domain-specific extension modules as a bottom tier, and one or more universal extension modules as a middle tier. The individual extension modules of the bottom tier operationally connect with the one or more universal extension modules which, themselves, operationally connect with the database engine. The domain-specific extension modules preferably provide such functions as search, index, and retrieval services of images, video, audio, time series, web pages, text, XML, spatial data, etc. The domain-specific extension modules may include one or more IBM DB2 extenders, Oracle data cartridges and/or Informix datablades, although other domain-specific extension modules may be used.
In-depth analysis of protein inference algorithms using multiple search engines and well-defined metrics.

PubMed

Audain, Enrique; Uszkoreit, Julian; Sachsenberg, Timo; Pfeuffer, Julianus; Liang, Xiao; Hermjakob, Henning; Sanchez, Aniel; Eisenacher, Martin; Reinert, Knut; Tabb, David L; Kohlbacher, Oliver; Perez-Riverol, Yasset

2017-01-06

In mass spectrometry-based shotgun proteomics, protein identifications are usually the desired result. However, most of the analytical methods are based on the identification of reliable peptides and not the direct identification of intact proteins. Thus, assembling peptides identified from tandem mass spectra into a list of proteins, referred to as protein inference, is a critical step in proteomics research. Currently, different protein inference algorithms and tools are available for the proteomics community. Here, we evaluated five software tools for protein inference (PIA, ProteinProphet, Fido, ProteinLP, MSBayesPro) using three popular database search engines: Mascot, X!Tandem, and MS-GF+. All the algorithms were evaluated using a highly customizable KNIME workflow using four different public datasets with varying complexities (different sample preparation, species and analytical instruments). We defined a set of quality control metrics to evaluate the performance of each combination of search engines, protein inference algorithm, and parameters on each dataset. We show that the results for complex samples vary not only regarding the actual numbers of reported protein groups but also concerning the actual composition of groups. Furthermore, the robustness of reported proteins when using databases of differing complexities is strongly dependant on the applied inference algorithm. Finally, merging the identifications of multiple search engines does not necessarily increase the number of reported proteins, but does increase the number of peptides per protein and thus can generally be recommended. Protein inference is one of the major challenges in MS-based proteomics nowadays. Currently, there are a vast number of protein inference algorithms and implementations available for the proteomics community. Protein assembly impacts in the final results of the research, the quantitation values and the final claims in the research manuscript. Even though protein inference is a crucial step in proteomics data analysis, a comprehensive evaluation of the many different inference methods has never been performed. Previously Journal of proteomics has published multiple studies about other benchmark of bioinformatics algorithms (PMID: 26585461; PMID: 22728601) in proteomics studies making clear the importance of those studies for the proteomics community and the journal audience. This manuscript presents a new bioinformatics solution based on the KNIME/OpenMS platform that aims at providing a fair comparison of protein inference algorithms (https://github.com/KNIME-OMICS). Six different algorithms - ProteinProphet, MSBayesPro, ProteinLP, Fido and PIA- were evaluated using the highly customizable workflow on four public datasets with varying complexities. Five popular database search engines Mascot, X!Tandem, MS-GF+ and combinations thereof were evaluated for every protein inference tool. In total >186 proteins lists were analyzed and carefully compare using three metrics for quality assessments of the protein inference results: 1) the numbers of reported proteins, 2) peptides per protein, and the 3) number of uniquely reported proteins per inference method, to address the quality of each inference method. We also examined how many proteins were reported by choosing each combination of search engines, protein inference algorithms and parameters on each dataset. The results show that using 1) PIA or Fido seems to be a good choice when studying the results of the analyzed workflow, regarding not only the reported proteins and the high-quality identifications, but also the required runtime. 2) Merging the identifications of multiple search engines gives almost always more confident results and increases the number of peptides per protein group. 3) The usage of databases containing not only the canonical, but also known isoforms of proteins has a small impact on the number of reported proteins. The detection of specific isoforms could, concerning the question behind the study, compensate for slightly shorter reports using the parsimonious reports. 4) The current workflow can be easily extended to support new algorithms and search engine combinations. Copyright © 2016. Published by Elsevier B.V.
MyMolDB: a micromolecular database solution with open source and free components.

PubMed

Xia, Bing; Tai, Zheng-Fu; Gu, Yu-Cheng; Li, Bang-Jing; Ding, Li-Sheng; Zhou, Yan

2011-10-01

To manage chemical structures in small laboratories is one of the important daily tasks. Few solutions are available on the internet, and most of them are closed source applications. The open-source applications typically have limited capability and basic cheminformatics functionalities. In this article, we describe an open-source solution to manage chemicals in research groups based on open source and free components. It has a user-friendly interface with the functions of chemical handling and intensive searching. MyMolDB is a micromolecular database solution that supports exact, substructure, similarity, and combined searching. This solution is mainly implemented using scripting language Python with a web-based interface for compound management and searching. Almost all the searches are in essence done with pure SQL on the database by using the high performance of the database engine. Thus, impressive searching speed has been archived in large data sets for no external Central Processing Unit (CPU) consuming languages were involved in the key procedure of the searching. MyMolDB is an open-source software and can be modified and/or redistributed under GNU General Public License version 3 published by the Free Software Foundation (Free Software Foundation Inc. The GNU General Public License, Version 3, 2007. Available at: http://www.gnu.org/licenses/gpl.html). The software itself can be found at http://code.google.com/p/mymoldb/. Copyright © 2011 Wiley Periodicals, Inc.
Flare-ups after endodontic treatment: a meta-analysis of literature.

PubMed

Tsesis, Igor; Faivishevsky, Vadim; Fuss, Zvi; Zukerman, Ofer

2008-10-01

The purpose of this study was to determine the frequency of flare-ups and to evaluate factors that affect it by using meta-analysis of results of previous studies. MEDLINE database was searched by using Entrez PubMed search engine and Medical Subject Headings (MeSH) search with EviDents Search Engine to identify the studies dealing with endodontic flare-up phenomenon. The search covered all articles published in dental journals in English from 1966-May 2007, and the relevancy of 119 selected articles was evaluated by reading their titles and abstracts, from which 54 were rejected as irrelevant and 65 were subjected to a suitability test. Six studies that met all the above mentioned criteria were included in the study. Average percentage of incidence of flare-ups for 982 patients was 8.4 (standard deviation +/-57). There were insufficient data to investigate the effect of the influencing factors.
Lynx: a database and knowledge extraction engine for integrative medicine.

PubMed

Sulakhe, Dinanath; Balasubramanian, Sandhya; Xie, Bingqing; Feng, Bo; Taylor, Andrew; Wang, Sheng; Berrocal, Eduardo; Dave, Utpal; Xu, Jinbo; Börnigen, Daniela; Gilliam, T Conrad; Maltsev, Natalia

2014-01-01

We have developed Lynx (http://lynx.ci.uchicago.edu)--a web-based database and a knowledge extraction engine, supporting annotation and analysis of experimental data and generation of weighted hypotheses on molecular mechanisms contributing to human phenotypes and disorders of interest. Its underlying knowledge base (LynxKB) integrates various classes of information from >35 public databases and private collections, as well as manually curated data from our group and collaborators. Lynx provides advanced search capabilities and a variety of algorithms for enrichment analysis and network-based gene prioritization to assist the user in extracting meaningful knowledge from LynxKB and experimental data, whereas its service-oriented architecture provides public access to LynxKB and its analytical tools via user-friendly web services and interfaces.
Problems of information support in scientific research

NASA Astrophysics Data System (ADS)

Shamaev, V. G.; Gorshkov, A. B.

2015-11-01

This paper reports on the creation of the open access Akustika portal (AKDATA.RU) designed to provide Russian-language easy-to-read and search information on acoustics and related topics. The absence of a Russian-language publication in foreign databases means that it is effectively lost for much of the scientific community. The portal has three interrelated sections: the Akustika information search system (ISS) (Acoustics), full-text archive of the Akusticheskii Zhurnal (Acoustic Journal), and 'Signal'naya informatsiya' ('Signaling information') on acoustics. The paper presents a description of the Akustika ISS, including its structure, content, interface, and information search capabilities for basic and applied research in diverse areas of science, engineering, biology, medicine, etc. The intended users of the portal are physicists, engineers, and engineering technologists interested in expanding their research activities and seeking to increase their knowledge base. Those studying current trends in the Russian-language contribution to international science may also find the portal useful.
Design and Implementation of a Threaded Search Engine for Tour Recommendation Systems

NASA Astrophysics Data System (ADS)

Lee, Junghoon; Park, Gyung-Leen; Ko, Jin-Hee; Shin, In-Hye; Kang, Mikyung

This paper implements a threaded scan engine for the O(n!) search space and measures its performance, aiming at providing a responsive tour recommendation and scheduling service. As a preliminary step of integrating POI ontology, mobile object database, and personalization profile for the development of new vehicular telematics services, this implementation can give a useful guideline to design a challenging and computation-intensive vehicular telematics service. The implemented engine allocates the subtree to the respective threads and makes them run concurrently exploiting the primitives provided by the operating system and the underlying multiprocessor architecture. It also makes it easy to add a variety of constraints, for example, the search tree is pruned if the cost of partial allocation already exceeds the current best. The performance measurement result shows that the service can run even in the low-power telematics device when the number of destinations does not exceed 15, with an appropriate constraint processing.
Publish or perish: Scientists must write or How do I climb the paper mountain?

USDA-ARS?s Scientific Manuscript database

This will be an interactive workshop for scientists discussing strategies for improving writing efficiency. Topics covered include database search engines, reference managing software, authorship, journal determination, writing tips and good writing habits....
Speeding up the screening of steroids in urine: development of a user-friendly library.

PubMed

Galesio, M; López-Fdez, H; Reboiro-Jato, M; Gómez-Meire, Silvana; Glez-Peña, D; Fdez-Riverola, F; Lodeiro, Carlos; Diniz, M E; Capelo, J L

2013-12-11

This work presents a novel database search engine - MLibrary - designed to assist the user in the detection and identification of androgenic anabolic steroids (AAS) and its metabolites by matrix assisted laser desorption/ionization (MALDI) and mass spectrometry-based strategies. The detection of the AAS in the samples was accomplished by searching (i) the mass spectrometric (MS) spectra against the library developed to identify possible positives and (ii) by comparison of the tandem mass spectrometric (MS/MS) spectra produced after fragmentation of the possible positives with a complete set of spectra that have previously been assigned to the software. The urinary screening for anabolic agents plays a major role in anti-doping laboratories as they represent the most abused drug class in sports. With the help of the MLibrary software application, the use of MALDI techniques for doping control is simplified and the time for evaluation and interpretation of the results is reduced. To do so, the search engine takes as input several MALDI-TOF-MS and MALDI-TOF-MS/MS spectra. It aids the researcher in an automatic mode by identifying possible positives in a single MS analysis and then confirming their presence in tandem MS analysis by comparing the experimental tandem mass spectrometric data with the database. Furthermore, the search engine can, potentially, be further expanded to other compounds in addition to AASs. The applicability of the MLibrary tool is shown through the analysis of spiked urine samples. Copyright © 2013 Elsevier Inc. All rights reserved.
Pattern Recognition-Assisted Infrared Library Searching of the Paint Data Query Database to Enhance Lead Information from Automotive Paint Trace Evidence.

PubMed

Lavine, Barry K; White, Collin G; Allen, Matthew D; Weakley, Andrew

2017-03-01

Multilayered automotive paint fragments, which are one of the most complex materials encountered in the forensic science laboratory, provide crucial links in criminal investigations and prosecutions. To determine the origin of these paint fragments, forensic automotive paint examiners have turned to the paint data query (PDQ) database, which allows the forensic examiner to compare the layer sequence and color, texture, and composition of the sample to paint systems of the original equipment manufacturer (OEM). However, modern automotive paints have a thin color coat and this layer on a microscopic fragment is often too thin to obtain accurate chemical and topcoat color information. A search engine has been developed for the infrared (IR) spectral libraries of the PDQ database in an effort to improve discrimination capability and permit quantification of discrimination power for OEM automotive paint comparisons. The similarity of IR spectra of the corresponding layers of various records for original finishes in the PDQ database often results in poor discrimination using commercial library search algorithms. A pattern recognition approach employing pre-filters and a cross-correlation library search algorithm that performs both a forward and backward search has been used to significantly improve the discrimination of IR spectra in the PDQ database and thus improve the accuracy of the search. This improvement permits inter-comparison of OEM automotive paint layer systems using the IR spectra alone. Such information can serve to quantify the discrimination power of the original automotive paint encountered in casework and further efforts to succinctly communicate trace evidence to the courts.
GEOmetadb: powerful alternative search engine for the Gene Expression Omnibus

PubMed Central

Zhu, Yuelin; Davis, Sean; Stephens, Robert; Meltzer, Paul S.; Chen, Yidong

2008-01-01

The NCBI Gene Expression Omnibus (GEO) represents the largest public repository of microarray data. However, finding data in GEO can be challenging. We have developed GEOmetadb in an attempt to make querying the GEO metadata both easier and more powerful. All GEO metadata records as well as the relationships between them are parsed and stored in a local MySQL database. A powerful, flexible web search interface with several convenient utilities provides query capabilities not available via NCBI tools. In addition, a Bioconductor package, GEOmetadb that utilizes a SQLite export of the entire GEOmetadb database is also available, rendering the entire GEO database accessible with full power of SQL-based queries from within R. Availability: The web interface and SQLite databases available at http://gbnci.abcc.ncifcrf.gov/geo/. The Bioconductor package is available via the Bioconductor project. The corresponding MATLAB implementation is also available at the same website. Contact: yidong@mail.nih.gov PMID:18842599
Review of Maxillary Expansion Appliance Activation Methods: Engineering and Clinical Perspectives

PubMed Central

Romanyk, D. L.; Lagravere, M. O.; Toogood, R. W.; Major, P. W.; Carey, J. P.

2010-01-01

Objective. Review the reported activation methods of maxillary expansion devices for midpalatal suture separation from an engineering perspective and suggest areas of improvement. Materials and Methods. A literature search of Scopus and PubMed was used to determine current expansion methods. A U.S. and Canadian patent database search was also conducted using patent classification and keywords. Any paper presenting a new method of expansion was included. Results. Expansion methods in use, or patented, can be classified as either a screw- or spring-type, magnetic, or shape memory alloy expansion appliance. Conclusions. Each activation method presented unique advantages and disadvantages from both clinical and engineering perspectives. Areas for improvement still remain and are identified in the paper. PMID:20948570
Sequence tagging reveals unexpected modifications in toxicoproteomics

PubMed Central

Dasari, Surendra; Chambers, Matthew C.; Codreanu, Simona G.; Liebler, Daniel C.; Collins, Ben C.; Pennington, Stephen R.; Gallagher, William M.; Tabb, David L.

2010-01-01

Toxicoproteomic samples are rich in posttranslational modifications (PTMs) of proteins. Identifying these modifications via standard database searching can incur significant performance penalties. Here we describe the latest developments in TagRecon, an algorithm that leverages inferred sequence tags to identify modified peptides in toxicoproteomic data sets. TagRecon identifies known modifications more effectively than the MyriMatch database search engine. TagRecon outperformed state of the art software in recognizing unanticipated modifications from LTQ, Orbitrap, and QTOF data sets. We developed user-friendly software for detecting persistent mass shifts from samples. We follow a three-step strategy for detecting unanticipated PTMs in samples. First, we identify the proteins present in the sample with a standard database search. Next, identified proteins are interrogated for unexpected PTMs with a sequence tag-based search. Finally, additional evidence is gathered for the detected mass shifts with a refinement search. Application of this technology on toxicoproteomic data sets revealed unintended cross-reactions between proteins and sample processing reagents. Twenty five proteins in rat liver showed signs of oxidative stress when exposed to potentially toxic drugs. These results demonstrate the value of mining toxicoproteomic data sets for modifications. PMID:21214251
Ariadne: a database search engine for identification and chemical analysis of RNA using tandem mass spectrometry data.

PubMed

Nakayama, Hiroshi; Akiyama, Misaki; Taoka, Masato; Yamauchi, Yoshio; Nobe, Yuko; Ishikawa, Hideaki; Takahashi, Nobuhiro; Isobe, Toshiaki

2009-04-01

We present here a method to correlate tandem mass spectra of sample RNA nucleolytic fragments with an RNA nucleotide sequence in a DNA/RNA sequence database, thereby allowing tandem mass spectrometry (MS/MS)-based identification of RNA in biological samples. Ariadne, a unique web-based database search engine, identifies RNA by two probability-based evaluation steps of MS/MS data. In the first step, the software evaluates the matches between the masses of product ions generated by MS/MS of an RNase digest of sample RNA and those calculated from a candidate nucleotide sequence in a DNA/RNA sequence database, which then predicts the nucleotide sequences of these RNase fragments. In the second step, the candidate sequences are mapped for all RNA entries in the database, and each entry is scored for a function of occurrences of the candidate sequences to identify a particular RNA. Ariadne can also predict post-transcriptional modifications of RNA, such as methylation of nucleotide bases and/or ribose, by estimating mass shifts from the theoretical mass values. The method was validated with MS/MS data of RNase T1 digests of in vitro transcripts. It was applied successfully to identify an unknown RNA component in a tRNA mixture and to analyze post-transcriptional modification in yeast tRNA(Phe-1).
Tips and tricks for using the internet for professional purposes.

PubMed

Ceylan, Hasan Huseyin; Güngören, Nurdan; Küçükdurmaz, Fatih

2017-05-01

Online resources provide access to large amounts of information which is expanding every day. Using search engines for reaching the relevant, updated and complete literature that is indexed in various bibliographical databases has already become part of the medical professionals' everyday life.However, most researchers often fail to conduct a efficient literature search on the internet. The right techniques in literature search save time and improve the quality of the retrieved data.Efficient literature search is not a talent but a learnable skill, which should be a formal part of medical education.This review briefly outlines the commonly used bibliographic databases, namely Pubmed, Cochrane Library, Web of Science, Scopus, EMBASE, CINAHL and Google Scholar. Also the definition of grey literature and its features are summarised. Cite this article: EFORT Open Rev 2017;2. DOI: 10.1302/2058-5241.2.160066. Originally published online at www.efortopenreviews.org.
Tips and tricks for using the internet for professional purposes

PubMed Central

Ceylan, Hasan Huseyin; Güngören, Nurdan; Küçükdurmaz, Fatih

2017-01-01

Online resources provide access to large amounts of information which is expanding every day. Using search engines for reaching the relevant, updated and complete literature that is indexed in various bibliographical databases has already become part of the medical professionals’ everyday life. However, most researchers often fail to conduct a efficient literature search on the internet. The right techniques in literature search save time and improve the quality of the retrieved data. Efficient literature search is not a talent but a learnable skill, which should be a formal part of medical education. This review briefly outlines the commonly used bibliographic databases, namely Pubmed, Cochrane Library, Web of Science, Scopus, EMBASE, CINAHL and Google Scholar. Also the definition of grey literature and its features are summarised. Cite this article: EFORT Open Rev 2017;2. DOI: 10.1302/2058-5241.2.160066. Originally published online at www.efortopenreviews.org PMID:28630750
MMDB: Entrez’s 3D-structure database

PubMed Central

Wang, Yanli; Anderson, John B.; Chen, Jie; Geer, Lewis Y.; He, Siqian; Hurwitz, David I.; Liebert, Cynthia A.; Madej, Thomas; Marchler, Gabriele H.; Marchler-Bauer, Aron; Panchenko, Anna R.; Shoemaker, Benjamin A.; Song, James S.; Thiessen, Paul A.; Yamashita, Roxanne A.; Bryant, Stephen H.

2002-01-01

Three-dimensional structures are now known within many protein families and it is quite likely, in searching a sequence database, that one will encounter a homolog with known structure. The goal of Entrez’s 3D-structure database is to make this information, and the functional annotation it can provide, easily accessible to molecular biologists. To this end Entrez’s search engine provides three powerful features. (i) Sequence and structure neighbors; one may select all sequences similar to one of interest, for example, and link to any known 3D structures. (ii) Links between databases; one may search by term matching in MEDLINE, for example, and link to 3D structures reported in these articles. (iii) Sequence and structure visualization; identifying a homolog with known structure, one may view molecular-graphic and alignment displays, to infer approximate 3D structure. In this article we focus on two features of Entrez’s Molecular Modeling Database (MMDB) not described previously: links from individual biopolymer chains within 3D structures to a systematic taxonomy of organisms represented in molecular databases, and links from individual chains (and compact 3D domains within them) to structure neighbors, other chains (and 3D domains) with similar 3D structure. MMDB may be accessed at http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Structure. PMID:11752307
ASGARD: an open-access database of annotated transcriptomes for emerging model arthropod species.

PubMed

Zeng, Victor; Extavour, Cassandra G

2012-01-01

The increased throughput and decreased cost of next-generation sequencing (NGS) have shifted the bottleneck genomic research from sequencing to annotation, analysis and accessibility. This is particularly challenging for research communities working on organisms that lack the basic infrastructure of a sequenced genome, or an efficient way to utilize whatever sequence data may be available. Here we present a new database, the Assembled Searchable Giant Arthropod Read Database (ASGARD). This database is a repository and search engine for transcriptomic data from arthropods that are of high interest to multiple research communities but currently lack sequenced genomes. We demonstrate the functionality and utility of ASGARD using de novo assembled transcriptomes from the milkweed bug Oncopeltus fasciatus, the cricket Gryllus bimaculatus and the amphipod crustacean Parhyale hawaiensis. We have annotated these transcriptomes to assign putative orthology, coding region determination, protein domain identification and Gene Ontology (GO) term annotation to all possible assembly products. ASGARD allows users to search all assemblies by orthology annotation, GO term annotation or Basic Local Alignment Search Tool. User-friendly features of ASGARD include search term auto-completion suggestions based on database content, the ability to download assembly product sequences in FASTA format, direct links to NCBI data for predicted orthologs and graphical representation of the location of protein domains and matches to similar sequences from the NCBI non-redundant database. ASGARD will be a useful repository for transcriptome data from future NGS studies on these and other emerging model arthropods, regardless of sequencing platform, assembly or annotation status. This database thus provides easy, one-stop access to multi-species annotated transcriptome information. We anticipate that this database will be useful for members of multiple research communities, including developmental biology, physiology, evolutionary biology, ecology, comparative genomics and phylogenomics. Database URL: asgard.rc.fas.harvard.edu.
BioSearch: a semantic search engine for Bio2RDF

PubMed Central

Qiu, Honglei; Huang, Jiacheng

2017-01-01

Abstract Biomedical data are growing at an incredible pace and require substantial expertise to organize data in a manner that makes them easily findable, accessible, interoperable and reusable. Massive effort has been devoted to using Semantic Web standards and technologies to create a network of Linked Data for the life sciences, among others. However, while these data are accessible through programmatic means, effective user interfaces for non-experts to SPARQL endpoints are few and far between. Contributing to user frustrations is that data are not necessarily described using common vocabularies, thereby making it difficult to aggregate results, especially when distributed across multiple SPARQL endpoints. We propose BioSearch — a semantic search engine that uses ontologies to enhance federated query construction and organize search results. BioSearch also features a simplified query interface that allows users to optionally filter their keywords according to classes, properties and datasets. User evaluation demonstrated that BioSearch is more effective and usable than two state of the art search and browsing solutions. Database URL: http://ws.nju.edu.cn/biosearch/ PMID:29220451
Protein Information Resource: a community resource for expert annotation of protein data

PubMed Central

Barker, Winona C.; Garavelli, John S.; Hou, Zhenglin; Huang, Hongzhan; Ledley, Robert S.; McGarvey, Peter B.; Mewes, Hans-Werner; Orcutt, Bruce C.; Pfeiffer, Friedhelm; Tsugita, Akira; Vinayaka, C. R.; Xiao, Chunlin; Yeh, Lai-Su L.; Wu, Cathy

2001-01-01

The Protein Information Resource, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), produces the most comprehensive and expertly annotated protein sequence database in the public domain, the PIR-International Protein Sequence Database. To provide timely and high quality annotation and promote database interoperability, the PIR-International employs rule-based and classification-driven procedures based on controlled vocabulary and standard nomenclature and includes status tags to distinguish experimentally determined from predicted protein features. The database contains about 200 000 non-redundant protein sequences, which are classified into families and superfamilies and their domains and motifs identified. Entries are extensively cross-referenced to other sequence, classification, genome, structure and activity databases. The PIR web site features search engines that use sequence similarity and database annotation to facilitate the analysis and functional identification of proteins. The PIR-International databases and search tools are accessible on the PIR web site at http://pir.georgetown.edu/ and at the MIPS web site at http://www.mips.biochem.mpg.de. The PIR-International Protein Sequence Database and other files are also available by FTP. PMID:11125041

GeneView: a comprehensive semantic search engine for PubMed.

PubMed

Thomas, Philippe; Starlinger, Johannes; Vowinkel, Alexander; Arzt, Sebastian; Leser, Ulf

2012-07-01

Research results are primarily published in scientific literature and curation efforts cannot keep up with the rapid growth of published literature. The plethora of knowledge remains hidden in large text repositories like MEDLINE. Consequently, life scientists have to spend a great amount of time searching for specific information. The enormous ambiguity among most names of biomedical objects such as genes, chemicals and diseases often produces too large and unspecific search results. We present GeneView, a semantic search engine for biomedical knowledge. GeneView is built upon a comprehensively annotated version of PubMed abstracts and openly available PubMed Central full texts. This semi-structured representation of biomedical texts enables a number of features extending classical search engines. For instance, users may search for entities using unique database identifiers or they may rank documents by the number of specific mentions they contain. Annotation is performed by a multitude of state-of-the-art text-mining tools for recognizing mentions from 10 entity classes and for identifying protein-protein interactions. GeneView currently contains annotations for >194 million entities from 10 classes for ∼21 million citations with 271,000 full text bodies. GeneView can be searched at http://bc3.informatik.hu-berlin.de/.
Lynx: a database and knowledge extraction engine for integrative medicine

PubMed Central

Sulakhe, Dinanath; Balasubramanian, Sandhya; Xie, Bingqing; Feng, Bo; Taylor, Andrew; Wang, Sheng; Berrocal, Eduardo; Dave, Utpal; Xu, Jinbo; Börnigen, Daniela; Gilliam, T. Conrad; Maltsev, Natalia

2014-01-01

We have developed Lynx (http://lynx.ci.uchicago.edu)—a web-based database and a knowledge extraction engine, supporting annotation and analysis of experimental data and generation of weighted hypotheses on molecular mechanisms contributing to human phenotypes and disorders of interest. Its underlying knowledge base (LynxKB) integrates various classes of information from >35 public databases and private collections, as well as manually curated data from our group and collaborators. Lynx provides advanced search capabilities and a variety of algorithms for enrichment analysis and network-based gene prioritization to assist the user in extracting meaningful knowledge from LynxKB and experimental data, whereas its service-oriented architecture provides public access to LynxKB and its analytical tools via user-friendly web services and interfaces. PMID:24270788
reSpect: Software for Identification of High and Low Abundance Ion Species in Chimeric Tandem Mass Spectra

PubMed Central

Shteynberg, David; Mendoza, Luis; Hoopmann, Michael R.; Sun, Zhi; Schmidt, Frank; Deutsch, Eric W.; Moritz, Robert L.

2016-01-01

Most shotgun proteomics data analysis workflows are based on the assumption that each fragment ion spectrum is explained by a single species of peptide ion isolated by the mass spectrometer; however, in reality mass spectrometers often isolate more than one peptide ion within the window of isolation that contributes to additional peptide fragment peaks in many spectra. We present a new tool called reSpect, implemented in the Trans-Proteomic Pipeline (TPP), that enables an iterative workflow whereby fragment ion peaks explained by a peptide ion identified in one round of sequence searching or spectral library search are attenuated based on the confidence of the identification, and then the altered spectrum is subjected to further rounds of searching. The reSpect tool is not implemented as a search engine, but rather as a post search engine processing step where only fragment ion intensities are altered. This enables the application of any search engine combination in the following iterations. Thus, reSpect is compatible with all other protein sequence database search engines as well as peptide spectral library search engines that are supported by the TPP. We show that while some datasets are highly amenable to chimeric spectrum identification and lead to additional peptide identification boosts of over 30% with as many as four different peptide ions identified per spectrum, datasets with narrow precursor ion selection only benefit from such processing at the level of a few percent. We demonstrate a technique that facilitates the determination of the degree to which a dataset would benefit from chimeric spectrum analysis. The reSpect tool is free and open source, provided within the TPP and available at the TPP website. PMID:26419769
reSpect: software for identification of high and low abundance ion species in chimeric tandem mass spectra.

PubMed

Shteynberg, David; Mendoza, Luis; Hoopmann, Michael R; Sun, Zhi; Schmidt, Frank; Deutsch, Eric W; Moritz, Robert L

2015-11-01

Most shotgun proteomics data analysis workflows are based on the assumption that each fragment ion spectrum is explained by a single species of peptide ion isolated by the mass spectrometer; however, in reality mass spectrometers often isolate more than one peptide ion within the window of isolation that contribute to additional peptide fragment peaks in many spectra. We present a new tool called reSpect, implemented in the Trans-Proteomic Pipeline (TPP), which enables an iterative workflow whereby fragment ion peaks explained by a peptide ion identified in one round of sequence searching or spectral library search are attenuated based on the confidence of the identification, and then the altered spectrum is subjected to further rounds of searching. The reSpect tool is not implemented as a search engine, but rather as a post-search engine processing step where only fragment ion intensities are altered. This enables the application of any search engine combination in the iterations that follow. Thus, reSpect is compatible with all other protein sequence database search engines as well as peptide spectral library search engines that are supported by the TPP. We show that while some datasets are highly amenable to chimeric spectrum identification and lead to additional peptide identification boosts of over 30% with as many as four different peptide ions identified per spectrum, datasets with narrow precursor ion selection only benefit from such processing at the level of a few percent. We demonstrate a technique that facilitates the determination of the degree to which a dataset would benefit from chimeric spectrum analysis. The reSpect tool is free and open source, provided within the TPP and available at the TPP website. Graphical Abstract ᅟ.
reSpect: Software for Identification of High and Low Abundance Ion Species in Chimeric Tandem Mass Spectra

NASA Astrophysics Data System (ADS)

Shteynberg, David; Mendoza, Luis; Hoopmann, Michael R.; Sun, Zhi; Schmidt, Frank; Deutsch, Eric W.; Moritz, Robert L.

2015-11-01

Most shotgun proteomics data analysis workflows are based on the assumption that each fragment ion spectrum is explained by a single species of peptide ion isolated by the mass spectrometer; however, in reality mass spectrometers often isolate more than one peptide ion within the window of isolation that contribute to additional peptide fragment peaks in many spectra. We present a new tool called reSpect, implemented in the Trans-Proteomic Pipeline (TPP), which enables an iterative workflow whereby fragment ion peaks explained by a peptide ion identified in one round of sequence searching or spectral library search are attenuated based on the confidence of the identification, and then the altered spectrum is subjected to further rounds of searching. The reSpect tool is not implemented as a search engine, but rather as a post-search engine processing step where only fragment ion intensities are altered. This enables the application of any search engine combination in the iterations that follow. Thus, reSpect is compatible with all other protein sequence database search engines as well as peptide spectral library search engines that are supported by the TPP. We show that while some datasets are highly amenable to chimeric spectrum identification and lead to additional peptide identification boosts of over 30% with as many as four different peptide ions identified per spectrum, datasets with narrow precursor ion selection only benefit from such processing at the level of a few percent. We demonstrate a technique that facilitates the determination of the degree to which a dataset would benefit from chimeric spectrum analysis. The reSpect tool is free and open source, provided within the TPP and available at the TPP website.
Aerodynamic Optimization of Rocket Control Surface Geometry Using Cartesian Methods and CAD Geometry

NASA Technical Reports Server (NTRS)

Nelson, Andrea; Aftosmis, Michael J.; Nemec, Marian; Pulliam, Thomas H.

2004-01-01

Aerodynamic design is an iterative process involving geometry manipulation and complex computational analysis subject to physical constraints and aerodynamic objectives. A design cycle consists of first establishing the performance of a baseline design, which is usually created with low-fidelity engineering tools, and then progressively optimizing the design to maximize its performance. Optimization techniques have evolved from relying exclusively on designer intuition and insight in traditional trial and error methods, to sophisticated local and global search methods. Recent attempts at automating the search through a large design space with formal optimization methods include both database driven and direct evaluation schemes. Databases are being used in conjunction with surrogate and neural network models as a basis on which to run optimization algorithms. Optimization algorithms are also being driven by the direct evaluation of objectives and constraints using high-fidelity simulations. Surrogate methods use data points obtained from simulations, and possibly gradients evaluated at the data points, to create mathematical approximations of a database. Neural network models work in a similar fashion, using a number of high-fidelity database calculations as training iterations to create a database model. Optimal designs are obtained by coupling an optimization algorithm to the database model. Evaluation of the current best design then gives either a new local optima and/or increases the fidelity of the approximation model for the next iteration. Surrogate methods have also been developed that iterate on the selection of data points to decrease the uncertainty of the approximation model prior to searching for an optimal design. The database approximation models for each of these cases, however, become computationally expensive with increase in dimensionality. Thus the method of using optimization algorithms to search a database model becomes problematic as the number of design variables is increased.
Evidential significance of automotive paint trace evidence using a pattern recognition based infrared library search engine for the Paint Data Query Forensic Database.

PubMed

Lavine, Barry K; White, Collin G; Allen, Matthew D; Fasasi, Ayuba; Weakley, Andrew

2016-10-01

A prototype library search engine has been further developed to search the infrared spectral libraries of the paint data query database to identify the line and model of a vehicle from the clear coat, surfacer-primer, and e-coat layers of an intact paint chip. For this study, search prefilters were developed from 1181 automotive paint systems spanning 3 manufacturers: General Motors, Chrysler, and Ford. The best match between each unknown and the spectra in the hit list generated by the search prefilters was identified using a cross-correlation library search algorithm that performed both a forward and backward search. In the forward search, spectra were divided into intervals and further subdivided into windows (which corresponds to the time lag for the comparison) within those intervals. The top five hits identified in each search window were compiled; a histogram was computed that summarized the frequency of occurrence for each library sample, with the IR spectra most similar to the unknown flagged. The backward search computed the frequency and occurrence of each line and model without regard to the identity of the individual spectra. Only those lines and models with a frequency of occurrence greater than or equal to 20% were included in the final hit list. If there was agreement between the forward and backward search results, the specific line and model common to both hit lists was always the correct assignment. Samples assigned to the same line and model by both searches are always well represented in the library and correlate well on an individual basis to specific library samples. For these samples, one can have confidence in the accuracy of the match. This was not the case for the results obtained using commercial library search algorithms, as the hit quality index scores for the top twenty hits were always greater than 99%. Copyright © 2016 Elsevier B.V. All rights reserved.
The Paragon Algorithm, a next generation search engine that uses sequence temperature values and feature probabilities to identify peptides from tandem mass spectra.

PubMed

Shilov, Ignat V; Seymour, Sean L; Patel, Alpesh A; Loboda, Alex; Tang, Wilfred H; Keating, Sean P; Hunter, Christie L; Nuwaysir, Lydia M; Schaeffer, Daniel A

2007-09-01

The Paragon Algorithm, a novel database search engine for the identification of peptides from tandem mass spectrometry data, is presented. Sequence Temperature Values are computed using a sequence tag algorithm, allowing the degree of implication by an MS/MS spectrum of each region of a database to be determined on a continuum. Counter to conventional approaches, features such as modifications, substitutions, and cleavage events are modeled with probabilities rather than by discrete user-controlled settings to consider or not consider a feature. The use of feature probabilities in conjunction with Sequence Temperature Values allows for a very large increase in the effective search space with only a very small increase in the actual number of hypotheses that must be scored. The algorithm has a new kind of user interface that removes the user expertise requirement, presenting control settings in the language of the laboratory that are translated to optimal algorithmic settings. To validate this new algorithm, a comparison with Mascot is presented for a series of analogous searches to explore the relative impact of increasing search space probed with Mascot by relaxing the tryptic digestion conformance requirements from trypsin to semitrypsin to no enzyme and with the Paragon Algorithm using its Rapid mode and Thorough mode with and without tryptic specificity. Although they performed similarly for small search space, dramatic differences were observed in large search space. With the Paragon Algorithm, hundreds of biological and artifact modifications, all possible substitutions, and all levels of conformance to the expected digestion pattern can be searched in a single search step, yet the typical cost in search time is only 2-5 times that of conventional small search space. Despite this large increase in effective search space, there is no drastic loss of discrimination that typically accompanies the exploration of large search space.
Information extraction for enhanced access to disease outbreak reports.

PubMed

Grishman, Ralph; Huttunen, Silja; Yangarber, Roman

2002-08-01

Document search is generally based on individual terms in the document. However, for collections within limited domains it is possible to provide more powerful access tools. This paper describes a system designed for collections of reports of infectious disease outbreaks. The system, Proteus-BIO, automatically creates a table of outbreaks, with each table entry linked to the document describing that outbreak; this makes it possible to use database operations such as selection and sorting to find relevant documents. Proteus-BIO consists of a Web crawler which gathers relevant documents; an information extraction engine which converts the individual outbreak events to a tabular database; and a database browser which provides access to the events and, through them, to the documents. The information extraction engine uses sets of patterns and word classes to extract the information about each event. Preparing these patterns and word classes has been a time-consuming manual operation in the past, but automated discovery tools now make this task significantly easier. A small study comparing the effectiveness of the tabular index with conventional Web search tools demonstrated that users can find substantially more documents in a given time period with Proteus-BIO.
A comparison of Boolean-based retrieval to the WAIS system for retrieval of aeronautical information

NASA Technical Reports Server (NTRS)

Marchionini, Gary; Barlow, Diane

1994-01-01

An evaluation of an information retrieval system using a Boolean-based retrieval engine and inverted file architecture and WAIS, which uses a vector-based engine, was conducted. Four research questions in aeronautical engineering were used to retrieve sets of citations from the NASA Aerospace Database which was mounted on a WAIS server and available through Dialog File 108 which served as the Boolean-based system (BBS). High recall and high precision searches were done in the BBS and terse and verbose queries were used in the WAIS condition. Precision values for the WAIS searches were consistently above the precision values for high recall BBS searches and consistently below the precision values for high precision BBS searches. Terse WAIS queries gave somewhat better precision performance than verbose WAIS queries. In every case, a small number of relevant documents retrieved by one system were not retrieved by the other, indicating the incomplete nature of the results from either retrieval system. Relevant documents in the WAIS searches were found to be randomly distributed in the retrieved sets rather than distributed by ranks. Advantages and limitations of both types of systems are discussed.
Integrated Controlling System and Unified Database for High Throughput Protein Crystallography Experiments

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gaponov, Yu.A.; Igarashi, N.; Hiraki, M.

2004-05-12

An integrated controlling system and a unified database for high throughput protein crystallography experiments have been developed. Main features of protein crystallography experiments (purification, crystallization, crystal harvesting, data collection, data processing) were integrated into the software under development. All information necessary to perform protein crystallography experiments is stored (except raw X-ray data that are stored in a central data server) in a MySQL relational database. The database contains four mutually linked hierarchical trees describing protein crystals, data collection of protein crystal and experimental data processing. A database editor was designed and developed. The editor supports basic database functions to view,more » create, modify and delete user records in the database. Two search engines were realized: direct search of necessary information in the database and object oriented search. The system is based on TCP/IP secure UNIX sockets with four predefined sending and receiving behaviors, which support communications between all connected servers and clients with remote control functions (creating and modifying data for experimental conditions, data acquisition, viewing experimental data, and performing data processing). Two secure login schemes were designed and developed: a direct method (using the developed Linux clients with secure connection) and an indirect method (using the secure SSL connection using secure X11 support from any operating system with X-terminal and SSH support). A part of the system has been implemented on a new MAD beam line, NW12, at the Photon Factory Advanced Ring for general user experiments.« less
Information-seeking behavior of basic science researchers: implications for library services.

PubMed

Haines, Laura L; Light, Jeanene; O'Malley, Donna; Delwiche, Frances A

2010-01-01

This study examined the information-seeking behaviors of basic science researchers to inform the development of customized library services. A qualitative study using semi-structured interviews was conducted on a sample of basic science researchers employed at a university medical school. The basic science researchers used a variety of information resources ranging from popular Internet search engines to highly technical databases. They generally relied on basic keyword searching, using the simplest interface of a database or search engine. They were highly collegial, interacting primarily with coworkers in their laboratories and colleagues employed at other institutions. They made little use of traditional library services and instead performed many traditional library functions internally. Although the basic science researchers expressed a positive attitude toward the library, they did not view its resources or services as integral to their work. To maximize their use by researchers, library resources must be accessible via departmental websites. Use of library services may be increased by cultivating relationships with key departmental administrative personnel. Despite their self-sufficiency, subjects expressed a desire for centralized information about ongoing research on campus and shared resources, suggesting a role for the library in creating and managing an institutional repository.
Information-seeking behavior of basic science researchers: implications for library services

PubMed Central

Haines, Laura L.; Light, Jeanene; O'Malley, Donna; Delwiche, Frances A.

2010-01-01

Objectives: This study examined the information-seeking behaviors of basic science researchers to inform the development of customized library services. Methods: A qualitative study using semi-structured interviews was conducted on a sample of basic science researchers employed at a university medical school. Results: The basic science researchers used a variety of information resources ranging from popular Internet search engines to highly technical databases. They generally relied on basic keyword searching, using the simplest interface of a database or search engine. They were highly collegial, interacting primarily with coworkers in their laboratories and colleagues employed at other institutions. They made little use of traditional library services and instead performed many traditional library functions internally. Conclusions: Although the basic science researchers expressed a positive attitude toward the library, they did not view its resources or services as integral to their work. To maximize their use by researchers, library resources must be accessible via departmental websites. Use of library services may be increased by cultivating relationships with key departmental administrative personnel. Despite their self-sufficiency, subjects expressed a desire for centralized information about ongoing research on campus and shared resources, suggesting a role for the library in creating and managing an institutional repository. PMID:20098658
Identifying duplicate content using statistically improbable phrases

PubMed Central

Errami, Mounir; Sun, Zhaohui; George, Angela C.; Long, Tara C.; Skinner, Michael A.; Wren, Jonathan D.; Garner, Harold R.

2010-01-01

Motivation: Document similarity metrics such as PubMed's ‘Find related articles’ feature, which have been primarily used to identify studies with similar topics, can now also be used to detect duplicated or potentially plagiarized papers within literature reference databases. However, the CPU-intensive nature of document comparison has limited MEDLINE text similarity studies to the comparison of abstracts, which constitute only a small fraction of a publication's total text. Extending searches to include text archived by online search engines would drastically increase comparison ability. For large-scale studies, submitting short phrases encased in direct quotes to search engines for exact matches would be optimal for both individual queries and programmatic interfaces. We have derived a method of analyzing statistically improbable phrases (SIPs) for assistance in identifying duplicate content. Results: When applied to MEDLINE citations, this method substantially improves upon previous algorithms in the detection of duplication citations, yielding a precision and recall of 78.9% (versus 50.3% for eTBLAST) and 99.6% (versus 99.8% for eTBLAST), respectively. Availability: Similar citations identified by this work are freely accessible in the Déjà vu database, under the SIP discovery method category at http://dejavu.vbi.vt.edu/dejavu/ Contact: merrami@collin.edu PMID:20472545
Development and Demonstration of a Networked Telepathology 3-D Imaging, Databasing, and Communication System

DTIC Science & Technology

1996-10-01

aligned using an octree search algorithm combined with cross correlation analysis . Successive 4x downsampling with optional and specifiable neighborhood...desired and the search engine embedded in the OODBMS will find the requested imagery and que it to the user for further analysis . This application was...obtained during Hoftmann-LaRoche production pathology imaging performed at UMICH. Versant works well and is easy to use; 3) Pathology Image Analysis
Comprehensive analysis of a multidimensional liquid chromatography mass spectrometry dataset acquired on a quadrupole selecting, quadrupole collision cell, time-of-flight mass spectrometer: I. How much of the data is theoretically interpretable by search engines?

PubMed

Chalkley, Robert J; Baker, Peter R; Hansen, Kirk C; Medzihradszky, Katalin F; Allen, Nadia P; Rexach, Michael; Burlingame, Alma L

2005-08-01

An in-depth analysis of a multidimensional chromatography-mass spectrometry dataset acquired on a quadrupole selecting, quadrupole collision cell, time-of-flight (QqTOF) geometry instrument was carried out. A total of 3269 CID spectra were acquired. Through manual verification of database search results and de novo interpretation of spectra 2368 spectra could be confidently determined as predicted tryptic peptides. A detailed analysis of the non-matching spectra was also carried out, highlighting what the non-matching spectra in a database search typically are composed of. The results of this comprehensive dataset study demonstrate that QqTOF instruments produce information-rich data of which a high percentage of the data is readily interpretable.
The study of data collection method for the plasma properties collection and evaluation system from web

NASA Astrophysics Data System (ADS)

Park, Jun-Hyoung; Song, Mi-Young; Plasma Fundamental Technology Research Team

2015-09-01

Plasma databases are necessarily required to compute the plasma parameters and high reliable databases are closely related with accuracy enhancement of simulations. Therefore, a major concern of plasma properties collection and evaluation system is to create a sustainable and useful research environment for plasma data. The system has a commitment to provide not only numerical data but also bibliographic data (including DOI information). Originally, our collection data methods were done by manual data search. In some cases, it took a long time to find data. We will be find data more automatically and quickly than legacy methods by crawling or search engine such as Lucene.
Structure-Based Characterization of Multiprotein Complexes

PubMed Central

Wiederstein, Markus; Gruber, Markus; Frank, Karl; Melo, Francisco; Sippl, Manfred J.

2014-01-01

Summary Multiprotein complexes govern virtually all cellular processes. Their 3D structures provide important clues to their biological roles, especially through structural correlations among protein molecules and complexes. The detection of such correlations generally requires comprehensive searches in databases of known protein structures by means of appropriate structure-matching techniques. Here, we present a high-speed structure search engine capable of instantly matching large protein oligomers against the complete and up-to-date database of biologically functional assemblies of protein molecules. We use this tool to reveal unseen structural correlations on the level of protein quaternary structure and demonstrate its general usefulness for efficiently exploring complex structural relationships among known protein assemblies. PMID:24954616
MPA Portable: A Stand-Alone Software Package for Analyzing Metaproteome Samples on the Go.

PubMed

Muth, Thilo; Kohrs, Fabian; Heyer, Robert; Benndorf, Dirk; Rapp, Erdmann; Reichl, Udo; Martens, Lennart; Renard, Bernhard Y

2018-01-02

Metaproteomics, the mass spectrometry-based analysis of proteins from multispecies samples faces severe challenges concerning data analysis and results interpretation. To overcome these shortcomings, we here introduce the MetaProteomeAnalyzer (MPA) Portable software. In contrast to the original server-based MPA application, this newly developed tool no longer requires computational expertise for installation and is now independent of any relational database system. In addition, MPA Portable now supports state-of-the-art database search engines and a convenient command line interface for high-performance data processing tasks. While search engine results can easily be combined to increase the protein identification yield, an additional two-step workflow is implemented to provide sufficient analysis resolution for further postprocessing steps, such as protein grouping as well as taxonomic and functional annotation. Our new application has been developed with a focus on intuitive usability, adherence to data standards, and adaptation to Web-based workflow platforms. The open source software package can be found at https://github.com/compomics/meta-proteome-analyzer .
A novel algorithm for validating peptide identification from a shotgun proteomics search engine.

PubMed

Jian, Ling; Niu, Xinnan; Xia, Zhonghang; Samir, Parimal; Sumanasekera, Chiranthani; Mu, Zheng; Jennings, Jennifer L; Hoek, Kristen L; Allos, Tara; Howard, Leigh M; Edwards, Kathryn M; Weil, P Anthony; Link, Andrew J

2013-03-01

Liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS) has revolutionized the proteomics analysis of complexes, cells, and tissues. In a typical proteomic analysis, the tandem mass spectra from a LC-MS/MS experiment are assigned to a peptide by a search engine that compares the experimental MS/MS peptide data to theoretical peptide sequences in a protein database. The peptide spectra matches are then used to infer a list of identified proteins in the original sample. However, the search engines often fail to distinguish between correct and incorrect peptides assignments. In this study, we designed and implemented a novel algorithm called De-Noise to reduce the number of incorrect peptide matches and maximize the number of correct peptides at a fixed false discovery rate using a minimal number of scoring outputs from the SEQUEST search engine. The novel algorithm uses a three-step process: data cleaning, data refining through a SVM-based decision function, and a final data refining step based on proteolytic peptide patterns. Using proteomics data generated on different types of mass spectrometers, we optimized the De-Noise algorithm on the basis of the resolution and mass accuracy of the mass spectrometer employed in the LC-MS/MS experiment. Our results demonstrate De-Noise improves peptide identification compared to other methods used to process the peptide sequence matches assigned by SEQUEST. Because De-Noise uses a limited number of scoring attributes, it can be easily implemented with other search engines.

Origin of Disagreements in Tandem Mass Spectra Interpretation by Search Engines.

PubMed

Tessier, Dominique; Lollier, Virginie; Larré, Colette; Rogniaux, Hélène

2016-10-07

Several proteomic database search engines that interpret LC-MS/MS data do not identify the same set of peptides. These disagreements occur even when the scores of the peptide-to-spectrum matches suggest good confidence in the interpretation. Our study shows that these disagreements observed for the interpretations of a given spectrum are almost exclusively due to the variation of what we call the "peptide space", i.e., the set of peptides that are actually compared to the experimental spectra. We discuss the potential difficulties of precisely defining the "peptide space." Indeed, although several parameters that are generally reported in publications can easily be set to the same values, many additional parameters-with much less straightforward user access-might impact the "peptide space" used by each program. Moreover, in a configuration where each search engine identifies the same candidates for each spectrum, the inference of the proteins may remain quite different depending on the false discovery rate selected.
Getting To Know the "Invisible Web."

ERIC Educational Resources Information Center

Smith, C. Brian

2001-01-01

Discusses the portions of the World Wide Web that cannot be accessed via directories or search engines, explains why they can't be accessed, and offers suggestions for reference librarians to find these sites. Lists helpful resources and gives examples of invisible Web sites which are often databases. (LRW)
Choosing Assessments that Matter

ERIC Educational Resources Information Center

Abilock, Debbie, Ed.

2007-01-01

Professionally, school librarians are faced with an explosion of choices--search engines, online catalogs, media types, subscription databases, and Web tools--all requiring scrutiny, evaluation, and selection. In turn, this support "stuff" forms a basis for making additional choices about how and what they teach and what they assess. Whereas once…
MyLibrary: A Web Personalized Digital Library.

ERIC Educational Resources Information Center

Rocha, Catarina; Xexeo, Geraldo; da Rocha, Ana Regina C.

With the increasing availability of information on Internet information providers, like search engines, digital libraries and online databases, it becomes more important to have personalized systems that help users to find relevant information. One type of personalization that is growing in use is recommender systems. This paper presents…
Six Wishes of a Public Service Librarian.

ERIC Educational Resources Information Center

Fescemyer, Kathy

2001-01-01

Suggests concepts related to information that would be valuable to library users, including the expenses related to information; unique qualities and characteristics of databases; limits of the Web; understanding differences between magazines and scholarly journals; search engine differences; and an appreciation for the amount and variety of…
A bibliography of literature pertaining to plague (Yersinia pestis)

USGS Publications Warehouse

Ellison, Laura E.; Frank, Megan K. Eberhardt

2011-01-01

Plague is an acute and often fatal zoonotic disease caused by the bacterium Yersinia pestis. Y. pestis mainly cycles between small mammals and their fleas; however, it has the potential to infect humans and frequently causes fatalities if left untreated. It is often considered a disease of the past; however, since the late 1800s, plagueis geographic range has expanded greatly, posing new threats in previously unaffected regions of the world, including the Western United States. A literature search was conducted using Internet resources and databases. The keywords chosen for the searches included plague, Yersinia pestis, management, control, wildlife, prairie dogs, fleas, North America, and mammals. Keywords were used alone or in combination with the other terms. Although this search pertains mostly to North America, citations were included from the international research community, as well. Databases and search engines used included Google (http://www.google.com), Google Scholar (http://scholar.google.com), SciVerse Scopus (http://www.scopus.com), ISI Web of Knowledge (http://apps.isiknowledge.com), and the USGS Library's Digital Desktop (http://library.usgs.gov). The literature-cited sections of manuscripts obtained from keyword searches were cross-referenced to identify additional citations or gray literature that was missed by the Internet search engines. This Open-File Report, published as an Internet-accessible bibliography, is intended to be periodically updated with new citations or older references that may have been missed during this compilation. Hence, the authors would be grateful to receive notice of any new or old papers that the audience (users) think need to be included.
Retrospective Evaluation of the Protocol for US Army Corps of Engineers Aquatic Ecosystem Restoration Projects. Part 2. Database Content and Data Entry Guidelines

DTIC Science & Technology

2014-01-01

entry and review procedures; (2) explain the various database components; (3) outline included datafields and datasets; and (4) document the...collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources...gathering and maintaining the data needed, and completing and reviewing the collection of information. Send comments regarding this burden estimate or
Software for Managing Inventory of Flight Hardware

NASA Technical Reports Server (NTRS)

Salisbury, John; Savage, Scott; Thomas, Shirman

2003-01-01

The Flight Hardware Support Request System (FHSRS) is a computer program that relieves engineers at Marshall Space Flight Center (MSFC) of most of the non-engineering administrative burden of managing an inventory of flight hardware. The FHSRS can also be adapted to perform similar functions for other organizations. The FHSRS affords a combination of capabilities, including those formerly provided by three separate programs in purchasing, inventorying, and inspecting hardware. The FHSRS provides a Web-based interface with a server computer that supports a relational database of inventory; electronic routing of requests and approvals; and electronic documentation from initial request through implementation of quality criteria, acquisition, receipt, inspection, storage, and final issue of flight materials and components. The database lists both hardware acquired for current projects and residual hardware from previous projects. The increased visibility of residual flight components provided by the FHSRS has dramatically improved the re-utilization of materials in lieu of new procurements, resulting in a cost savings of over $1.7 million. The FHSRS includes subprograms for manipulating the data in the database, informing of the status of a request or an item of hardware, and searching the database on any physical or other technical characteristic of a component or material. The software structure forces normalization of the data to facilitate inquiries and searches for which users have entered mixed or inconsistent values.
FDRAnalysis: a tool for the integrated analysis of tandem mass spectrometry identification results from multiple search engines.

PubMed

Wedge, David C; Krishna, Ritesh; Blackhurst, Paul; Siepen, Jennifer A; Jones, Andrew R; Hubbard, Simon J

2011-04-01

Confident identification of peptides via tandem mass spectrometry underpins modern high-throughput proteomics. This has motivated considerable recent interest in the postprocessing of search engine results to increase confidence and calculate robust statistical measures, for example through the use of decoy databases to calculate false discovery rates (FDR). FDR-based analyses allow for multiple testing and can assign a single confidence value for both sets and individual peptide spectrum matches (PSMs). We recently developed an algorithm for combining the results from multiple search engines, integrating FDRs for sets of PSMs made by different search engine combinations. Here we describe a web-server and a downloadable application that makes this routinely available to the proteomics community. The web server offers a range of outputs including informative graphics to assess the confidence of the PSMs and any potential biases. The underlying pipeline also provides a basic protein inference step, integrating PSMs into protein ambiguity groups where peptides can be matched to more than one protein. Importantly, we have also implemented full support for the mzIdentML data standard, recently released by the Proteomics Standards Initiative, providing users with the ability to convert native formats to mzIdentML files, which are available to download.
FDRAnalysis: A tool for the integrated analysis of tandem mass spectrometry identification results from multiple search engines

PubMed Central

Wedge, David C; Krishna, Ritesh; Blackhurst, Paul; Siepen, Jennifer A; Jones, Andrew R.; Hubbard, Simon J.

2013-01-01

Confident identification of peptides via tandem mass spectrometry underpins modern high-throughput proteomics. This has motivated considerable recent interest in the post-processing of search engine results to increase confidence and calculate robust statistical measures, for example through the use of decoy databases to calculate false discovery rates (FDR). FDR-based analyses allow for multiple testing and can assign a single confidence value for both sets and individual peptide spectrum matches (PSMs). We recently developed an algorithm for combining the results from multiple search engines, integrating FDRs for sets of PSMs made by different search engine combinations. Here we describe a web-server, and a downloadable application, which makes this routinely available to the proteomics community. The web server offers a range of outputs including informative graphics to assess the confidence of the PSMs and any potential biases. The underlying pipeline provides a basic protein inference step, integrating PSMs into protein ambiguity groups where peptides can be matched to more than one protein. Importantly, we have also implemented full support for the mzIdentML data standard, recently released by the Proteomics Standards Initiative, providing users with the ability to convert native formats to mzIdentML files, which are available to download. PMID:21222473
BioModels Database: a repository of mathematical models of biological processes.

PubMed

Chelliah, Vijayalakshmi; Laibe, Camille; Le Novère, Nicolas

2013-01-01

BioModels Database is a public online resource that allows storing and sharing of published, peer-reviewed quantitative, dynamic models of biological processes. The model components and behaviour are thoroughly checked to correspond the original publication and manually curated to ensure reliability. Furthermore, the model elements are annotated with terms from controlled vocabularies as well as linked to relevant external data resources. This greatly helps in model interpretation and reuse. Models are stored in SBML format, accepted in SBML and CellML formats, and are available for download in various other common formats such as BioPAX, Octave, SciLab, VCML, XPP and PDF, in addition to SBML. The reaction network diagram of the models is also available in several formats. BioModels Database features a search engine, which provides simple and more advanced searches. Features such as online simulation and creation of smaller models (submodels) from the selected model elements of a larger one are provided. BioModels Database can be accessed both via a web interface and programmatically via web services. New models are available in BioModels Database at regular releases, about every 4 months.
"Where Is My Answer?": A Customer Service Status Report.

ERIC Educational Resources Information Center

Marcinko, Randy

1997-01-01

Describes the results of a study that tested the customer service responses from 11 companies selling online information including online hosts, database producers, and World Wide Web search engine companies. Highlights include content-oriented issues, costs, training, human interaction, and the use of technology to save time and increase…
An Hour with the Internet Curmudgeon.

ERIC Educational Resources Information Center

Morgovsky, Joel

While the Internet undeniably contains an enormous amount of information, community colleges should consider some key issues before joining the headlong rush toward virtual classrooms. First, information can be very difficult to find on the Internet. Although search engines, web databases, and subject directories have been developed to help users…
Hot Topics on the Web: Strategies for Research.

ERIC Educational Resources Information Center

Diaz, Karen R.; O'Hanlon, Nancy

2001-01-01

Presents strategies for researching topics on the Web that are controversial or current in nature. Discusses topic selection and overviews, including the use of online encyclopedias; search engines; finding laws and pending legislation; advocacy groups; proprietary databases; Web site evaluation; and the continuing usefulness of print materials.…
Accelerated Peer-Review Journal Usage Technique for Undergraduates

ERIC Educational Resources Information Center

Wallace, J. D.

2008-01-01

The internet has given undergraduate students ever-increasing access to academic journals via search engines and online databases. However, students typically do not have the ability to use these journals effectively. This often poses a dilemma for instructors. The accelerated peer-review journal usage (APJU) technique provides a way for…
The Internet as a Source of Academic Research Information: Findings of Two Pilot Studies.

ERIC Educational Resources Information Center

Kibirige, Harry M.; DePalo, Lisa

2000-01-01

Discussion of information available on the Internet focuses on two pilot studies that investigated how academic users perceive search engines and subject-oriented databases as sources of topical information. Highlights include information seeking behavior of academic users; undergraduate users; graduate users; faculty; and implications for…
Effect of digluconate chlorhexidine on bond strength between dental adhesive systems and dentin: A systematic review.

PubMed

Dionysopoulos, Dimitrios

2016-01-01

This study aimed to systematically review the literature for the effect of digluconate chlorhexidine (CHX) on bond strength between dental adhesive systems and dentin of composite restorations. The electronic databases that were searched to identify manuscripts for inclusion were Medline via PubMed and Google search engine. The search strategies were computer search of the database and review of reference lists of the related articles. Search words/terms were as follows: (digluconate chlorhexidine*) AND (dentin* OR adhesive system* OR bond strength*). Bond strength reduction after CHX treatments varied among the studies, ranging 0-84.9%. In most of the studies, pretreatment CHX exhibited lower bond strength reduction than the control experimental groups. Researchers who previously investigated the effect of CHX on the bond strength of dental adhesive systems on dentin have reported contrary results, which may be attributed to different experimental methods, different designs of the experiments, and different materials investigated. Further investigations, in particular clinical studies, would be necessary to clarify the effect of CHX on the longevity of dentin bonds.
Video Games for Diabetes Self-Management: Examples and Design Strategies

PubMed Central

Lieberman, Debra A.

2012-01-01

The July 2012 issue of the Journal of Diabetes Science and Technology includes a special symposium called “Serious Games for Diabetes, Obesity, and Healthy Lifestyle.” As part of the symposium, this article focuses on health behavior change video games that are designed to improve and support players’ diabetes self-management. Other symposium articles include one that recommends theory-based approaches to the design of health games and identifies areas in which additional research is needed, followed by five research articles presenting studies of the design and effectiveness of games and game technologies that require physical activity in order to play. This article briefly describes 14 diabetes self-management video games, and, when available, cites research findings on their effectiveness. The games were found by searching the Health Games Research online searchable database, three bibliographic databases (ACM Digital Library, PubMed, and Social Sciences Databases of CSA Illumina), and the Google search engine, using the search terms “diabetes” and “game.” Games were selected if they addressed diabetes self-management skills. PMID:22920805
The Online Bioinformatics Resources Collection at the University of Pittsburgh Health Sciences Library System--a one-stop gateway to online bioinformatics databases and software tools.

PubMed

Chen, Yi-Bu; Chattopadhyay, Ansuman; Bergen, Phillip; Gadd, Cynthia; Tannery, Nancy

2007-01-01

To bridge the gap between the rising information needs of biological and medical researchers and the rapidly growing number of online bioinformatics resources, we have created the Online Bioinformatics Resources Collection (OBRC) at the Health Sciences Library System (HSLS) at the University of Pittsburgh. The OBRC, containing 1542 major online bioinformatics databases and software tools, was constructed using the HSLS content management system built on the Zope Web application server. To enhance the output of search results, we further implemented the Vivísimo Clustering Engine, which automatically organizes the search results into categories created dynamically based on the textual information of the retrieved records. As the largest online collection of its kind and the only one with advanced search results clustering, OBRC is aimed at becoming a one-stop guided information gateway to the major bioinformatics databases and software tools on the Web. OBRC is available at the University of Pittsburgh's HSLS Web site (http://www.hsls.pitt.edu/guides/genetics/obrc).
Video games for diabetes self-management: examples and design strategies.

PubMed

Lieberman, Debra A

2012-07-01

The July 2012 issue of the Journal of Diabetes Science and Technology includes a special symposium called "Serious Games for Diabetes, Obesity, and Healthy Lifestyle." As part of the symposium, this article focuses on health behavior change video games that are designed to improve and support players' diabetes self-management. Other symposium articles include one that recommends theory-based approaches to the design of health games and identifies areas in which additional research is needed, followed by five research articles presenting studies of the design and effectiveness of games and game technologies that require physical activity in order to play. This article briefly describes 14 diabetes self-management video games, and, when available, cites research findings on their effectiveness. The games were found by searching the Health Games Research online searchable database, three bibliographic databases (ACM Digital Library, PubMed, and Social Sciences Databases of CSA Illumina), and the Google search engine, using the search terms "diabetes" and "game." Games were selected if they addressed diabetes self-management skills. © 2012 Diabetes Technology Society.

The role of Social Media and internet search engines in information provision and dissemination to patients with Kidney Stone Disease (KSD): A systematic review from EAU Young Academic Urologists (YAU).

PubMed

Jamnadass, Enakshee; Aboumarzouk, Omar; Kallidonis, Panagiotis; Emiliani, Esteban; Tailly, Thomas; Hruby, Stephan; Sanguedolce, Francesco; Atis, Gokhan; Özsoy, Mehmet; Greco, Francesco; Somani, Bhaskar K

2018-06-21

Kidney stone disease (KSD) affects millions of people worldwide and has an increasing incidence. Social media (SoMe) and search engines are both gaining in usage, whilst also being used by patients to research their conditions and aid in managing them. With this in mind, many authors have expressed the belief that SoMe and search engines can be used by patients and healthcare professionals to improve treatment compliance, and to help counselling and management of conditions such as KSD. We wanted to determine whether SoMe and search engines play a role in the management and/or prevention of KSD. The databases MEDLINE, Embase, CINAHL, Scopus and Cochrane Library were used to search for relevant English language literature from inception to December 2017. Results were screened by title, abstract, and then full text, according to the inclusion and exclusion criteria. The data was then analysed independently by the authors not involved in the original study. After initial identification of 2137 records and screening of 42 articles, 10 studies met the inclusion and exclusion criteria. The papers included focused on a variety of SoMe forms including two papers each on twitter, YouTube, smartphone apps and google search engine and one paper on google insights and google analytics. Regarding patient centered advice, while 2 papers covered advice on dietary, fluid intake and management options, two additional papers each covered advice on fluid advice and management options only, while no such advice was given by 3 of the SoMe published papers. SoMe and search engines provide valuable information to patients with kidney stone disease. However, whilst the information provided regarding dietary aspects and fluid management was good, it was not comprehensive enough to include advice on other aspects of KSD prevention.
Structure-based characterization of multiprotein complexes.

PubMed

Wiederstein, Markus; Gruber, Markus; Frank, Karl; Melo, Francisco; Sippl, Manfred J

2014-07-08

Multiprotein complexes govern virtually all cellular processes. Their 3D structures provide important clues to their biological roles, especially through structural correlations among protein molecules and complexes. The detection of such correlations generally requires comprehensive searches in databases of known protein structures by means of appropriate structure-matching techniques. Here, we present a high-speed structure search engine capable of instantly matching large protein oligomers against the complete and up-to-date database of biologically functional assemblies of protein molecules. We use this tool to reveal unseen structural correlations on the level of protein quaternary structure and demonstrate its general usefulness for efficiently exploring complex structural relationships among known protein assemblies. Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved.
search GenBank: interactive orchestration and ad-hoc choreography of Web services in the exploration of the biomedical resources of the National Center For Biotechnology Information

PubMed Central

2013-01-01

Background Due to the growing number of biomedical entries in data repositories of the National Center for Biotechnology Information (NCBI), it is difficult to collect, manage and process all of these entries in one place by third-party software developers without significant investment in hardware and software infrastructure, its maintenance and administration. Web services allow development of software applications that integrate in one place the functionality and processing logic of distributed software components, without integrating the components themselves and without integrating the resources to which they have access. This is achieved by appropriate orchestration or choreography of available Web services and their shared functions. After the successful application of Web services in the business sector, this technology can now be used to build composite software tools that are oriented towards biomedical data processing. Results We have developed a new tool for efficient and dynamic data exploration in GenBank and other NCBI databases. A dedicated search GenBank system makes use of NCBI Web services and a package of Entrez Programming Utilities (eUtils) in order to provide extended searching capabilities in NCBI data repositories. In search GenBank users can use one of the three exploration paths: simple data searching based on the specified user’s query, advanced data searching based on the specified user’s query, and advanced data exploration with the use of macros. search GenBank orchestrates calls of particular tools available through the NCBI Web service providing requested functionality, while users interactively browse selected records in search GenBank and traverse between NCBI databases using available links. On the other hand, by building macros in the advanced data exploration mode, users create choreographies of eUtils calls, which can lead to the automatic discovery of related data in the specified databases. Conclusions search GenBank extends standard capabilities of the NCBI Entrez search engine in querying biomedical databases. The possibility of creating and saving macros in the search GenBank is a unique feature and has a great potential. The potential will further grow in the future with the increasing density of networks of relationships between data stored in particular databases. search GenBank is available for public use at http://sgb.biotools.pl/. PMID:23452691
search GenBank: interactive orchestration and ad-hoc choreography of Web services in the exploration of the biomedical resources of the National Center For Biotechnology Information.

PubMed

Mrozek, Dariusz; Małysiak-Mrozek, Bożena; Siążnik, Artur

2013-03-01

Due to the growing number of biomedical entries in data repositories of the National Center for Biotechnology Information (NCBI), it is difficult to collect, manage and process all of these entries in one place by third-party software developers without significant investment in hardware and software infrastructure, its maintenance and administration. Web services allow development of software applications that integrate in one place the functionality and processing logic of distributed software components, without integrating the components themselves and without integrating the resources to which they have access. This is achieved by appropriate orchestration or choreography of available Web services and their shared functions. After the successful application of Web services in the business sector, this technology can now be used to build composite software tools that are oriented towards biomedical data processing. We have developed a new tool for efficient and dynamic data exploration in GenBank and other NCBI databases. A dedicated search GenBank system makes use of NCBI Web services and a package of Entrez Programming Utilities (eUtils) in order to provide extended searching capabilities in NCBI data repositories. In search GenBank users can use one of the three exploration paths: simple data searching based on the specified user's query, advanced data searching based on the specified user's query, and advanced data exploration with the use of macros. search GenBank orchestrates calls of particular tools available through the NCBI Web service providing requested functionality, while users interactively browse selected records in search GenBank and traverse between NCBI databases using available links. On the other hand, by building macros in the advanced data exploration mode, users create choreographies of eUtils calls, which can lead to the automatic discovery of related data in the specified databases. search GenBank extends standard capabilities of the NCBI Entrez search engine in querying biomedical databases. The possibility of creating and saving macros in the search GenBank is a unique feature and has a great potential. The potential will further grow in the future with the increasing density of networks of relationships between data stored in particular databases. search GenBank is available for public use at http://sgb.biotools.pl/.
Ursgal, Universal Python Module Combining Common Bottom-Up Proteomics Tools for Large-Scale Analysis.

PubMed

Kremer, Lukas P M; Leufken, Johannes; Oyunchimeg, Purevdulam; Schulze, Stefan; Fufezan, Christian

2016-03-04

Proteomics data integration has become a broad field with a variety of programs offering innovative algorithms to analyze increasing amounts of data. Unfortunately, this software diversity leads to many problems as soon as the data is analyzed using more than one algorithm for the same task. Although it was shown that the combination of multiple peptide identification algorithms yields more robust results, it is only recently that unified approaches are emerging; however, workflows that, for example, aim to optimize search parameters or that employ cascaded style searches can only be made accessible if data analysis becomes not only unified but also and most importantly scriptable. Here we introduce Ursgal, a Python interface to many commonly used bottom-up proteomics tools and to additional auxiliary programs. Complex workflows can thus be composed using the Python scripting language using a few lines of code. Ursgal is easily extensible, and we have made several database search engines (X!Tandem, OMSSA, MS-GF+, Myrimatch, MS Amanda), statistical postprocessing algorithms (qvality, Percolator), and one algorithm that combines statistically postprocessed outputs from multiple search engines ("combined FDR") accessible as an interface in Python. Furthermore, we have implemented a new algorithm ("combined PEP") that combines multiple search engines employing elements of "combined FDR", PeptideShaker, and Bayes' theorem.
Metadata: Standards for Retrieving WWW Documents (and Other Digitized and Non-Digitized Resources)

NASA Astrophysics Data System (ADS)

Rusch-Feja, Diann

The use of metadata for indexing digitized and non-digitized resources for resource discovery in a networked environment is being increasingly implemented all over the world. Greater precision is achieved using metadata than relying on universal search engines and furthermore, meta-data can be used as filtering mechanisms for search results. An overview of various metadata sets is given, followed by a more focussed presentation of Dublin Core Metadata including examples of sub-elements and qualifiers. Especially the use of the Dublin Core Relation element provides connections between the metadata of various related electronic resources, as well as the metadata for physical, non-digitized resources. This facilitates more comprehensive search results without losing precision and brings together different genres of information which would otherwise be only searchable in separate databases. Furthermore, the advantages of Dublin Core Metadata in comparison with library cataloging and the use of universal search engines are discussed briefly, followed by a listing of types of implementation of Dublin Core Metadata.
Textpresso: An Ontology-Based Information Retrieval and Extraction System for Biological Literature

PubMed Central

Müller, Hans-Michael; Kenny, Eimear E

2004-01-01

We have developed Textpresso, a new text-mining system for scientific literature whose capabilities go far beyond those of a simple keyword search engine. Textpresso's two major elements are a collection of the full text of scientific articles split into individual sentences, and the implementation of categories of terms for which a database of articles and individual sentences can be searched. The categories are classes of biological concepts (e.g., gene, allele, cell or cell group, phenotype, etc.) and classes that relate two objects (e.g., association, regulation, etc.) or describe one (e.g., biological process, etc.). Together they form a catalog of types of objects and concepts called an ontology. After this ontology is populated with terms, the whole corpus of articles and abstracts is marked up to identify terms of these categories. The current ontology comprises 33 categories of terms. A search engine enables the user to search for one or a combination of these tags and/or keywords within a sentence or document, and as the ontology allows word meaning to be queried, it is possible to formulate semantic queries. Full text access increases recall of biological data types from 45% to 95%. Extraction of particular biological facts, such as gene-gene interactions, can be accelerated significantly by ontologies, with Textpresso automatically performing nearly as well as expert curators to identify sentences; in searches for two uniquely named genes and an interaction term, the ontology confers a 3-fold increase of search efficiency. Textpresso currently focuses on Caenorhabditis elegans literature, with 3,800 full text articles and 16,000 abstracts. The lexicon of the ontology contains 14,500 entries, each of which includes all versions of a specific word or phrase, and it includes all categories of the Gene Ontology database. Textpresso is a useful curation tool, as well as search engine for researchers, and can readily be extended to other organism-specific corpora of text. Textpresso can be accessed at http://www.textpresso.org or via WormBase at http://www.wormbase.org. PMID:15383839
Predicting Glycerophosphoinositol Identities in Lipidomic Datasets Using VaLID (Visualization and Phospholipid Identification)—An Online Bioinformatic Search Engine

PubMed Central

McDowell, Graeme S. V.; Taylor, Graeme P.; Fai, Stephen; Bennett, Steffany A. L.

2014-01-01

The capacity to predict and visualize all theoretically possible glycerophospholipid molecular identities present in lipidomic datasets is currently limited. To address this issue, we expanded the search-engine and compositional databases of the online Visualization and Phospholipid Identification (VaLID) bioinformatic tool to include the glycerophosphoinositol superfamily. VaLID v1.0.0 originally allowed exact and average mass libraries of 736,584 individual species from eight phospholipid classes: glycerophosphates, glyceropyrophosphates, glycerophosphocholines, glycerophosphoethanolamines, glycerophosphoglycerols, glycerophosphoglycerophosphates, glycerophosphoserines, and cytidine 5′-diphosphate 1,2-diacyl-sn-glycerols to be searched for any mass to charge value (with adjustable tolerance levels) under a variety of mass spectrometry conditions. Here, we describe an update that now includes all possible glycerophosphoinositols, glycerophosphoinositol monophosphates, glycerophosphoinositol bisphosphates, and glycerophosphoinositol trisphosphates. This update expands the total number of lipid species represented in the VaLID v2.0.0 database to 1,473,168 phospholipids. Each phospholipid can be generated in skeletal representation. A subset of species curated by the Canadian Institutes of Health Research Training Program in Neurodegenerative Lipidomics (CTPNL) team is provided as an array of high-resolution structures. VaLID is freely available and responds to all users through the CTPNL resources web site. PMID:24701584
Predicting glycerophosphoinositol identities in lipidomic datasets using VaLID (Visualization and Phospholipid Identification)--an online bioinformatic search engine.

PubMed

McDowell, Graeme S V; Blanchard, Alexandre P; Taylor, Graeme P; Figeys, Daniel; Fai, Stephen; Bennett, Steffany A L

2014-01-01

The capacity to predict and visualize all theoretically possible glycerophospholipid molecular identities present in lipidomic datasets is currently limited. To address this issue, we expanded the search-engine and compositional databases of the online Visualization and Phospholipid Identification (VaLID) bioinformatic tool to include the glycerophosphoinositol superfamily. VaLID v1.0.0 originally allowed exact and average mass libraries of 736,584 individual species from eight phospholipid classes: glycerophosphates, glyceropyrophosphates, glycerophosphocholines, glycerophosphoethanolamines, glycerophosphoglycerols, glycerophosphoglycerophosphates, glycerophosphoserines, and cytidine 5'-diphosphate 1,2-diacyl-sn-glycerols to be searched for any mass to charge value (with adjustable tolerance levels) under a variety of mass spectrometry conditions. Here, we describe an update that now includes all possible glycerophosphoinositols, glycerophosphoinositol monophosphates, glycerophosphoinositol bisphosphates, and glycerophosphoinositol trisphosphates. This update expands the total number of lipid species represented in the VaLID v2.0.0 database to 1,473,168 phospholipids. Each phospholipid can be generated in skeletal representation. A subset of species curated by the Canadian Institutes of Health Research Training Program in Neurodegenerative Lipidomics (CTPNL) team is provided as an array of high-resolution structures. VaLID is freely available and responds to all users through the CTPNL resources web site.
[Tissue engineering of urinary bladder using acellular matrix].

PubMed

Glybochko, P V; Olefir, Yu V; Alyaev, Yu G; Butnaru, D V; Bezrukov, E A; Chaplenko, A A; Zharikova, T M

2017-04-01

Tissue engineering has become a new promising strategy for repairing damaged organs of the urinary system, including the bladder. The basic idea of tissue engineering is to integrate cellular technology and advanced bio-compatible materials to replace or repair tissues and organs. of the study is the objective reflection of the current trends and advances in tissue engineering of the bladder using acellular matrix through a systematic search of preclinical and clinical studies of interest. Relevant studies, including those on methods of tissue engineering of urinary bladder, was retrieved from multiple databases, including Scopus, Web of Science, PubMed, Embase. The reference lists of the retrieved review articles were analyzed for the presence of the missing relevant publications. In addition, a manual search for registered clinical trials was conducted in clinicaltrials.gov. Following the above search strategy, a total of 77 eligible studies were selected for further analysis. Studies differed in the types of animal models, supporting structures, cells and growth factors. Among those, studies using cell-free matrix were selected for a more detailed analysis. Partial restoration of urothelium layer was observed in most studies where acellular grafts were used for cystoplasty, but no the growth of the muscle layer was observed. This is the main reason why cellular structures are more commonly used in clinical practice.
A Novel Concept for the Search and Retrieval of the Derwent Markush Resource Database.

PubMed

Barth, Andreas; Stengel, Thomas; Litterst, Edwin; Kraut, Hans; Matuszczyk, Henry; Ailer, Franz; Hajkowski, Steve

2016-05-23

The representation of and search for generic chemical structures (Markush) remains a continuing challenge. Several research groups have addressed this problem, and over time a limited number of practical solutions have been proposed. Today there are two large commercial providers of Markush databases: Chemical Abstracts Service (CAS) and Thomson Reuters. The Thomson Reuters "Derwent" Markush database is currently offered via the online services Questel and STN and as a data feed for in-house use. The aim of this paper is to briefly review the existing Markush systems (databases plus search engines) and to describe our new approach for the implementation of the Derwent Markush Resource on STN. Our new approach demonstrates the integration of the Derwent Markush Resource database into the existing chemistry-focused STN platform without loss of detail. This provides compatibility with other structure and Markush databases on STN and at the same time makes it possible to deploy the specific features and functions of the Derwent approach. It is shown that the different Markush languages developed by CAS and Derwent can be combined into a single general Markush description. In this concept the generic nodes are grouped together in a unique hierarchy where all chemical elements and fragments can be integrated. As a consequence, both systems are searchable using a single structure query. Moreover, the presented concept could serve as a promising starting point for a common generalized description of Markush structures.
Intelligent web image retrieval system

NASA Astrophysics Data System (ADS)

Hong, Sungyong; Lee, Chungwoo; Nah, Yunmook

2001-07-01

Recently, the web sites such as e-business sites and shopping mall sites deal with lots of image information. To find a specific image from these image sources, we usually use web search engines or image database engines which rely on keyword only retrievals or color based retrievals with limited search capabilities. This paper presents an intelligent web image retrieval system. We propose the system architecture, the texture and color based image classification and indexing techniques, and representation schemes of user usage patterns. The query can be given by providing keywords, by selecting one or more sample texture patterns, by assigning color values within positional color blocks, or by combining some or all of these factors. The system keeps track of user's preferences by generating user query logs and automatically add more search information to subsequent user queries. To show the usefulness of the proposed system, some experimental results showing recall and precision are also explained.
Analysing patent landscapes in plant biotechnology and new plant breeding techniques.

PubMed

Parisi, Claudia; Rodríguez-Cerezo, Emilio; Thangaraj, Harry

2013-02-01

This article aims to inform the reader of the importance of searching patent landscapes in plant biotechnology and the use of basic tools to perform a patent search. The recommendations for a patent search strategy are illustrated with the specific example of zinc finger nuclease technology for genetic engineering in plants. Within this scope, we provide a general introduction to searching using two online and free-access patent databases esp@cenet and PatentScope. The essential features of the two databases, and their functionality is described, together with short descriptions to enable the reader to understand patents, searching, their content, patent families, and their territorial scope. We mostly stress the value of patent searching for mining scientific, rather than legal information. Search methods through the use of keywords and patent codes are elucidated together with suggestions about how to search with or combine codes with keywords and we also comment on limitations of each method. We stress the importance of patent literature to complement more mainstream scientific literature, and the relative complexities and difficulties in searching patents compared to the latter. A parallel online resource where we describe detailed search exercises is available through reference for those intending further exploration. In essence this is aimed at a novice patent searcher who may want to examine accessory patent literature to complement knowledge gained from mainstream journal resources.
Referencing Science: Teaching Undergraduates to Identify, Validate, and Utilize Peer-Reviewed Online Literature

ERIC Educational Resources Information Center

Berzonsky, William A.; Richardson, Katherine D.

2008-01-01

Accessibility of online scientific literature continues to expand due to the advent of scholarly databases and search engines. Studies have shown that undergraduates favor using online scientific literature to address research questions, but they often do not have the skills to assess the validity of research articles. Undergraduates generally are…
Multi-lingual search engine to access PubMed monolingual subsets: a feasibility study.

PubMed

Darmoni, Stéfan J; Soualmia, Lina F; Griffon, Nicolas; Grosjean, Julien; Kerdelhué, Gaétan; Kergourlay, Ivan; Dahamna, Badisse

2013-01-01

PubMed contains many articles in languages other than English but it is difficult to find them using the English version of the Medical Subject Headings (MeSH) Thesaurus. The aim of this work is to propose a tool allowing access to a PubMed subset in one language, and to evaluate its performance. Translations of MeSH were enriched and gathered in the information system. PubMed subsets in main European languages were also added in our database, using a dedicated parser. The CISMeF generic semantic search engine was evaluated on the response time for simple queries. MeSH descriptors are currently available in 11 languages in the information system. All the 654,000 PubMed citations in French were integrated into CISMeF database. None of the response times exceed the threshold defined for usability (2 seconds). It is now possible to freely access biomedical literature in French using a tool in French; health professionals and lay people with a low English language may find it useful. It will be expended to several European languages: German, Spanish, Norwegian and Portuguese.
A survey of the neuroscience resource landscape: perspectives from the neuroscience information framework.

PubMed

Cachat, Jonathan; Bandrowski, Anita; Grethe, Jeffery S; Gupta, Amarnath; Astakhov, Vadim; Imam, Fahim; Larson, Stephen D; Martone, Maryann E

2012-01-01

The number of available neuroscience resources (databases, tools, materials, and networks) available via the Web continues to expand, particularly in light of newly implemented data sharing policies required by funding agencies and journals. However, the nature of dense, multifaceted neuroscience data and the design of classic search engine systems make efficient, reliable, and relevant discovery of such resources a significant challenge. This challenge is especially pertinent for online databases, whose dynamic content is largely opaque to contemporary search engines. The Neuroscience Information Framework was initiated to address this problem of finding and utilizing neuroscience-relevant resources. Since its first production release in 2008, NIF has been surveying the resource landscape for the neurosciences, identifying relevant resources and working to make them easily discoverable by the neuroscience community. In this chapter, we provide a survey of the resource landscape for neuroscience: what types of resources are available, how many there are, what they contain, and most importantly, ways in which these resources can be utilized by the research community to advance neuroscience research. Copyright © 2012 Elsevier Inc. All rights reserved.
LAILAPS: the plant science search engine.

PubMed

Esch, Maria; Chen, Jinbo; Colmsee, Christian; Klapperstück, Matthias; Grafahrend-Belau, Eva; Scholz, Uwe; Lange, Matthias

2015-01-01

With the number of sequenced plant genomes growing, the number of predicted genes and functional annotations is also increasing. The association between genes and phenotypic traits is currently of great interest. Unfortunately, the information available today is widely scattered over a number of different databases. Information retrieval (IR) has become an all-encompassing bioinformatics methodology for extracting knowledge from complex, heterogeneous and distributed databases, and therefore can be a useful tool for obtaining a comprehensive view of plant genomics, from genes to traits. Here we describe LAILAPS (http://lailaps.ipk-gatersleben.de), an IR system designed to link plant genomic data in the context of phenotypic attributes for a detailed forward genetic research. LAILAPS comprises around 65 million indexed documents, encompassing >13 major life science databases with around 80 million links to plant genomic resources. The LAILAPS search engine allows fuzzy querying for candidate genes linked to specific traits over a loosely integrated system of indexed and interlinked genome databases. Query assistance and an evidence-based annotation system enable time-efficient and comprehensive information retrieval. An artificial neural network incorporating user feedback and behavior tracking allows relevance sorting of results. We fully describe LAILAPS's functionality and capabilities by comparing this system's performance with other widely used systems and by reporting both a validation in maize and a knowledge discovery use-case focusing on candidate genes in barley. © The Author 2014. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists.
Databases and coordinated research projects at the IAEA on atomic processes in plasmas

NASA Astrophysics Data System (ADS)

Braams, Bastiaan J.; Chung, Hyun-Kyung

2012-05-01

The Atomic and Molecular Data Unit at the IAEA works with a network of national data centres to encourage and coordinate production and dissemination of fundamental data for atomic, molecular and plasma-material interaction (A+M/PMI) processes that are relevant to the realization of fusion energy. The Unit maintains numerical and bibliographical databases and has started a Wiki-style knowledge base. The Unit also contributes to A+M database interface standards and provides a search engine that offers a common interface to multiple numerical A+M/PMI databases. Coordinated Research Projects (CRPs) bring together fusion energy researchers and atomic, molecular and surface physicists for joint work towards the development of new data and new methods. The databases and current CRPs on A+M/PMI processes are briefly described here.
ATLAS of Biochemistry: A Repository of All Possible Biochemical Reactions for Synthetic Biology and Metabolic Engineering Studies.

PubMed

Hadadi, Noushin; Hafner, Jasmin; Shajkofci, Adrian; Zisaki, Aikaterini; Hatzimanikatis, Vassily

2016-10-21

Because the complexity of metabolism cannot be intuitively understood or analyzed, computational methods are indispensable for studying biochemistry and deepening our understanding of cellular metabolism to promote new discoveries. We used the computational framework BNICE.ch along with cheminformatic tools to assemble the whole theoretical reactome from the known metabolome through expansion of the known biochemistry presented in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. We constructed the ATLAS of Biochemistry, a database of all theoretical biochemical reactions based on known biochemical principles and compounds. ATLAS includes more than 130 000 hypothetical enzymatic reactions that connect two or more KEGG metabolites through novel enzymatic reactions that have never been reported to occur in living organisms. Moreover, ATLAS reactions integrate 42% of KEGG metabolites that are not currently present in any KEGG reaction into one or more novel enzymatic reactions. The generated repository of information is organized in a Web-based database ( http://lcsb-databases.epfl.ch/atlas/ ) that allows the user to search for all possible routes from any substrate compound to any product. The resulting pathways involve known and novel enzymatic steps that may indicate unidentified enzymatic activities and provide potential targets for protein engineering. Our approach of introducing novel biochemistry into pathway design and associated databases will be important for synthetic biology and metabolic engineering.
Advanced SPARQL querying in small molecule databases.

PubMed

Galgonek, Jakub; Hurt, Tomáš; Michlíková, Vendula; Onderka, Petr; Schwarz, Jan; Vondrášek, Jiří

2016-01-01

In recent years, the Resource Description Framework (RDF) and the SPARQL query language have become more widely used in the area of cheminformatics and bioinformatics databases. These technologies allow better interoperability of various data sources and powerful searching facilities. However, we identified several deficiencies that make usage of such RDF databases restrictive or challenging for common users. We extended a SPARQL engine to be able to use special procedures inside SPARQL queries. This allows the user to work with data that cannot be simply precomputed and thus cannot be directly stored in the database. We designed an algorithm that checks a query against data ontology to identify possible user errors. This greatly improves query debugging. We also introduced an approach to visualize retrieved data in a user-friendly way, based on templates describing visualizations of resource classes. To integrate all of our approaches, we developed a simple web application. Our system was implemented successfully, and we demonstrated its usability on the ChEBI database transformed into RDF form. To demonstrate procedure call functions, we employed compound similarity searching based on OrChem. The application is publicly available at https://bioinfo.uochb.cas.cz/projects/chemRDF.

Methods for conducting systematic reviews of risk factors in low- and middle-income countries.

PubMed

Shenderovich, Yulia; Eisner, Manuel; Mikton, Christopher; Gardner, Frances; Liu, Jianghong; Murray, Joseph

2016-03-15

Rates of youth violence are disproportionately high in many low- and middle-income countries [LMICs] but existing reviews of risk factors focus almost exclusively on high-income countries. Different search strategies, including non-English language searches, might be required to identify relevant evidence in LMICs. This paper discusses methodological issues in systematic reviews aiming to include evidence from LMICs, using the example of a recent review of risk factors for child conduct problems and youth violence in LMICs. We searched the main international databases, such as PsycINFO, Medline and EMBASE in English, as well as 12 regional databases in Arabic, Chinese, English, French, Spanish, Portuguese and Russian. In addition, we used internet search engines and Google Scholar, and contacted over 200 researchers and organizations to identify potentially eligible studies in LMICs. The majority of relevant studies were identified in the mainstream databases, but additional studies were also found through regional databases, such as CNKI, Wangfang, LILACS and SciELO. Overall, 85% of eligible studies were in English, and 15% were reported in Chinese, Spanish, Portuguese, Russian or French. Among eligible studies in languages other than English, two-thirds were identified only by regional databases and one-third was also indexed in the main international databases. There are many studies on child conduct problems and youth violence in LMICs which have not been included in prior reviews. Most research on these subjects in LMICs has been produced in the last two-three decades and mostly in middle-income countries, such as China, Brazil, Turkey, South Africa and Russia. Based on our findings, it appears that many studies of child conduct problems and youth violence in LMICs are reported in English, Chinese, Spanish and Portuguese, but few such studies are published in French, Arabic or Russian. If non-English language searches and screening had not been conducted in the current review, 15% of eligible studies would have been missed. Although there are benefits to non-English language searches and the inclusion of non-English studies in meta-analyses, systematic reviewers also need to consider the resources required to incorporate multi-lingual research.
LitVar: a semantic search engine for linking genomic variant data in PubMed and PMC.

PubMed

Allot, Alexis; Peng, Yifan; Wei, Chih-Hsuan; Lee, Kyubum; Phan, Lon; Lu, Zhiyong

2018-05-14

The identification and interpretation of genomic variants play a key role in the diagnosis of genetic diseases and related research. These tasks increasingly rely on accessing relevant manually curated information from domain databases (e.g. SwissProt or ClinVar). However, due to the sheer volume of medical literature and high cost of expert curation, curated variant information in existing databases are often incomplete and out-of-date. In addition, the same genetic variant can be mentioned in publications with various names (e.g. 'A146T' versus 'c.436G>A' versus 'rs121913527'). A search in PubMed using only one name usually cannot retrieve all relevant articles for the variant of interest. Hence, to help scientists, healthcare professionals, and database curators find the most up-to-date published variant research, we have developed LitVar for the search and retrieval of standardized variant information. In addition, LitVar uses advanced text mining techniques to compute and extract relationships between variants and other associated entities such as diseases and chemicals/drugs. LitVar is publicly available at https://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/LitVar.
Mind Maps: Hot New Tools Proposed for Cyberspace Librarians.

ERIC Educational Resources Information Center

Humphreys, Nancy K.

1999-01-01

Describes how online searchers can use a software tool based on back-of-the-book indexes to assist in dealing with search engine databases compiled by spiders that crawl across the entire Internet or through large Web sites. Discusses human versus machine knowledge, conversion of indexes to mind maps or mini-thesauri, middleware, eXtensible Markup…
An overview of biomedical literature search on the World Wide Web in the third millennium.

PubMed

Kumar, Prince; Goel, Roshni; Jain, Chandni; Kumar, Ashish; Parashar, Abhishek; Gond, Ajay Ratan

2012-06-01

Complete access to the existing pool of biomedical literature and the ability to "hit" upon the exact information of the relevant specialty are becoming essential elements of academic and clinical expertise. With the rapid expansion of the literature database, it is almost impossible to keep up to date with every innovation. Using the Internet, however, most people can freely access this literature at any time, from almost anywhere. This paper highlights the use of the Internet in obtaining valuable biomedical research information, which is mostly available from journals, databases, textbooks and e-journals in the form of web pages, text materials, images, and so on. The authors present an overview of web-based resources for biomedical researchers, providing information about Internet search engines (e.g., Google), web-based bibliographic databases (e.g., PubMed, IndMed) and how to use them, and other online biomedical resources that can assist clinicians in reaching well-informed clinical decisions.
A Firefly Algorithm-based Approach for Pseudo-Relevance Feedback: Application to Medical Database.

PubMed

Khennak, Ilyes; Drias, Habiba

2016-11-01

The difficulty of disambiguating the sense of the incomplete and imprecise keywords that are extensively used in the search queries has caused the failure of search systems to retrieve the desired information. One of the most powerful and promising method to overcome this shortcoming and improve the performance of search engines is Query Expansion, whereby the user's original query is augmented by new keywords that best characterize the user's information needs and produce more useful query. In this paper, a new Firefly Algorithm-based approach is proposed to enhance the retrieval effectiveness of query expansion while maintaining low computational complexity. In contrast to the existing literature, the proposed approach uses a Firefly Algorithm to find the best expanded query among a set of expanded query candidates. Moreover, this new approach allows the determination of the length of the expanded query empirically. Experimental results on MEDLINE, the on-line medical information database, show that our proposed approach is more effective and efficient compared to the state-of-the-art.
Seismic Search Engine: A distributed database for mining large scale seismic data

NASA Astrophysics Data System (ADS)

Liu, Y.; Vaidya, S.; Kuzma, H. A.

2009-12-01

The International Monitoring System (IMS) of the CTBTO collects terabytes worth of seismic measurements from many receiver stations situated around the earth with the goal of detecting underground nuclear testing events and distinguishing them from other benign, but more common events such as earthquakes and mine blasts. The International Data Center (IDC) processes and analyzes these measurements, as they are collected by the IMS, to summarize event detections in daily bulletins. Thereafter, the data measurements are archived into a large format database. Our proposed Seismic Search Engine (SSE) will facilitate a framework for data exploration of the seismic database as well as the development of seismic data mining algorithms. Analogous to GenBank, the annotated genetic sequence database maintained by NIH, through SSE, we intend to provide public access to seismic data and a set of processing and analysis tools, along with community-generated annotations and statistical models to help interpret the data. SSE will implement queries as user-defined functions composed from standard tools and models. Each query is compiled and executed over the database internally before reporting results back to the user. Since queries are expressed with standard tools and models, users can easily reproduce published results within this framework for peer-review and making metric comparisons. As an illustration, an example query is “what are the best receiver stations in East Asia for detecting events in the Middle East?” Evaluating this query involves listing all receiver stations in East Asia, characterizing known seismic events in that region, and constructing a profile for each receiver station to determine how effective its measurements are at predicting each event. The results of this query can be used to help prioritize how data is collected, identify defective instruments, and guide future sensor placements.
G-Bean: an ontology-graph based web tool for biomedical literature retrieval

PubMed Central

2014-01-01

Background Currently, most people use NCBI's PubMed to search the MEDLINE database, an important bibliographical information source for life science and biomedical information. However, PubMed has some drawbacks that make it difficult to find relevant publications pertaining to users' individual intentions, especially for non-expert users. To ameliorate the disadvantages of PubMed, we developed G-Bean, a graph based biomedical search engine, to search biomedical articles in MEDLINE database more efficiently. Methods G-Bean addresses PubMed's limitations with three innovations: (1) Parallel document index creation: a multithreaded index creation strategy is employed to generate the document index for G-Bean in parallel; (2) Ontology-graph based query expansion: an ontology graph is constructed by merging four major UMLS (Version 2013AA) vocabularies, MeSH, SNOMEDCT, CSP and AOD, to cover all concepts in National Library of Medicine (NLM) database; a Personalized PageRank algorithm is used to compute concept relevance in this ontology graph and the Term Frequency - Inverse Document Frequency (TF-IDF) weighting scheme is used to re-rank the concepts. The top 500 ranked concepts are selected for expanding the initial query to retrieve more accurate and relevant information; (3) Retrieval and re-ranking of documents based on user's search intention: after the user selects any article from the existing search results, G-Bean analyzes user's selections to determine his/her true search intention and then uses more relevant and more specific terms to retrieve additional related articles. The new articles are presented to the user in the order of their relevance to the already selected articles. Results Performance evaluation with 106 OHSUMED benchmark queries shows that G-Bean returns more relevant results than PubMed does when using these queries to search the MEDLINE database. PubMed could not even return any search result for some OHSUMED queries because it failed to form the appropriate Boolean query statement automatically from the natural language query strings. G-Bean is available at http://bioinformatics.clemson.edu/G-Bean/index.php. Conclusions G-Bean addresses PubMed's limitations with ontology-graph based query expansion, automatic document indexing, and user search intention discovery. It shows significant advantages in finding relevant articles from the MEDLINE database to meet the information need of the user. PMID:25474588
G-Bean: an ontology-graph based web tool for biomedical literature retrieval.

PubMed

Wang, James Z; Zhang, Yuanyuan; Dong, Liang; Li, Lin; Srimani, Pradip K; Yu, Philip S

2014-01-01

Currently, most people use NCBI's PubMed to search the MEDLINE database, an important bibliographical information source for life science and biomedical information. However, PubMed has some drawbacks that make it difficult to find relevant publications pertaining to users' individual intentions, especially for non-expert users. To ameliorate the disadvantages of PubMed, we developed G-Bean, a graph based biomedical search engine, to search biomedical articles in MEDLINE database more efficiently. G-Bean addresses PubMed's limitations with three innovations: (1) Parallel document index creation: a multithreaded index creation strategy is employed to generate the document index for G-Bean in parallel; (2) Ontology-graph based query expansion: an ontology graph is constructed by merging four major UMLS (Version 2013AA) vocabularies, MeSH, SNOMEDCT, CSP and AOD, to cover all concepts in National Library of Medicine (NLM) database; a Personalized PageRank algorithm is used to compute concept relevance in this ontology graph and the Term Frequency - Inverse Document Frequency (TF-IDF) weighting scheme is used to re-rank the concepts. The top 500 ranked concepts are selected for expanding the initial query to retrieve more accurate and relevant information; (3) Retrieval and re-ranking of documents based on user's search intention: after the user selects any article from the existing search results, G-Bean analyzes user's selections to determine his/her true search intention and then uses more relevant and more specific terms to retrieve additional related articles. The new articles are presented to the user in the order of their relevance to the already selected articles. Performance evaluation with 106 OHSUMED benchmark queries shows that G-Bean returns more relevant results than PubMed does when using these queries to search the MEDLINE database. PubMed could not even return any search result for some OHSUMED queries because it failed to form the appropriate Boolean query statement automatically from the natural language query strings. G-Bean is available at http://bioinformatics.clemson.edu/G-Bean/index.php. G-Bean addresses PubMed's limitations with ontology-graph based query expansion, automatic document indexing, and user search intention discovery. It shows significant advantages in finding relevant articles from the MEDLINE database to meet the information need of the user.
Toward two-dimensional search engines

NASA Astrophysics Data System (ADS)

Ermann, L.; Chepelianskii, A. D.; Shepelyansky, D. L.

2012-07-01

We study the statistical properties of various directed networks using ranking of their nodes based on the dominant vectors of the Google matrix known as PageRank and CheiRank. On average PageRank orders nodes proportionally to a number of ingoing links, while CheiRank orders nodes proportionally to a number of outgoing links. In this way, the ranking of nodes becomes two dimensional which paves the way for the development of two-dimensional search engines of a new type. Statistical properties of information flow on the PageRank-CheiRank plane are analyzed for networks of British, French and Italian universities, Wikipedia, Linux Kernel, gene regulation and other networks. A special emphasis is done for British universities networks using the large database publicly available in the UK. Methods of spam links control are also analyzed.
SeqWare Query Engine: storing and searching sequence data in the cloud.

PubMed

O'Connor, Brian D; Merriman, Barry; Nelson, Stanley F

2010-12-21

Since the introduction of next-generation DNA sequencers the rapid increase in sequencer throughput, and associated drop in costs, has resulted in more than a dozen human genomes being resequenced over the last few years. These efforts are merely a prelude for a future in which genome resequencing will be commonplace for both biomedical research and clinical applications. The dramatic increase in sequencer output strains all facets of computational infrastructure, especially databases and query interfaces. The advent of cloud computing, and a variety of powerful tools designed to process petascale datasets, provide a compelling solution to these ever increasing demands. In this work, we present the SeqWare Query Engine which has been created using modern cloud computing technologies and designed to support databasing information from thousands of genomes. Our backend implementation was built using the highly scalable, NoSQL HBase database from the Hadoop project. We also created a web-based frontend that provides both a programmatic and interactive query interface and integrates with widely used genome browsers and tools. Using the query engine, users can load and query variants (SNVs, indels, translocations, etc) with a rich level of annotations including coverage and functional consequences. As a proof of concept we loaded several whole genome datasets including the U87MG cell line. We also used a glioblastoma multiforme tumor/normal pair to both profile performance and provide an example of using the Hadoop MapReduce framework within the query engine. This software is open source and freely available from the SeqWare project (http://seqware.sourceforge.net). The SeqWare Query Engine provided an easy way to make the U87MG genome accessible to programmers and non-programmers alike. This enabled a faster and more open exploration of results, quicker tuning of parameters for heuristic variant calling filters, and a common data interface to simplify development of analytical tools. The range of data types supported, the ease of querying and integrating with existing tools, and the robust scalability of the underlying cloud-based technologies make SeqWare Query Engine a nature fit for storing and searching ever-growing genome sequence datasets.
SeqWare Query Engine: storing and searching sequence data in the cloud

PubMed Central

2010-01-01

Background Since the introduction of next-generation DNA sequencers the rapid increase in sequencer throughput, and associated drop in costs, has resulted in more than a dozen human genomes being resequenced over the last few years. These efforts are merely a prelude for a future in which genome resequencing will be commonplace for both biomedical research and clinical applications. The dramatic increase in sequencer output strains all facets of computational infrastructure, especially databases and query interfaces. The advent of cloud computing, and a variety of powerful tools designed to process petascale datasets, provide a compelling solution to these ever increasing demands. Results In this work, we present the SeqWare Query Engine which has been created using modern cloud computing technologies and designed to support databasing information from thousands of genomes. Our backend implementation was built using the highly scalable, NoSQL HBase database from the Hadoop project. We also created a web-based frontend that provides both a programmatic and interactive query interface and integrates with widely used genome browsers and tools. Using the query engine, users can load and query variants (SNVs, indels, translocations, etc) with a rich level of annotations including coverage and functional consequences. As a proof of concept we loaded several whole genome datasets including the U87MG cell line. We also used a glioblastoma multiforme tumor/normal pair to both profile performance and provide an example of using the Hadoop MapReduce framework within the query engine. This software is open source and freely available from the SeqWare project (http://seqware.sourceforge.net). Conclusions The SeqWare Query Engine provided an easy way to make the U87MG genome accessible to programmers and non-programmers alike. This enabled a faster and more open exploration of results, quicker tuning of parameters for heuristic variant calling filters, and a common data interface to simplify development of analytical tools. The range of data types supported, the ease of querying and integrating with existing tools, and the robust scalability of the underlying cloud-based technologies make SeqWare Query Engine a nature fit for storing and searching ever-growing genome sequence datasets. PMID:21210981
PosMed-plus: an intelligent search engine that inferentially integrates cross-species information resources for molecular breeding of plants.

PubMed

Makita, Yuko; Kobayashi, Norio; Mochizuki, Yoshiki; Yoshida, Yuko; Asano, Satomi; Heida, Naohiko; Deshpande, Mrinalini; Bhatia, Rinki; Matsushima, Akihiro; Ishii, Manabu; Kawaguchi, Shuji; Iida, Kei; Hanada, Kosuke; Kuromori, Takashi; Seki, Motoaki; Shinozaki, Kazuo; Toyoda, Tetsuro

2009-07-01

Molecular breeding of crops is an efficient way to upgrade plant functions useful to mankind. A key step is forward genetics or positional cloning to identify the genes that confer useful functions. In order to accelerate the whole research process, we have developed an integrated database system powered by an intelligent data-retrieval engine termed PosMed-plus (Positional Medline for plant upgrading science), allowing us to prioritize highly promising candidate genes in a given chromosomal interval(s) of Arabidopsis thaliana and rice, Oryza sativa. By inferentially integrating cross-species information resources including genomes, transcriptomes, proteomes, localizomes, phenomes and literature, the system compares a user's query, such as phenotypic or functional keywords, with the literature associated with the relevant genes located within the interval. By utilizing orthologous and paralogous correspondences, PosMed-plus efficiently integrates cross-species information to facilitate the ranking of rice candidate genes based on evidence from other model species such as Arabidopsis. PosMed-plus is a plant science version of the PosMed system widely used by mammalian researchers, and provides both a powerful integrative search function and a rich integrative display of the integrated databases. PosMed-plus is the first cross-species integrated database that inferentially prioritizes candidate genes for forward genetics approaches in plant science, and will be expanded for wider use in plant upgrading in many species.
Optics Toolbox: An Intelligent Relational Database System For Optical Designers

NASA Astrophysics Data System (ADS)

Weller, Scott W.; Hopkins, Robert E.

1986-12-01

Optical designers were among the first to use the computer as an engineering tool. Powerful programs have been written to do ray-trace analysis, third-order layout, and optimization. However, newer computing techniques such as database management and expert systems have not been adopted by the optical design community. For the purpose of this discussion we will define a relational database system as a database which allows the user to specify his requirements using logical relations. For example, to search for all lenses in a lens database with a F/number less than two, and a half field of view near 28 degrees, you might enter the following: FNO < 2.0 and FOV of 28 degrees ± 5% Again for the purpose of this discussion, we will define an expert system as a program which contains expert knowledge, can ask intelligent questions, and can form conclusions based on the answers given and the knowledge which it contains. Most expert systems store this knowledge in the form of rules-of-thumb, which are written in an English-like language, and which are easily modified by the user. An example rule is: IF require microscope objective in air and require NA > 0.9 THEN suggest the use of an oil immersion objective The heart of the expert system is the rule interpreter, sometimes called an inference engine, which reads the rules and forms conclusions based on them. The use of a relational database system containing lens prototypes seems to be a viable prospect. However, it is not clear that expert systems have a place in optical design. In domains such as medical diagnosis and petrology, expert systems are flourishing. These domains are quite different from optical design, however, because optical design is a creative process, and the rules are difficult to write down. We do think that an expert system is feasible in the area of first order layout, which is sufficiently diagnostic in nature to permit useful rules to be written. This first-order expert would emulate an expert designer as he interacted with a customer for the first time: asking the right questions, forming conclusions, and making suggestions. With these objectives in mind, we have developed the Optics Toolbox. Optics Toolbox is actually two programs in one: it is a powerful relational database system with twenty-one search parameters, four search modes, and multi-database support, as well as a first-order optical design expert system with a rule interpreter which has full access to the relational database. The system schematic is shown in Figure 1.
A Knowledge Database on Thermal Control in Manufacturing Processes

NASA Astrophysics Data System (ADS)

Hirasawa, Shigeki; Satoh, Isao

A prototype version of a knowledge database on thermal control in manufacturing processes, specifically, molding, semiconductor manufacturing, and micro-scale manufacturing has been developed. The knowledge database has search functions for technical data, evaluated benchmark data, academic papers, and patents. The database also displays trends and future roadmaps for research topics. It has quick-calculation functions for basic design. This paper summarizes present research topics and future research on thermal control in manufacturing engineering to collate the information to the knowledge database. In the molding process, the initial mold and melt temperatures are very important parameters. In addition, thermal control is related to many semiconductor processes, and the main parameter is temperature variation in wafers. Accurate in-situ temperature measurment of wafers is important. And many technologies are being developed to manufacture micro-structures. Accordingly, the knowledge database will help further advance these technologies.
Databases and coordinated research projects at the IAEA on atomic processes in plasmas

DOE Office of Scientific and Technical Information (OSTI.GOV)

Braams, Bastiaan J.; Chung, Hyun-Kyung

2012-05-25

The Atomic and Molecular Data Unit at the IAEA works with a network of national data centres to encourage and coordinate production and dissemination of fundamental data for atomic, molecular and plasma-material interaction (A+M/PMI) processes that are relevant to the realization of fusion energy. The Unit maintains numerical and bibliographical databases and has started a Wiki-style knowledge base. The Unit also contributes to A+M database interface standards and provides a search engine that offers a common interface to multiple numerical A+M/PMI databases. Coordinated Research Projects (CRPs) bring together fusion energy researchers and atomic, molecular and surface physicists for joint workmore » towards the development of new data and new methods. The databases and current CRPs on A+M/PMI processes are briefly described here.« less
Muscle Logic: New Knowledge Resource for Anatomy Enables Comprehensive Searches of the Literature on the Feeding Muscles of Mammals.

PubMed

Druzinsky, Robert E; Balhoff, James P; Crompton, Alfred W; Done, James; German, Rebecca Z; Haendel, Melissa A; Herrel, Anthony; Herring, Susan W; Lapp, Hilmar; Mabee, Paula M; Muller, Hans-Michael; Mungall, Christopher J; Sternberg, Paul W; Van Auken, Kimberly; Vinyard, Christopher J; Williams, Susan H; Wall, Christine E

2016-01-01

In recent years large bibliographic databases have made much of the published literature of biology available for searches. However, the capabilities of the search engines integrated into these databases for text-based bibliographic searches are limited. To enable searches that deliver the results expected by comparative anatomists, an underlying logical structure known as an ontology is required. Here we present the Mammalian Feeding Muscle Ontology (MFMO), a multi-species ontology focused on anatomical structures that participate in feeding and other oral/pharyngeal behaviors. A unique feature of the MFMO is that a simple, computable, definition of each muscle, which includes its attachments and innervation, is true across mammals. This construction mirrors the logical foundation of comparative anatomy and permits searches using language familiar to biologists. Further, it provides a template for muscles that will be useful in extending any anatomy ontology. The MFMO is developed to support the Feeding Experiments End-User Database Project (FEED, https://feedexp.org/), a publicly-available, online repository for physiological data collected from in vivo studies of feeding (e.g., mastication, biting, swallowing) in mammals. Currently the MFMO is integrated into FEED and also into two literature-specific implementations of Textpresso, a text-mining system that facilitates powerful searches of a corpus of scientific publications. We evaluate the MFMO by asking questions that test the ability of the ontology to return appropriate answers (competency questions). We compare the results of queries of the MFMO to results from similar searches in PubMed and Google Scholar. Our tests demonstrate that the MFMO is competent to answer queries formed in the common language of comparative anatomy, but PubMed and Google Scholar are not. Overall, our results show that by incorporating anatomical ontologies into searches, an expanded and anatomically comprehensive set of results can be obtained. The broader scientific and publishing communities should consider taking up the challenge of semantically enabled search capabilities.
Intelligent Text Retrieval and Knowledge Acquisition from Texts for NASA Applications: Preprocessing Issues

NASA Technical Reports Server (NTRS)

2002-01-01

A system that retrieves problem reports from a NASA database is described. The database is queried with natural language questions. Part-of-speech tags are first assigned to each word in the question using a rule based tagger. A partial parse of the question is then produced with independent sets of deterministic finite state a utomata. Using partial parse information, a look up strategy searches the database for problem reports relevant to the question. A bigram stemmer and irregular verb conjugates have been incorporated into the system to improve accuracy. The system is evaluated by a set of fifty five questions posed by NASA engineers. A discussion of future research is also presented.
The future of bibliographic standards in a networked information environment

NASA Technical Reports Server (NTRS)

1997-01-01

The main mission of the CENDI Cataloging Working Group is to provide guidelines for cataloging practices that support the sharing of database records among the CENDI agencies, and that incorporate principles based on cost effectiveness and efficiency. Recent efforts include the extension of COSATI Guidelines for the Cataloging of Technical Reports to include non-print materials, and the mapping of each agency's export file structure to USMARC. Of primary importance is the impact of electronic documents and the distributed nature of the networked information environment. Topics discussed during the workshop include the following: Trade-offs in Cataloging and Indexing Internet Information; The Impact on Current and Future Standards; A Look at WWW Metadata Initiatives; Standards for Electronic Journals; The Present and Future Search Engines; The Roles for Text Analysis Software; Advanced Search Engine Meets Metathesaurus; Locator Schemes for Internet Resources; Identifying and Cataloging Web Document Types; In Search of a New Bibliographic Record. The videos in this set include viewgraphs of charts and related materials of the workshop.
ExplorEnz: the primary source of the IUBMB enzyme list

PubMed Central

McDonald, Andrew G.; Boyce, Sinéad; Tipton, Keith F.

2009-01-01

ExplorEnz is the MySQL database that is used for the curation and dissemination of the International Union of Biochemistry and Molecular Biology (IUBMB) Enzyme Nomenclature. A simple web-based query interface is provided, along with an advanced search engine for more complex Boolean queries. The WWW front-end is accessible at http://www.enzyme-database.org, from where downloads of the database as SQL and XML are also available. An associated form-based curatorial application has been developed to facilitate the curation of enzyme data as well as the internal and public review processes that occur before an enzyme entry is made official. Suggestions for new enzyme entries, or modifications to existing ones, can be made using the forms provided at http://www.enzyme-database.org/forms.php. PMID:18776214
Palingol: a declarative programming language to describe nucleic acids' secondary structures and to scan sequence database.

PubMed Central

Billoud, B; Kontic, M; Viari, A

1996-01-01

At the DNA/RNA level, biological signals are defined by a combination of spatial structures and sequence motifs. Until now, few attempts had been made in writing general purpose search programs that take into account both sequence and structure criteria. Indeed, the most successful structure scanning programs are usually dedicated to particular structures and are written using general purpose programming languages through a complex and time consuming process where the biological problem of defining the structure and the computer engineering problem of looking for it are intimately intertwined. In this paper, we describe a general representation of structures, suitable for database scanning, together with a programming language, Palingol, designed to manipulate it. Palingol has specific data types, corresponding to structural elements-basically helices-that can be arranged in any way to form a complex structure. As a consequence of the declarative approach used in Palingol, the user should only focus on 'what to search for' while the language engine takes care of 'how to look for it'. Therefore, it becomes simpler to write a scanning program and the structural constraints that define the required structure are more clearly identified. PMID:8628670

iProphet: Multi-level Integrative Analysis of Shotgun Proteomic Data Improves Peptide and Protein Identification Rates and Error Estimates*

PubMed Central

Shteynberg, David; Deutsch, Eric W.; Lam, Henry; Eng, Jimmy K.; Sun, Zhi; Tasman, Natalie; Mendoza, Luis; Moritz, Robert L.; Aebersold, Ruedi; Nesvizhskii, Alexey I.

2011-01-01

The combination of tandem mass spectrometry and sequence database searching is the method of choice for the identification of peptides and the mapping of proteomes. Over the last several years, the volume of data generated in proteomic studies has increased dramatically, which challenges the computational approaches previously developed for these data. Furthermore, a multitude of search engines have been developed that identify different, overlapping subsets of the sample peptides from a particular set of tandem mass spectrometry spectra. We present iProphet, the new addition to the widely used open-source suite of proteomic data analysis tools Trans-Proteomics Pipeline. Applied in tandem with PeptideProphet, it provides more accurate representation of the multilevel nature of shotgun proteomic data. iProphet combines the evidence from multiple identifications of the same peptide sequences across different spectra, experiments, precursor ion charge states, and modified states. It also allows accurate and effective integration of the results from multiple database search engines applied to the same data. The use of iProphet in the Trans-Proteomics Pipeline increases the number of correctly identified peptides at a constant false discovery rate as compared with both PeptideProphet and another state-of-the-art tool Percolator. As the main outcome, iProphet permits the calculation of accurate posterior probabilities and false discovery rate estimates at the level of sequence identical peptide identifications, which in turn leads to more accurate probability estimates at the protein level. Fully integrated with the Trans-Proteomics Pipeline, it supports all commonly used MS instruments, search engines, and computer platforms. The performance of iProphet is demonstrated on two publicly available data sets: data from a human whole cell lysate proteome profiling experiment representative of typical proteomic data sets, and from a set of Streptococcus pyogenes experiments more representative of organism-specific composite data sets. PMID:21876204
Retrieving high-resolution images over the Internet from an anatomical image database

NASA Astrophysics Data System (ADS)

Strupp-Adams, Annette; Henderson, Earl

1999-12-01

The Visible Human Data set is an important contribution to the national collection of anatomical images. To enhance the availability of these images, the National Library of Medicine has supported the design and development of a prototype object-oriented image database which imports, stores, and distributes high resolution anatomical images in both pixel and voxel formats. One of the key database modules is its client-server Internet interface. This Web interface provides a query engine with retrieval access to high-resolution anatomical images that range in size from 100KB for browser viewable rendered images, to 1GB for anatomical structures in voxel file formats. The Web query and retrieval client-server system is composed of applet GUIs, servlets, and RMI application modules which communicate with each other to allow users to query for specific anatomical structures, and retrieve image data as well as associated anatomical images from the database. Selected images can be downloaded individually as single files via HTTP or downloaded in batch-mode over the Internet to the user's machine through an applet that uses Netscape's Object Signing mechanism. The image database uses ObjectDesign's object-oriented DBMS, ObjectStore that has a Java interface. The query and retrieval systems has been tested with a Java-CDE window system, and on the x86 architecture using Windows NT 4.0. This paper describes the Java applet client search engine that queries the database; the Java client module that enables users to view anatomical images online; the Java application server interface to the database which organizes data returned to the user, and its distribution engine that allow users to download image files individually and/or in batch-mode.
Overview of Nuclear Physics Data: Databases, Web Applications and Teaching Tools

NASA Astrophysics Data System (ADS)

McCutchan, Elizabeth

2017-01-01

The mission of the United States Nuclear Data Program (USNDP) is to provide current, accurate, and authoritative data for use in pure and applied areas of nuclear science and engineering. This is accomplished by compiling, evaluating, and disseminating extensive datasets. Our main products include the Evaluated Nuclear Structure File (ENSDF) containing information on nuclear structure and decay properties and the Evaluated Nuclear Data File (ENDF) containing information on neutron-induced reactions. The National Nuclear Data Center (NNDC), through the website www.nndc.bnl.gov, provides web-based retrieval systems for these and many other databases. In addition, the NNDC hosts several on-line physics tools, useful for calculating various quantities relating to basic nuclear physics. In this talk, I will first introduce the quantities which are evaluated and recommended in our databases. I will then outline the searching capabilities which allow one to quickly and efficiently retrieve data. Finally, I will demonstrate how the database searches and web applications can provide effective teaching tools concerning the structure of nuclei and how they interact. Work supported by the Office of Nuclear Physics, Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-98CH10886.
Towards pathogenomics: a web-based resource for pathogenicity islands

PubMed Central

Yoon, Sung Ho; Park, Young-Kyu; Lee, Soohyun; Choi, Doil; Oh, Tae Kwang; Hur, Cheol-Goo; Kim, Jihyun F.

2007-01-01

Pathogenicity islands (PAIs) are genetic elements whose products are essential to the process of disease development. They have been horizontally (laterally) transferred from other microbes and are important in evolution of pathogenesis. In this study, a comprehensive database and search engines specialized for PAIs were established. The pathogenicity island database (PAIDB) is a comprehensive relational database of all the reported PAIs and potential PAI regions which were predicted by a method that combines feature-based analysis and similarity-based analysis. Also, using the PAI Finder search application, a multi-sequence query can be analyzed onsite for the presence of potential PAIs. As of April 2006, PAIDB contains 112 types of PAIs and 889 GenBank accessions containing either partial or all PAI loci previously reported in the literature, which are present in 497 strains of pathogenic bacteria. The database also offers 310 candidate PAIs predicted from 118 sequenced prokaryotic genomes. With the increasing number of prokaryotic genomes without functional inference and sequenced genetic regions of suspected involvement in diseases, this web-based, user-friendly resource has the potential to be of significant use in pathogenomics. PAIDB is freely accessible at . PMID:17090594
A Database for Propagation Models and Conversion to C++ Programming Language

NASA Technical Reports Server (NTRS)

Kantak, Anil V.; Angkasa, Krisjani; Rucker, James

1996-01-01

The telecommunications system design engineer generally needs the quantification of effects of the propagation medium (definition of the propagation channel) to design an optimal communications system. To obtain the definition of the channel, the systems engineer generally has a few choices. A search of the relevant publications such as the IEEE Transactions, CCIR's, NASA propagation handbook, etc., may be conducted to find the desired channel values. This method may need excessive amounts of time and effort on the systems engineer's part and there is a possibility that the search may not even yield the needed results. To help the researcher and the systems engineers, it was recommended by the conference participants of NASA Propagation Experimenters (NAPEX) XV (London, Ontario, Canada, June 28 and 29, 1991) that a software should be produced that would contain propagation models and the necessary prediction methods of most propagation phenomena. Moreover, the software should be flexible enough for the user to make slight changes to the models without expending a substantial effort in programming. In the past few years, a software was produced to fit these requirements as best as could be done. The software was distributed to all NAPEX participants for evaluation and use, the participant reactions, suggestions etc., were gathered and were used to improve the subsequent releases of the software. The existing database program is in the Microsoft Excel application software and works fine within the guidelines of that environment, however, recently there have been some questions about the robustness and survivability of the Excel software in the ever changing (hopefully improving) world of software packages.
The potential of tissue engineering for developing alternatives to animal experiments: a systematic review.

PubMed

de Vries, Rob B M; Leenaars, Marlies; Tra, Joppe; Huijbregtse, Robbertjan; Bongers, Erik; Jansen, John A; Gordijn, Bert; Ritskes-Hoitinga, Merel

2015-07-01

An underexposed ethical issue raised by tissue engineering is the use of laboratory animals in tissue engineering research. Even though this research results in suffering and loss of life in animals, tissue engineering also has great potential for the development of alternatives to animal experiments. With the objective of promoting a joint effort of tissue engineers and alternative experts to fully realise this potential, this study provides the first comprehensive overview of the possibilities of using tissue-engineered constructs as a replacement of laboratory animals. Through searches in two large biomedical databases (PubMed, Embase) and several specialised 3R databases, 244 relevant primary scientific articles, published between 1991 and 2011, were identified. By far most articles reviewed related to the use of tissue-engineered skin/epidermis for toxicological applications such as testing for skin irritation. This review article demonstrates, however, that the potential for the development of alternatives also extends to other tissues such as other epithelia and the liver, as well as to other fields of application such as drug screening and basic physiology. This review discusses which impediments need to be overcome to maximise the contributions that the field of tissue engineering can make, through the development of alternative methods, to the reduction of the use and suffering of laboratory animals. Copyright © 2013 John Wiley & Sons, Ltd.
PIRIA: a general tool for indexing, search, and retrieval of multimedia content

NASA Astrophysics Data System (ADS)

Joint, Magali; Moellic, Pierre-Alain; Hede, P.; Adam, P.

2004-05-01

The Internet is a continuously expanding source of multimedia content and information. There are many products in development to search, retrieve, and understand multimedia content. But most of the current image search/retrieval engines, rely on a image database manually pre-indexed with keywords. Computers are still powerless to understand the semantic meaning of still or animated image content. Piria (Program for the Indexing and Research of Images by Affinity), the search engine we have developed brings this possibility closer to reality. Piria is a novel search engine that uses the query by example method. A user query is submitted to the system, which then returns a list of images ranked by similarity, obtained by a metric distance that operates on every indexed image signature. These indexed images are compared according to several different classifiers, not only Keywords, but also Form, Color and Texture, taking into account geometric transformations and variance like rotation, symmetry, mirroring, etc. Form - Edges extracted by an efficient segmentation algorithm. Color - Histogram, semantic color segmentation and spatial color relationship. Texture - Texture wavelets and local edge patterns. If required, Piria is also able to fuse results from multiple classifiers with a new classification of index categories: Single Indexer Single Call (SISC), Single Indexer Multiple Call (SIMC), Multiple Indexers Single Call (MISC) or Multiple Indexers Multiple Call (MIMC). Commercial and industrial applications will be explored and discussed as well as current and future development.
MASCOT HTML and XML parser: an implementation of a novel object model for protein identification data.

PubMed

Yang, Chunguang G; Granite, Stephen J; Van Eyk, Jennifer E; Winslow, Raimond L

2006-11-01

Protein identification using MS is an important technique in proteomics as well as a major generator of proteomics data. We have designed the protein identification data object model (PDOM) and developed a parser based on this model to facilitate the analysis and storage of these data. The parser works with HTML or XML files saved or exported from MASCOT MS/MS ions search in peptide summary report or MASCOT PMF search in protein summary report. The program creates PDOM objects, eliminates redundancy in the input file, and has the capability to output any PDOM object to a relational database. This program facilitates additional analysis of MASCOT search results and aids the storage of protein identification information. The implementation is extensible and can serve as a template to develop parsers for other search engines. The parser can be used as a stand-alone application or can be driven by other Java programs. It is currently being used as the front end for a system that loads HTML and XML result files of MASCOT searches into a relational database. The source code is freely available at http://www.ccbm.jhu.edu and the program uses only free and open-source Java libraries.
Pub-Med-dot-com, here we come!

PubMed

Pulst, Stefan M

2016-08-01

As of April 8, 2016, articles in Neurology® Genetics can be searched using PubMed. Launched in 1996, PubMed is a search engine that accesses citations and abstracts of more than 26 million articles. Its primary sources include the MEDLINE database, which was started in the 1960s, and biomedical and life sciences journal articles that date back to 1946. In addition, PubMed accesses other sources, for example, citations to those life sciences journals that submit full-text articles to PubMed Central (PMC). PubMed Central was launched in 2000 as a free archive of biomedical and life science journals.
Consolidating Russia and Eurasia Antibiotic Resistance Data for 1992-2014 Using Search Engine.

PubMed

Bedenkov, Alexander; Shpinev, Vitaly; Suvorov, Nikolay; Sokolov, Evgeny; Riabenko, Evgeniy

2016-01-01

The World Health Organization recognizes the antibiotic resistance problem as a major health threat in the twenty first century. The paper describes an effort to fight it undertaken at the verge of two industries-healthcare and Data Science. One of the major difficulties in monitoring antibiotic resistance is low availability of comprehensive research data. Our aim is to develop a nation-wide antibiotic resistance database using Internet search and data processing algorithms using Russian language publications. An interdisciplinary team built an intelligent Internet search filter to locate all publicly available research data on antibiotic resistance in Russia and Eurasia countries, extracted it, and collated it for analysis. A database was constructed using data from 850 original studies conducted at 153 locations in 12 countries between 1992 and 2014. The studies contained susceptibility and resistance rates of 156 microorganisms to 157 antibiotic drugs. The applied search methodology was highly robust in that it yielded search precision of 58 vs. 20% in a typical Internet search. It allowed finding and collating within the database the following data items (among many others): publication details including title, source, date, authors, etc.; study details: time period, locations, research organization, therapy area, etc.; microorganisms and antibiotic drugs included in the study along with prevalence values of resistant and susceptible strains, and numbers of isolates. The next stage in project development will try to validate the data by matching it to major benchmark studies; in addition, a panel of experts will be convened to evaluate the outcomes. The work provides a supplementary tool to national surveillance systems in antibiotic resistance, and consolidates fragmented research data available for 12 countries for a period of more than 20 years.
Consolidating Russia and Eurasia Antibiotic Resistance Data for 1992–2014 Using Search Engine

PubMed Central

Bedenkov, Alexander; Shpinev, Vitaly; Suvorov, Nikolay; Sokolov, Evgeny; Riabenko, Evgeniy

2016-01-01

Background: The World Health Organization recognizes the antibiotic resistance problem as a major health threat in the twenty first century. The paper describes an effort to fight it undertaken at the verge of two industries—healthcare and Data Science. One of the major difficulties in monitoring antibiotic resistance is low availability of comprehensive research data. Our aim is to develop a nation-wide antibiotic resistance database using Internet search and data processing algorithms using Russian language publications. Materials and Methods: An interdisciplinary team built an intelligent Internet search filter to locate all publicly available research data on antibiotic resistance in Russia and Eurasia countries, extracted it, and collated it for analysis. A database was constructed using data from 850 original studies conducted at 153 locations in 12 countries between 1992 and 2014. The studies contained susceptibility and resistance rates of 156 microorganisms to 157 antibiotic drugs. Results: The applied search methodology was highly robust in that it yielded search precision of 58 vs. 20% in a typical Internet search. It allowed finding and collating within the database the following data items (among many others): publication details including title, source, date, authors, etc.; study details: time period, locations, research organization, therapy area, etc.; microorganisms and antibiotic drugs included in the study along with prevalence values of resistant and susceptible strains, and numbers of isolates. The next stage in project development will try to validate the data by matching it to major benchmark studies; in addition, a panel of experts will be convened to evaluate the outcomes. Conclusions: The work provides a supplementary tool to national surveillance systems in antibiotic resistance, and consolidates fragmented research data available for 12 countries for a period of more than 20 years. PMID:27014217
Role of varicocele treatment in assisted reproductive technologies.

PubMed

Sönmez, Mehmet G; Haliloğlu, Ahmet H

2018-03-01

In this review, we investigate the advantage of varicocele repair prior to assisted reproductive technologies (ART) for infertile couples and provide cost analysis information. We searched the following electronic databases: PubMed, Medline, Excerpta Medica Database (Embase), Cumulative Index to Nursing and Allied Health Literature (CINAHL). The following search strategy was modified for the various databases and search engines: 'varicocele', 'varicocelectomy', 'varicocele repair', 'ART', ' in vitro fertilisation (IVF)', 'intracytoplasmic sperm injection (ICSI)'. A total of 49 articles, including six meta-analyses, 32 systematic reviews, and 11 original articles, were included in the analysis. Bypassing potentially reversible male subfertility factors using ART is currently common practice. However, varicocele may be present in 35% of men with primary infertility and 80% of men with secondary infertility. Varicocele repair has been shown to be an effective treatment for infertile men with clinical varicocele, thus should play an important role in the treatment of such patients due to the foetal/genetic risks and high costs that are associated with increased ART use. Varicocele repair is a cost-effective treatment method that can improve semen parameters, pregnancy rates, and live-birth rates in most infertile men with clinical varicocele. By improving semen parameters and sperm structure, varicocele repair can decrease or even eliminate ART requirement.
Different methods of dentin processing for application in bone tissue engineering: A systematic review.

PubMed

Tabatabaei, Fahimeh Sadat; Tatari, Saeed; Samadi, Ramin; Moharamzadeh, Keyvan

2016-10-01

Dentin has become an interesting potential biomaterial for tissue engineering of oral hard tissues. It can be used as a scaffold or as a source of growth factors in bone tissue engineering. Different forms of dentin have been studied for their potential use as bone substitutes. Here, we systematically review different methods of dentin preparation and the efficacy of processed dentin in bone tissue engineering. An electronic search was carried out in PubMed and Scopus databases for articles published from 2000 to 2016. Studies on dentin preparation for application in bone tissue engineering were selected. The initial search yielded a total of 1045 articles, of which 37 were finally selected. Review of studies showed that demineralization was the most commonly used dentin preparation process for use in tissue engineering. Dentin extract, dentin particles (tooth ash), freeze-dried dentin, and denatured dentin are others method of dentin preparation. Based on our literature review, we can conclude that preparation procedure and the size and shape of dentin particles play an important role in its osteoinductive and osteoconductive properties. Standardization of these methods is important to draw a conclusion in this regard. © 2016 Wiley Periodicals, Inc. J Biomed Mater Res Part A: 104A: 2616-2627, 2016. © 2016 Wiley Periodicals, Inc.
Textual appropriation in engineering master's theses: a preliminary study.

PubMed

Eckel, Edward J

2011-09-01

In the thesis literature review, an engineering graduate student is expected to place original research in the context of previous work by other researchers. However, for some students, particularly those for whom English is a second language, the literature review may be a mixture of original writing and verbatim source text appropriated without quotations. Such problematic use of source material leaves students vulnerable to an accusation of plagiarism, which carries severe consequences. Is such textual appropriation common in engineering master's writing? Furthermore, what, if anything, can be concluded when two texts have been found to have textual material in common? Do existing definitions of plagiarism provide a sufficient framework for determining if an instance of copying is transgressive or not? In a preliminary attempt to answer these questions, text strings from a random sample of 100 engineering master's theses from the ProQuest Dissertations and Theses database were searched for appropriated verbatim source text using the Google search engine. The results suggest that textual borrowing may indeed be a common feature of the master's engineering literature review, raising questions about the ability of graduate students to synthesize the literature. The study also illustrates the difficulties of making a determination of plagiarism based on simple textual similarity. A context-specific approach is recommended when dealing with any instance of apparent copying.
A Bioinformatics Workflow for Variant Peptide Detection in Shotgun Proteomics*

PubMed Central

Li, Jing; Su, Zengliu; Ma, Ze-Qiang; Slebos, Robbert J. C.; Halvey, Patrick; Tabb, David L.; Liebler, Daniel C.; Pao, William; Zhang, Bing

2011-01-01

Shotgun proteomics data analysis usually relies on database search. However, commonly used protein sequence databases do not contain information on protein variants and thus prevent variant peptides and proteins from been identified. Including known coding variations into protein sequence databases could help alleviate this problem. Based on our recently published human Cancer Proteome Variation Database, we have created a protein sequence database that comprehensively annotates thousands of cancer-related coding variants collected in the Cancer Proteome Variation Database as well as noncancer-specific ones from the Single Nucleotide Polymorphism Database (dbSNP). Using this database, we then developed a data analysis workflow for variant peptide identification in shotgun proteomics. The high risk of false positive variant identifications was addressed by a modified false discovery rate estimation method. Analysis of colorectal cancer cell lines SW480, RKO, and HCT-116 revealed a total of 81 peptides that contain either noncancer-specific or cancer-related variations. Twenty-three out of 26 variants randomly selected from the 81 were confirmed by genomic sequencing. We further applied the workflow on data sets from three individual colorectal tumor specimens. A total of 204 distinct variant peptides were detected, and five carried known cancer-related mutations. Each individual showed a specific pattern of cancer-related mutations, suggesting potential use of this type of information for personalized medicine. Compatibility of the workflow has been tested with four popular database search engines including Sequest, Mascot, X!Tandem, and MyriMatch. In summary, we have developed a workflow that effectively uses existing genomic data to enable variant peptide detection in proteomics. PMID:21389108
Web-based UMLS concept retrieval by automatic text scanning: a comparison of two methods.

PubMed

Brandt, C; Nadkarni, P

2001-01-01

The Web is increasingly the medium of choice for multi-user application program delivery. Yet selection of an appropriate programming environment for rapid prototyping, code portability, and maintainability remain issues. We summarize our experience on the conversion of a LISP Web application, Search/SR to a new, functionally identical application, Search/SR-ASP using a relational database and active server pages (ASP) technology. Our results indicate that provision of easy access to database engines and external objects is almost essential for a development environment to be considered viable for rapid and robust application delivery. While LISP itself is a robust language, its use in Web applications may be hard to justify given that current vendor implementations do not provide such functionality. Alternative, currently available scripting environments for Web development appear to have most of LISP's advantages and few of its disadvantages.
Filtering Access to Internet Content at Higher Education Institutions: Stakeholder Perceptions and Their Impact on Research and Academic Freedom

ERIC Educational Resources Information Center

Orenstein, David I.

2009-01-01

Hardware and software filters, which sift through keywords placed in Internet search engines and online databases, work to limit the return of information from these sources. By their very purpose, filters exist to decrease the amount of information researchers can access. The purpose of this study is to gain insight into the perceptions key…
Environmental and Molecular Science Laboratory Arrow

DOE Office of Scientific and Technical Information (OSTI.GOV)

2016-06-24

Arrows is a software package that combines NWChem, SQL and NOSQL databases, email, and social networks (e.g. Twitter, Tumblr) that simplifies molecular and materials modeling and makes these modeling capabilities accessible to all scientists and engineers. EMSL Arrows is very simple to use. The user just emails chemical reactions to arrows@emsl.pnnl.gov and then an email is sent back with thermodynamic, reaction pathway (kinetic), spectroscopy, and other results. EMSL Arrows parses the email and then searches the database for the compounds in the reactions. If a compound isn't there, an NWChem calculation is setup and submitted to calculate it. Once themore » calculation is finished the results are entered into the database and then results are emailed back.« less
A review of parameters and heuristics for guiding metabolic pathfinding.

PubMed

Kim, Sarah M; Peña, Matthew I; Moll, Mark; Bennett, George N; Kavraki, Lydia E

2017-09-15

Recent developments in metabolic engineering have led to the successful biosynthesis of valuable products, such as the precursor of the antimalarial compound, artemisinin, and opioid precursor, thebaine. Synthesizing these traditionally plant-derived compounds in genetically modified yeast cells introduces the possibility of significantly reducing the total time and resources required for their production, and in turn, allows these valuable compounds to become cheaper and more readily available. Most biosynthesis pathways used in metabolic engineering applications have been discovered manually, requiring a tedious search of existing literature and metabolic databases. However, the recent rapid development of available metabolic information has enabled the development of automated approaches for identifying novel pathways. Computer-assisted pathfinding has the potential to save biochemists time in the initial discovery steps of metabolic engineering. In this paper, we review the parameters and heuristics used to guide the search in recent pathfinding algorithms. These parameters and heuristics capture information on the metabolic network structure, compound structures, reaction features, and organism-specificity of pathways. No one metabolic pathfinding algorithm or search parameter stands out as the best to use broadly for solving the pathfinding problem, as each method and parameter has its own strengths and shortcomings. As assisted pathfinding approaches continue to become more sophisticated, the development of better methods for visualizing pathway results and integrating these results into existing metabolic engineering practices is also important for encouraging wider use of these pathfinding methods.
Raising the IQ in full-text searching via intelligent querying

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kero, R.; Russell, L.; Swietlik, C.

1994-11-01

Current Information Retrieval (IR) technologies allow for efficient access to relevant information, provided that user selected query terms coincide with the specific linguistical choices made by the authors whose works constitute the text-base. Therefore, the challenge is to enhance the limited searching capability of state-of-the-practice IR. This can be done either with augmented clients that overcome current server searching deficiencies, or with added capabilities that can augment searching algorithms on the servers. The technology being investigated is that of deductive databases, with a set of new techniques called cooperative answering. This technology utilizes semantic networks to allow for navigation betweenmore » possible query search term alternatives. The augmented search terms are passed to an IR engine and the results can be compared. The project utilizes the OSTI Environment, Safety and Health Thesaurus to populate the domain specific semantic network and the text base of ES&H related documents from the Facility Profile Information Management System as the domain specific search space.« less

Development and operation of NEW-KOTIS : In-house technical information database of Nippon Kokan Corp.

NASA Astrophysics Data System (ADS)

Yagi, Yukio; Takahashi, Kaei

The purpose of this report is to describe how the activities for managing technical information has been and is now being conducted by the Engineering department of Nippon Kokan Corp. In addition, as a practical example of database generation promoted by the department, this book gives whole aspects of the NEW-KOTIS (background of its development, history, features, functional details, control and operation method, use in search operations, and so forth). The NEW-KOTIS (3rd-term system) is an "in-house technical information database system," which started its operation on May, 1987. This database system now contains approximately 65,000 information items (research reports, investigation reports, technical reports, etc.) generated within the company, and this information is available to anyone in any department through the network connecting all the company's structures.
Does an English appeal court ruling increase the risks of miscarriages of justice when complex DNA profiles are searched against the national DNA database?

PubMed

Gill, P; Bleka, Ø; Egeland, T

2014-11-01

Likelihood ratio (LR) methods to interpret multi-contributor, low template, complex DNA mixtures are becoming standard practice. The next major development will be to introduce search engines based on the new methods to interrogate very large national DNA databases, such as those held by China, the USA and the UK. Here we describe a rapid method that was used to assign a LR to each individual member of database of 5 million genotypes which can be ranked in order. Previous authors have only considered database trawls in the context of binary match or non-match criteria. However, the concept of match/non-match no longer applies within the new paradigm introduced, since the distribution of resultant LRs is continuous for practical purposes. An English appeal court decision allows scientists to routinely report complex DNA profiles using nothing more than their subjective personal 'experience of casework' and 'observations' in order to apply an expression of the rarity of an evidential sample. This ruling must be considered in context of a recent high profile English case, where an individual was extracted from a database and wrongly accused of a serious crime. In this case the DNA evidence was used to negate the overwhelming exculpatory (non-DNA) evidence. Demonstrable confirmation bias, also known as the 'CSI-effect, seriously affected the investigation. The case demonstrated that in practice, databases could be used to select and prosecute an individual, simply because he ranked high in the list of possible matches. We have identified this phenomenon as a cognitive error which we term: 'the naïve investigator effect'. We take the opportunity to test the performance of database extraction strategies either by using a simple matching allele count (MAC) method or LR. The example heard by the appeal court is used as the exemplar case. It is demonstrated that the LR search-method offers substantial benefits compared to searches based on simple matching allele count (MAC) methods. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Captagon: use and trade in the Middle East.

PubMed

Al-Imam, Ahmed; Santacroce, Rita; Roman-Urrestarazu, Andres; Chilcott, Robert; Bersani, Giuseppe; Martinotti, Giovanni; Corazza, Ornella

2017-05-01

Fenetheylline, a psychostimulant drug, often branded as Captagon, is a combination of amphetamine and theophylline. Since the cessation of its legal production in 1986, counterfeited products have been produced illicitly in south-east Europe and far-east Asia. Its profitable trade has been linked to terrorist organizations, including Islamic State of Iraq and the Levant. This study aims to reach up-to-date data, concerning the Captagon e-commerce and use in the Middle East. A multi-staged and multi-lingual literature search was carried out. A list of prespecified keywords was applied across medical and paramedical databases, web and Dark web, search engines, social communication media, electronic commerce websites, media networks, and the Global Public Health Intelligence Network database. The use of Captagon as a stimulant in terrorist settings has been marginally covered in the literature. Data can widely be retrieved from Google and AOL search engines, YouTube, and Amazon e-commerce websites, and to a lesser extent from Alibaba and eBay. On the contrary, Middle Eastern e-commerce websites yielded almost no results. Interestingly, the Dark web generated original data for Captagon e-commerce in the Middle East. Further investigations are needed on the role that psychoactive drugs play in terrorist attacks and civil war zones. Unless a comprehensive methodological strategy, inclusive of unconventional methods of research, is implemented, it will not be feasible to face such a threat to humanity. Copyright © 2016 John Wiley & Sons, Ltd.
Database Resources of the BIG Data Center in 2018

PubMed Central

Xu, Xingjian; Hao, Lili; Zhu, Junwei; Tang, Bixia; Zhou, Qing; Song, Fuhai; Chen, Tingting; Zhang, Sisi; Dong, Lili; Lan, Li; Wang, Yanqing; Sang, Jian; Hao, Lili; Liang, Fang; Cao, Jiabao; Liu, Fang; Liu, Lin; Wang, Fan; Ma, Yingke; Xu, Xingjian; Zhang, Lijuan; Chen, Meili; Tian, Dongmei; Li, Cuiping; Dong, Lili; Du, Zhenglin; Yuan, Na; Zeng, Jingyao; Zhang, Zhewen; Wang, Jinyue; Shi, Shuo; Zhang, Yadong; Pan, Mengyu; Tang, Bixia; Zou, Dong; Song, Shuhui; Sang, Jian; Xia, Lin; Wang, Zhennan; Li, Man; Cao, Jiabao; Niu, Guangyi; Zhang, Yang; Sheng, Xin; Lu, Mingming; Wang, Qi; Xiao, Jingfa; Zou, Dong; Wang, Fan; Hao, Lili; Liang, Fang; Li, Mengwei; Sun, Shixiang; Zou, Dong; Li, Rujiao; Yu, Chunlei; Wang, Guangyu; Sang, Jian; Liu, Lin; Li, Mengwei; Li, Man; Niu, Guangyi; Cao, Jiabao; Sun, Shixiang; Xia, Lin; Yin, Hongyan; Zou, Dong; Xu, Xingjian; Ma, Lina; Chen, Huanxin; Sun, Yubin; Yu, Lei; Zhai, Shuang; Sun, Mingyuan; Zhang, Zhang; Zhao, Wenming; Xiao, Jingfa; Bao, Yiming; Song, Shuhui; Hao, Lili; Li, Rujiao; Ma, Lina; Sang, Jian; Wang, Yanqing; Tang, Bixia; Zou, Dong; Wang, Fan

2018-01-01

Abstract The BIG Data Center at Beijing Institute of Genomics (BIG) of the Chinese Academy of Sciences provides freely open access to a suite of database resources in support of worldwide research activities in both academia and industry. With the vast amounts of omics data generated at ever-greater scales and rates, the BIG Data Center is continually expanding, updating and enriching its core database resources through big-data integration and value-added curation, including BioCode (a repository archiving bioinformatics tool codes), BioProject (a biological project library), BioSample (a biological sample library), Genome Sequence Archive (GSA, a data repository for archiving raw sequence reads), Genome Warehouse (GWH, a centralized resource housing genome-scale data), Genome Variation Map (GVM, a public repository of genome variations), Gene Expression Nebulas (GEN, a database of gene expression profiles based on RNA-Seq data), Methylation Bank (MethBank, an integrated databank of DNA methylomes), and Science Wikis (a series of biological knowledge wikis for community annotations). In addition, three featured web services are provided, viz., BIG Search (search as a service; a scalable inter-domain text search engine), BIG SSO (single sign-on as a service; a user access control system to gain access to multiple independent systems with a single ID and password) and Gsub (submission as a service; a unified submission service for all relevant resources). All of these resources are publicly accessible through the home page of the BIG Data Center at http://bigd.big.ac.cn. PMID:29036542
Literature searches on Ayurveda: An update.

PubMed

Aggithaya, Madhur G; Narahari, Saravu R

2015-01-01

The journals that publish on Ayurveda are increasingly indexed by popular medical databases in recent years. However, many Eastern journals are not indexed biomedical journal databases such as PubMed. Literature searches for Ayurveda continue to be challenging due to the nonavailability of active, unbiased dedicated databases for Ayurvedic literature. In 2010, authors identified 46 databases that can be used for systematic search of Ayurvedic papers and theses. This update reviewed our previous recommendation and identified current and relevant databases. To update on Ayurveda literature search and strategy to retrieve maximum publications. Author used psoriasis as an example to search previously listed databases and identify new. The population, intervention, control, and outcome table included keywords related to psoriasis and Ayurvedic terminologies for skin diseases. Current citation update status, search results, and search options of previous databases were assessed. Eight search strategies were developed. Hundred and five journals, both biomedical and Ayurveda, which publish on Ayurveda, were identified. Variability in databases was explored to identify bias in journal citation. Five among 46 databases are now relevant - AYUSH research portal, Annotated Bibliography of Indian Medicine, Digital Helpline for Ayurveda Research Articles (DHARA), PubMed, and Directory of Open Access Journals. Search options in these databases are not uniform, and only PubMed allows complex search strategy. "The Researches in Ayurveda" and "Ayurvedic Research Database" (ARD) are important grey resources for hand searching. About 44/105 (41.5%) journals publishing Ayurvedic studies are not indexed in any database. Only 11/105 (10.4%) exclusive Ayurveda journals are indexed in PubMed. AYUSH research portal and DHARA are two major portals after 2010. It is mandatory to search PubMed and four other databases because all five carry citations from different groups of journals. The hand searching is important to identify Ayurveda publications that are not indexed elsewhere. Availability information of citations in Ayurveda libraries from National Union Catalogue of Scientific Serials in India if regularly updated will improve the efficacy of hand searching. A grey database (ARD) contains unpublished PG/Ph.D. theses. The AYUSH portal, DHARA (funded by Ministry of AYUSH), and ARD should be merged to form single larger database to limit Ayurveda literature searches.
Geoscience information integration and visualization research of Shandong Province, China based on ArcGIS engine

NASA Astrophysics Data System (ADS)

Xu, Mingzhu; Gao, Zhiqiang; Ning, Jicai

2014-10-01

To improve the access efficiency of geoscience data, efficient data model and storage solutions should be used. Geoscience data is usually classified by format or coordinate system in existing storage solutions. When data is large, it is not conducive to search the geographic features. In this study, a geographical information integration system of Shandong province, China was developed based on the technology of ArcGIS Engine, .NET, and SQL Server. It uses Geodatabase spatial data model and ArcSDE to organize and store spatial and attribute data and establishes geoscience database of Shangdong. Seven function modules were designed: map browse, database and subject management, layer control, map query, spatial analysis and map symbolization. The system's characteristics of can be browsed and managed by geoscience subjects make the system convenient for geographic researchers and decision-making departments to use the data.
Enabling Searches on Wavelengths in a Hyperspectral Indices Database

NASA Astrophysics Data System (ADS)

Piñuela, F.; Cerra, D.; Müller, R.

2017-10-01

Spectral indices derived from hyperspectral reflectance measurements are powerful tools to estimate physical parameters in a non-destructive and precise way for several fields of applications, among others vegetation health analysis, coastal and deep water constituents, geology, and atmosphere composition. In the last years, several micro-hyperspectral sensors have appeared, with both full-frame and push-broom acquisition technologies, while in the near future several hyperspectral spaceborne missions are planned to be launched. This is fostering the use of hyperspectral data in basic and applied research causing a large number of spectral indices to be defined and used in various applications. Ad hoc search engines are therefore needed to retrieve the most appropriate indices for a given application. In traditional systems, query input parameters are limited to alphanumeric strings, while characteristics such as spectral range/ bandwidth are not used in any existing search engine. Such information would be relevant, as it enables an inverse type of search: given the spectral capabilities of a given sensor or a specific spectral band, find all indices which can be derived from it. This paper describes a tool which enables a search as described above, by using the central wavelength or spectral range used by a given index as a search parameter. This offers the ability to manage numeric wavelength ranges in order to select indices which work at best in a given set of wavelengths or wavelength ranges.
Muscle Logic: New Knowledge Resource for Anatomy Enables Comprehensive Searches of the Literature on the Feeding Muscles of Mammals

PubMed Central

Druzinsky, Robert E.; Balhoff, James P.; Crompton, Alfred W.; Done, James; German, Rebecca Z.; Haendel, Melissa A.; Herrel, Anthony; Herring, Susan W.; Lapp, Hilmar; Mabee, Paula M.; Muller, Hans-Michael; Mungall, Christopher J.; Sternberg, Paul W.; Van Auken, Kimberly; Vinyard, Christopher J.; Williams, Susan H.; Wall, Christine E.

2016-01-01

Background In recent years large bibliographic databases have made much of the published literature of biology available for searches. However, the capabilities of the search engines integrated into these databases for text-based bibliographic searches are limited. To enable searches that deliver the results expected by comparative anatomists, an underlying logical structure known as an ontology is required. Development and Testing of the Ontology Here we present the Mammalian Feeding Muscle Ontology (MFMO), a multi-species ontology focused on anatomical structures that participate in feeding and other oral/pharyngeal behaviors. A unique feature of the MFMO is that a simple, computable, definition of each muscle, which includes its attachments and innervation, is true across mammals. This construction mirrors the logical foundation of comparative anatomy and permits searches using language familiar to biologists. Further, it provides a template for muscles that will be useful in extending any anatomy ontology. The MFMO is developed to support the Feeding Experiments End-User Database Project (FEED, https://feedexp.org/), a publicly-available, online repository for physiological data collected from in vivo studies of feeding (e.g., mastication, biting, swallowing) in mammals. Currently the MFMO is integrated into FEED and also into two literature-specific implementations of Textpresso, a text-mining system that facilitates powerful searches of a corpus of scientific publications. We evaluate the MFMO by asking questions that test the ability of the ontology to return appropriate answers (competency questions). We compare the results of queries of the MFMO to results from similar searches in PubMed and Google Scholar. Results and Significance Our tests demonstrate that the MFMO is competent to answer queries formed in the common language of comparative anatomy, but PubMed and Google Scholar are not. Overall, our results show that by incorporating anatomical ontologies into searches, an expanded and anatomically comprehensive set of results can be obtained. The broader scientific and publishing communities should consider taking up the challenge of semantically enabled search capabilities. PMID:26870952
The Impact of Online Bibliographic Databases on Teaching and Research in Political Science.

ERIC Educational Resources Information Center

Reichel, Mary

The availability of online bibliographic databases greatly facilitates literature searching in political science. The advantages to searching databases online include combination of concepts, comprehensiveness, multiple database searching, free-text searching, currency, current awareness services, document delivery service, and convenience.…
Marine Planning and Service Platform: specific ontology based semantic search engine serving data management and sustainable development

NASA Astrophysics Data System (ADS)

Manzella, Giuseppe M. R.; Bartolini, Andrea; Bustaffa, Franco; D'Angelo, Paolo; De Mattei, Maurizio; Frontini, Francesca; Maltese, Maurizio; Medone, Daniele; Monachini, Monica; Novellino, Antonio; Spada, Andrea

2016-04-01

The MAPS (Marine Planning and Service Platform) project is aiming at building a computer platform supporting a Marine Information and Knowledge System. One of the main objective of the project is to develop a repository that should gather, classify and structure marine scientific literature and data thus guaranteeing their accessibility to researchers and institutions by means of standard protocols. In oceanography the cost related to data collection is very high and the new paradigm is based on the concept to collect once and re-use many times (for re-analysis, marine environment assessment, studies on trends, etc). This concept requires the access to quality controlled data and to information that is provided in reports (grey literature) and/or in relevant scientific literature. Hence, creation of new technology is needed by integrating several disciplines such as data management, information systems, knowledge management. In one of the most important EC projects on data management, namely SeaDataNet (www.seadatanet.org), an initial example of knowledge management is provided through the Common Data Index, that is providing links to data and (eventually) to papers. There are efforts to develop search engines to find author's contributions to scientific literature or publications. This implies the use of persistent identifiers (such as DOI), as is done in ORCID. However very few efforts are dedicated to link publications to the data cited or used or that can be of importance for the published studies. This is the objective of MAPS. Full-text technologies are often unsuccessful since they assume the presence of specific keywords in the text; in order to fix this problem, the MAPS project suggests to use different semantic technologies for retrieving the text and data and thus getting much more complying results. The main parts of our design of the search engine are: • Syntactic parser - This module is responsible for the extraction of "rich words" from the text: the whole document gets parsed to extract the words which are more meaningful for the main argument of the document, and applies the extraction in the form of N-grams (mono-grams, bi-grams, tri-grams). • MAPS database - This module is a simple database which contains all the N-grams used by MAPS (physical parameters from SeaDataNet vocabularies) to define our marine "ontology". • Relation identifier - This module performs the most important task of identifying relationships between the N-gram extracted from the text by the parser and the provided oceanographic terminology. It checks N-grams supplied by the Syntactic parser and then matches them with the terms stored in the MAPS database. Found matches are returned back to the parser with flexed form appearing in the source text. • A "relaxed" extractor - This option can be activated when the search engine is launched. It was introduced to give the user a chance to create new N-grams combining existing mono-grams and bi-grams in the database with rich-words found within the source text. The innovation of a semantic engine lies in the fact that the process is not just about the retrieval of already known documents by means of a simple term query but rather the retrieval of a population of documents whose existence was unknown. The system answers by showing a screenshot of results ordered according to the following criteria: • Relevance - of the document with respect to the concept that is searched • Date - of publication of the paper • Source - data provider as defined in the SeaDataNet Common Data Index • Matrix - environmental matrices as defined in the oceanographic field • Geographic area - area specified in the text • Clustering - the process of organizing objects into groups whose members are similar The clustering returns as the output the related documents. For each document the MAPS visualization provides: • Title, author, source/provider of data, web address • Tagging of key terms or concepts • Summary of the document • Visualization of the whole document The possibility of inserting the number of citations for each document among the criteria of the advanced search is currently undergoing; in this case the engine should be able to connect to any of the existing bibliographic citation systems (such as Google Scholar, Scopus, etc.).
Semantics-Based Intelligent Indexing and Retrieval of Digital Images - A Case Study

NASA Astrophysics Data System (ADS)

Osman, Taha; Thakker, Dhavalkumar; Schaefer, Gerald

The proliferation of digital media has led to a huge interest in classifying and indexing media objects for generic search and usage. In particular, we are witnessing colossal growth in digital image repositories that are difficult to navigate using free-text search mechanisms, which often return inaccurate matches as they typically rely on statistical analysis of query keyword recurrence in the image annotation or surrounding text. In this chapter we present a semantically enabled image annotation and retrieval engine that is designed to satisfy the requirements of commercial image collections market in terms of both accuracy and efficiency of the retrieval process. Our search engine relies on methodically structured ontologies for image annotation, thus allowing for more intelligent reasoning about the image content and subsequently obtaining a more accurate set of results and a richer set of alternatives matchmaking the original query. We also show how our well-analysed and designed domain ontology contributes to the implicit expansion of user queries as well as presenting our initial thoughts on exploiting lexical databases for explicit semantic-based query expansion.
A Method for Search Engine Selection using Thesaurus for Selective Meta-Search Engine

NASA Astrophysics Data System (ADS)

Goto, Shoji; Ozono, Tadachika; Shintani, Toramatsu

In this paper, we propose a new method for selecting search engines on WWW for selective meta-search engine. In selective meta-search engine, a method is needed that would enable selecting appropriate search engines for users' queries. Most existing methods use statistical data such as document frequency. These methods may select inappropriate search engines if a query contains polysemous words. In this paper, we describe an search engine selection method based on thesaurus. In our method, a thesaurus is constructed from documents in a search engine and is used as a source description of the search engine. The form of a particular thesaurus depends on the documents used for its construction. Our method enables search engine selection by considering relationship between terms and overcomes the problems caused by polysemous words. Further, our method does not have a centralized broker maintaining data, such as document frequency for all search engines. As a result, it is easy to add a new search engine, and meta-search engines become more scalable with our method compared to other existing methods.
A Review of Evidence-Based Traffic Engineering Measures Designed to Reduce Pedestrian–Motor Vehicle Crashes

PubMed Central

Retting, Richard A.; Ferguson, Susan A.; McCartt, Anne T.

2003-01-01

We provide a brief critical review and assessment of engineering modifications to the built environment that can reduce the risk of pedestrian injuries. In our review, we used the Transportation Research Information Services database to conduct a search for studies on engineering countermeasures documented in the scientific literature. We classified countermeasures into 3 categories—speed control, separation of pedestrians from vehicles, and measures that increase the visibility and conspicuity of pedestrians. We determined the measures and settings with the greatest potential for crash prevention. Our review, which emphasized inclusion of studies with adequate methodological designs, showed that modification of the built environment can substantially reduce the risk of pedestrian–vehicle crashes. PMID:12948963
Proteome of Caulobacter crescentus cell cycle publicly accessible on SWICZ server.

PubMed

Vohradsky, Jiri; Janda, Ivan; Grünenfelder, Björn; Berndt, Peter; Röder, Daniel; Langen, Hanno; Weiser, Jaroslav; Jenal, Urs

2003-10-01

Here we present the Swiss-Czech Proteomics Server (SWICZ), which hosts the proteomic database summarizing information about the cell cycle of the aquatic bacterium Caulobacter crescentus. The database provides a searchable tool for easy access of global protein synthesis and protein stability data as examined during the C. crescentus cell cycle. Protein synthesis data collected from five different cell cycle stages were determined for each protein spot as a relative value of the total amount of [(35)S]methionine incorporation. Protein stability of pulse-labeled extracts were measured during a chase period equivalent to one cell cycle unit. Quantitative information for individual proteins together with descriptive data such as protein identities, apparent molecular masses and isoelectric points, were combined with information on protein function, genomic context, and the cell cycle stage, and were then assembled in a relational database with a world wide web interface (http://proteom.biomed.cas.cz), which allows the database records to be searched and displays the recovered information. A total of 1250 protein spots were reproducibly detected on two-dimensional gel electropherograms, 295 of which were identified by mass spectroscopy. The database is accessible either through clickable two-dimensional gel electrophoretic maps or by means of a set of dedicated search engines. Basic characterization of the experimental procedures, data processing, and a comprehensive description of the web site are presented. In its current state, the SWICZ proteome database provides a platform for the incorporation of new data emerging from extended functional studies on the C. crescentus proteome.
Alignment of high-throughput sequencing data inside in-memory databases.

PubMed

Firnkorn, Daniel; Knaup-Gregori, Petra; Lorenzo Bermejo, Justo; Ganzinger, Matthias

2014-01-01

In times of high-throughput DNA sequencing techniques, performance-capable analysis of DNA sequences is of high importance. Computer supported DNA analysis is still an intensive time-consuming task. In this paper we explore the potential of a new In-Memory database technology by using SAP's High Performance Analytic Appliance (HANA). We focus on read alignment as one of the first steps in DNA sequence analysis. In particular, we examined the widely used Burrows-Wheeler Aligner (BWA) and implemented stored procedures in both, HANA and the free database system MySQL, to compare execution time and memory management. To ensure that the results are comparable, MySQL has been running in memory as well, utilizing its integrated memory engine for database table creation. We implemented stored procedures, containing exact and inexact searching of DNA reads within the reference genome GRCh37. Due to technical restrictions in SAP HANA concerning recursion, the inexact matching problem could not be implemented on this platform. Hence, performance analysis between HANA and MySQL was made by comparing the execution time of the exact search procedures. Here, HANA was approximately 27 times faster than MySQL which means, that there is a high potential within the new In-Memory concepts, leading to further developments of DNA analysis procedures in the future.
Web Feet Guide to Search Engines: Finding It on the Net.

ERIC Educational Resources Information Center

Web Feet, 2001

2001-01-01

This guide to search engines for the World Wide Web discusses selecting the right search engine; interpreting search results; major search engines; online tutorials and guides; search engines for kids; specialized search tools for various subjects; and other specialized engines and gateways. (LRW)
Efficient hemodynamic event detection utilizing relational databases and wavelet analysis

NASA Technical Reports Server (NTRS)

Saeed, M.; Mark, R. G.

2001-01-01

Development of a temporal query framework for time-oriented medical databases has hitherto been a challenging problem. We describe a novel method for the detection of hemodynamic events in multiparameter trends utilizing wavelet coefficients in a MySQL relational database. Storage of the wavelet coefficients allowed for a compact representation of the trends, and provided robust descriptors for the dynamics of the parameter time series. A data model was developed to allow for simplified queries along several dimensions and time scales. Of particular importance, the data model and wavelet framework allowed for queries to be processed with minimal table-join operations. A web-based search engine was developed to allow for user-defined queries. Typical queries required between 0.01 and 0.02 seconds, with at least two orders of magnitude improvement in speed over conventional queries. This powerful and innovative structure will facilitate research on large-scale time-oriented medical databases.
Literature searches on Ayurveda: An update

PubMed Central

Aggithaya, Madhur G.; Narahari, Saravu R.

2015-01-01

Introduction: The journals that publish on Ayurveda are increasingly indexed by popular medical databases in recent years. However, many Eastern journals are not indexed biomedical journal databases such as PubMed. Literature searches for Ayurveda continue to be challenging due to the nonavailability of active, unbiased dedicated databases for Ayurvedic literature. In 2010, authors identified 46 databases that can be used for systematic search of Ayurvedic papers and theses. This update reviewed our previous recommendation and identified current and relevant databases. Aims: To update on Ayurveda literature search and strategy to retrieve maximum publications. Methods: Author used psoriasis as an example to search previously listed databases and identify new. The population, intervention, control, and outcome table included keywords related to psoriasis and Ayurvedic terminologies for skin diseases. Current citation update status, search results, and search options of previous databases were assessed. Eight search strategies were developed. Hundred and five journals, both biomedical and Ayurveda, which publish on Ayurveda, were identified. Variability in databases was explored to identify bias in journal citation. Results: Five among 46 databases are now relevant – AYUSH research portal, Annotated Bibliography of Indian Medicine, Digital Helpline for Ayurveda Research Articles (DHARA), PubMed, and Directory of Open Access Journals. Search options in these databases are not uniform, and only PubMed allows complex search strategy. “The Researches in Ayurveda” and “Ayurvedic Research Database” (ARD) are important grey resources for hand searching. About 44/105 (41.5%) journals publishing Ayurvedic studies are not indexed in any database. Only 11/105 (10.4%) exclusive Ayurveda journals are indexed in PubMed. Conclusion: AYUSH research portal and DHARA are two major portals after 2010. It is mandatory to search PubMed and four other databases because all five carry citations from different groups of journals. The hand searching is important to identify Ayurveda publications that are not indexed elsewhere. Availability information of citations in Ayurveda libraries from National Union Catalogue of Scientific Serials in India if regularly updated will improve the efficacy of hand searching. A grey database (ARD) contains unpublished PG/Ph.D. theses. The AYUSH portal, DHARA (funded by Ministry of AYUSH), and ARD should be merged to form single larger database to limit Ayurveda literature searches. PMID:27313409
Are Bibliographic Management Software Search Interfaces Reliable?: A Comparison between Search Results Obtained Using Database Interfaces and the EndNote Online Search Function

ERIC Educational Resources Information Center

Fitzgibbons, Megan; Meert, Deborah

2010-01-01

The use of bibliographic management software and its internal search interfaces is now pervasive among researchers. This study compares the results between searches conducted in academic databases' search interfaces versus the EndNote search interface. The results show mixed search reliability, depending on the database and type of search…
Histoplasma capsulatum proteome response to decreased iron availability

PubMed Central

Winters, Michael S; Spellman, Daniel S; Chan, Qilin; Gomez, Francisco J; Hernandez, Margarita; Catron, Brittany; Smulian, Alan G; Neubert, Thomas A; Deepe, George S

2008-01-01

Background A fundamental pathogenic feature of the fungus Histoplasma capsulatum is its ability to evade innate and adaptive immune defenses. Once ingested by macrophages the organism is faced with several hostile environmental conditions including iron limitation. H. capsulatum can establish a persistent state within the macrophage. A gap in knowledge exists because the identities and number of proteins regulated by the organism under host conditions has yet to be defined. Lack of such knowledge is an important problem because until these proteins are identified it is unlikely that they can be targeted as new and innovative treatment for histoplasmosis. Results To investigate the proteomic response by H. capsulatum to decreasing iron availability we have created H. capsulatum protein/genomic databases compatible with current mass spectrometric (MS) search engines. Databases were assembled from the H. capsulatum G217B strain genome using gene prediction programs and expressed sequence tag (EST) libraries. Searching these databases with MS data generated from two dimensional (2D) in-gel digestions of proteins resulted in over 50% more proteins identified compared to searching the publicly available fungal databases alone. Using 2D gel electrophoresis combined with statistical analysis we discovered 42 H. capsulatum proteins whose abundance was significantly modulated when iron concentrations were lowered. Altered proteins were identified by mass spectrometry and database searching to be involved in glycolysis, the tricarboxylic acid cycle, lysine metabolism, protein synthesis, and one protein sequence whose function was unknown. Conclusion We have created a bioinformatics platform for H. capsulatum and demonstrated the utility of a proteomic approach by identifying a shift in metabolism the organism utilizes to cope with the hostile conditions provided by the host. We have shown that enzyme transcripts regulated by other fungal pathogens in response to lowering iron availability are also regulated in H. capsulatum at the protein level. We also identified H. capsulatum proteins sensitive to iron level reductions which have yet to be connected to iron availability in other pathogens. These data also indicate the complexity of the response by H. capsulatum to nutritional deprivation. Finally, we demonstrate the importance of a strain specific gene/protein database for H. capsulatum proteomic analysis. PMID:19108728

Search and Graph Database Technologies for Biomedical Semantic Indexing: Experimental Analysis.

PubMed

Segura Bedmar, Isabel; Martínez, Paloma; Carruana Martín, Adrián

2017-12-01

Biomedical semantic indexing is a very useful support tool for human curators in their efforts for indexing and cataloging the biomedical literature. The aim of this study was to describe a system to automatically assign Medical Subject Headings (MeSH) to biomedical articles from MEDLINE. Our approach relies on the assumption that similar documents should be classified by similar MeSH terms. Although previous work has already exploited the document similarity by using a k-nearest neighbors algorithm, we represent documents as document vectors by search engine indexing and then compute the similarity between documents using cosine similarity. Once the most similar documents for a given input document are retrieved, we rank their MeSH terms to choose the most suitable set for the input document. To do this, we define a scoring function that takes into account the frequency of the term into the set of retrieved documents and the similarity between the input document and each retrieved document. In addition, we implement guidelines proposed by human curators to annotate MEDLINE articles; in particular, the heuristic that says if 3 MeSH terms are proposed to classify an article and they share the same ancestor, they should be replaced by this ancestor. The representation of the MeSH thesaurus as a graph database allows us to employ graph search algorithms to quickly and easily capture hierarchical relationships such as the lowest common ancestor between terms. Our experiments show promising results with an F1 of 69% on the test dataset. To the best of our knowledge, this is the first work that combines search and graph database technologies for the task of biomedical semantic indexing. Due to its horizontal scalability, ElasticSearch becomes a real solution to index large collections of documents (such as the bibliographic database MEDLINE). Moreover, the use of graph search algorithms for accessing MeSH information could provide a support tool for cataloging MEDLINE abstracts in real time. ©Isabel Segura Bedmar, Paloma Martínez, Adrián Carruana Martín. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 01.12.2017.
Knowledge, skills and attitudes of hospital pharmacists in the use of information technology and electronic tools to support clinical practice: A Brazilian survey

PubMed Central

Vasconcelos, Hemerson Bruno da Silva; Woods, David John

2017-01-01

This study aimed to identify the knowledge, skills and attitudes of Brazilian hospital pharmacists in the use of information technology and electronic tools to support clinical practice. Methods: A questionnaire was sent by email to clinical pharmacists working public and private hospitals in Brazil. The instrument was validated using the method of Polit and Beck to determine the content validity index. Data (n = 348) were analyzed using descriptive statistics, Pearson's Chi-square test and Gamma correlation tests. Results: Pharmacists had 1–4 electronic devices for personal use, mainly smartphones (84.8%; n = 295) and laptops (81.6%; n = 284). At work, pharmacists had access to a computer (89.4%; n = 311), mostly connected to the internet (83.9%; n = 292). They felt competent (very capable/capable) searching for a web page/web site on a specific subject (100%; n = 348), downloading files (99.7%; n = 347), using spreadsheets (90.2%; n = 314), searching using MeSH terms in PubMed (97.4%; n = 339) and general searching for articles in bibliographic databases (such as Medline/PubMed: 93.4%; n = 325). Pharmacists did not feel competent in using statistical analysis software (somewhat capable/incapable: 78.4%; n = 273). Most pharmacists reported that they had not received formal education to perform most of these actions except searching using MeSH terms. Access to bibliographic databases was available in Brazilian hospitals, however, most pharmacists (78.7%; n = 274) reported daily use of a non-specific search engine such as Google. This result may reflect the lack of formal knowledge and training in the use of bibliographic databases and difficulty with the English language. The need to expand knowledge about information search tools was recognized by most pharmacists in clinical practice in Brazil, especially those with less time dedicated exclusively to clinical activity (Chi-square, p = 0.006). Conclusion: These results will assist in defining minimal competencies for the training of pharmacists in the field of information technology to support clinical practice. Knowledge and skill gaps are evident in the use of bibliographic databases, spreadsheets and statistical tools. PMID:29272292
Knowledge, skills and attitudes of hospital pharmacists in the use of information technology and electronic tools to support clinical practice: A Brazilian survey.

PubMed

Néri, Eugenie Desirèe Rabelo; Meira, Assuero Silva; Vasconcelos, Hemerson Bruno da Silva; Woods, David John; Fonteles, Marta Maria de França

2017-01-01

This study aimed to identify the knowledge, skills and attitudes of Brazilian hospital pharmacists in the use of information technology and electronic tools to support clinical practice. A questionnaire was sent by email to clinical pharmacists working public and private hospitals in Brazil. The instrument was validated using the method of Polit and Beck to determine the content validity index. Data (n = 348) were analyzed using descriptive statistics, Pearson's Chi-square test and Gamma correlation tests. Pharmacists had 1-4 electronic devices for personal use, mainly smartphones (84.8%; n = 295) and laptops (81.6%; n = 284). At work, pharmacists had access to a computer (89.4%; n = 311), mostly connected to the internet (83.9%; n = 292). They felt competent (very capable/capable) searching for a web page/web site on a specific subject (100%; n = 348), downloading files (99.7%; n = 347), using spreadsheets (90.2%; n = 314), searching using MeSH terms in PubMed (97.4%; n = 339) and general searching for articles in bibliographic databases (such as Medline/PubMed: 93.4%; n = 325). Pharmacists did not feel competent in using statistical analysis software (somewhat capable/incapable: 78.4%; n = 273). Most pharmacists reported that they had not received formal education to perform most of these actions except searching using MeSH terms. Access to bibliographic databases was available in Brazilian hospitals, however, most pharmacists (78.7%; n = 274) reported daily use of a non-specific search engine such as Google. This result may reflect the lack of formal knowledge and training in the use of bibliographic databases and difficulty with the English language. The need to expand knowledge about information search tools was recognized by most pharmacists in clinical practice in Brazil, especially those with less time dedicated exclusively to clinical activity (Chi-square, p = 0.006). These results will assist in defining minimal competencies for the training of pharmacists in the field of information technology to support clinical practice. Knowledge and skill gaps are evident in the use of bibliographic databases, spreadsheets and statistical tools.
Digital dream analysis: a revised method.

PubMed

Bulkeley, Kelly

2014-10-01

This article demonstrates the use of a digital word search method designed to provide greater accuracy, objectivity, and speed in the study of dreams. A revised template of 40 word search categories, built into the website of the Sleep and Dream Database (SDDb), is applied to four "classic" sets of dreams: The male and female "Norm" dreams of Hall and Van de Castle (1966), the "Engine Man" dreams discussed by Hobson (1988), and the "Barb Sanders Baseline 250" dreams examined by Domhoff (2003). A word search analysis of these original dream reports shows that a digital approach can accurately identify many of the same distinctive patterns of content found by previous investigators using much more laborious and time-consuming methods. The results of this study emphasize the compatibility of word search technologies with traditional approaches to dream content analysis. Copyright © 2014 Elsevier Inc. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Wojick, D E; Warnick, W L; Carroll, B C

With the United States federal government spending billions annually for research and development, ways to increase the productivity of that research can have a significant return on investment. The process by which science knowledge is spread is called diffusion. It is therefore important to better understand and measure the benefits of this diffusion of knowledge. In particular, it is important to understand whether advances in Internet searching can speed up the diffusion of scientific knowledge and accelerate scientific progress despite the fact that the vast majority of scientific information resources continue to be held in deep web databases that manymore » search engines cannot fully access. To address the complexity of the search issue, the term global discovery is used for the act of searching across heterogeneous environments and distant communities. This article discusses these issues and describes research being conducted by the Office of Scientific and Technical Information (OSTI).« less
Database Resources of the BIG Data Center in 2018.

PubMed

2018-01-04

The BIG Data Center at Beijing Institute of Genomics (BIG) of the Chinese Academy of Sciences provides freely open access to a suite of database resources in support of worldwide research activities in both academia and industry. With the vast amounts of omics data generated at ever-greater scales and rates, the BIG Data Center is continually expanding, updating and enriching its core database resources through big-data integration and value-added curation, including BioCode (a repository archiving bioinformatics tool codes), BioProject (a biological project library), BioSample (a biological sample library), Genome Sequence Archive (GSA, a data repository for archiving raw sequence reads), Genome Warehouse (GWH, a centralized resource housing genome-scale data), Genome Variation Map (GVM, a public repository of genome variations), Gene Expression Nebulas (GEN, a database of gene expression profiles based on RNA-Seq data), Methylation Bank (MethBank, an integrated databank of DNA methylomes), and Science Wikis (a series of biological knowledge wikis for community annotations). In addition, three featured web services are provided, viz., BIG Search (search as a service; a scalable inter-domain text search engine), BIG SSO (single sign-on as a service; a user access control system to gain access to multiple independent systems with a single ID and password) and Gsub (submission as a service; a unified submission service for all relevant resources). All of these resources are publicly accessible through the home page of the BIG Data Center at http://bigd.big.ac.cn. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Toward Computational Cumulative Biology by Combining Models of Biological Datasets

PubMed Central

Faisal, Ali; Peltonen, Jaakko; Georgii, Elisabeth; Rung, Johan; Kaski, Samuel

2014-01-01

A main challenge of data-driven sciences is how to make maximal use of the progressively expanding databases of experimental datasets in order to keep research cumulative. We introduce the idea of a modeling-based dataset retrieval engine designed for relating a researcher's experimental dataset to earlier work in the field. The search is (i) data-driven to enable new findings, going beyond the state of the art of keyword searches in annotations, (ii) modeling-driven, to include both biological knowledge and insights learned from data, and (iii) scalable, as it is accomplished without building one unified grand model of all data. Assuming each dataset has been modeled beforehand, by the researchers or automatically by database managers, we apply a rapidly computable and optimizable combination model to decompose a new dataset into contributions from earlier relevant models. By using the data-driven decomposition, we identify a network of interrelated datasets from a large annotated human gene expression atlas. While tissue type and disease were major driving forces for determining relevant datasets, the found relationships were richer, and the model-based search was more accurate than the keyword search; moreover, it recovered biologically meaningful relationships that are not straightforwardly visible from annotations—for instance, between cells in different developmental stages such as thymocytes and T-cells. Data-driven links and citations matched to a large extent; the data-driven links even uncovered corrections to the publication data, as two of the most linked datasets were not highly cited and turned out to have wrong publication entries in the database. PMID:25427176
Toward computational cumulative biology by combining models of biological datasets.

PubMed

Faisal, Ali; Peltonen, Jaakko; Georgii, Elisabeth; Rung, Johan; Kaski, Samuel

2014-01-01

A main challenge of data-driven sciences is how to make maximal use of the progressively expanding databases of experimental datasets in order to keep research cumulative. We introduce the idea of a modeling-based dataset retrieval engine designed for relating a researcher's experimental dataset to earlier work in the field. The search is (i) data-driven to enable new findings, going beyond the state of the art of keyword searches in annotations, (ii) modeling-driven, to include both biological knowledge and insights learned from data, and (iii) scalable, as it is accomplished without building one unified grand model of all data. Assuming each dataset has been modeled beforehand, by the researchers or automatically by database managers, we apply a rapidly computable and optimizable combination model to decompose a new dataset into contributions from earlier relevant models. By using the data-driven decomposition, we identify a network of interrelated datasets from a large annotated human gene expression atlas. While tissue type and disease were major driving forces for determining relevant datasets, the found relationships were richer, and the model-based search was more accurate than the keyword search; moreover, it recovered biologically meaningful relationships that are not straightforwardly visible from annotations-for instance, between cells in different developmental stages such as thymocytes and T-cells. Data-driven links and citations matched to a large extent; the data-driven links even uncovered corrections to the publication data, as two of the most linked datasets were not highly cited and turned out to have wrong publication entries in the database.
A survey of the current status of web-based databases indexing Iranian journals.

PubMed

Merat, Shahin; Khatibzadeh, Shahab; Mesgarpour, Bita; Malekzadeh, Reza

2009-05-01

The scientific output of Iran is increasing rapidly during the recent years. Unfortunately, most papers are published in journals which are not indexed by popular indexing systems and many of them are in Persian without English translation. This makes the results of Iranian scientific research unavailable to other researchers, including Iranians. The aim of this study was to evaluate the quality of current web-based databases indexing scientific articles published in Iran. We identified web-based databases which indexed scientific journals published in Iran using popular search engines. The sites were then subjected to a series of tests to evaluate their coverage, search capabilities, stability, accuracy of information, consistency, accessibility, ease of use, and other features. Results were compared with each other to identify strengths and shortcomings of each site. Five web sites were indentified. None had a complete coverage on scientific Iranian journals. The search capabilities were less than optimal in most sites. English translations of research titles, author names, keywords, and abstracts of Persian-language articles did not follow standards. Some sites did not cover abstracts. Numerous typing errors make searches ineffective and citation indexing unreliable. None of the currently available indexing sites are capable of presenting Iranian research to the international scientific community. The government should intervene by enforcing policies designed to facilitate indexing through a systematic approach. The policies should address Iranian journals, authors, and indexing sites. Iranian journals should be required to provide their indexing data, including references, electronically; authors should provide correct indexing information to journals; and indexing sites should improve their software to meet standards set by the government.
Analysis of human serum phosphopeptidome by a focused database searching strategy.

PubMed

Zhu, Jun; Wang, Fangjun; Cheng, Kai; Song, Chunxia; Qin, Hongqiang; Hu, Lianghai; Figeys, Daniel; Ye, Mingliang; Zou, Hanfa

2013-01-14

As human serum is an important source for early diagnosis of many serious diseases, analysis of serum proteome and peptidome has been extensively performed. However, the serum phosphopeptidome was less explored probably because the effective method for database searching is lacking. Conventional database searching strategy always uses the whole proteome database, which is very time-consuming for phosphopeptidome search due to the huge searching space resulted from the high redundancy of the database and the setting of dynamic modifications during searching. In this work, a focused database searching strategy using an in-house collected human serum pro-peptidome target/decoy database (HuSPep) was established. It was found that the searching time was significantly decreased without compromising the identification sensitivity. By combining size-selective Ti (IV)-MCM-41 enrichment, RP-RP off-line separation, and complementary CID and ETD fragmentation with the new searching strategy, 143 unique endogenous phosphopeptides and 133 phosphorylation sites (109 novel sites) were identified from human serum with high reliability. Copyright © 2012 Elsevier B.V. All rights reserved.
Agency, Social and Healthcare Supports for Adults with Intellectual Disability at the End of Life in Out-of-Home, Non-Institutional Community Residences in Western Nations: A Literature Review

ERIC Educational Resources Information Center

Moro, Teresa T.; Savage, Teresa A.; Gehlert, Sarah

2017-01-01

Background: The nature and quality of end-of-life care received by adults with intellectual disabilities in out-of-home, non-institutional community agency residences in Western nations is not well understood. Method: A range of databases and search engines were used to locate conceptual, clinical and research articles from relevant peer-reviewed…
Evaluating the effect of database inflation in proteogenomic search on sensitive and reliable peptide identification.

PubMed

Li, Honglan; Joh, Yoon Sung; Kim, Hyunwoo; Paek, Eunok; Lee, Sang-Won; Hwang, Kyu-Baek

2016-12-22

Proteogenomics is a promising approach for various tasks ranging from gene annotation to cancer research. Databases for proteogenomic searches are often constructed by adding peptide sequences inferred from genomic or transcriptomic evidence to reference protein sequences. Such inflation of databases has potential of identifying novel peptides. However, it also raises concerns on sensitive and reliable peptide identification. Spurious peptides included in target databases may result in underestimated false discovery rate (FDR). On the other hand, inflation of decoy databases could decrease the sensitivity of peptide identification due to the increased number of high-scoring random hits. Although several studies have addressed these issues, widely applicable guidelines for sensitive and reliable proteogenomic search have hardly been available. To systematically evaluate the effect of database inflation in proteogenomic searches, we constructed a variety of real and simulated proteogenomic databases for yeast and human tandem mass spectrometry (MS/MS) data, respectively. Against these databases, we tested two popular database search tools with various approaches to search result validation: the target-decoy search strategy (with and without a refined scoring-metric) and a mixture model-based method. The effect of separate filtering of known and novel peptides was also examined. The results from real and simulated proteogenomic searches confirmed that separate filtering increases the sensitivity and reliability in proteogenomic search. However, no one method consistently identified the largest (or the smallest) number of novel peptides from real proteogenomic searches. We propose to use a set of search result validation methods with separate filtering, for sensitive and reliable identification of peptides in proteogenomic search.
HerDing: herb recommendation system to treat diseases using genes and chemicals

PubMed Central

Choi, Wonjun; Choi, Chan-Hun; Kim, Young Ran; Kim, Seon-Jong; Na, Chang-Su; Lee, Hyunju

2016-01-01

In recent years, herbs have been researched for new drug candidates because they have a long empirical history of treating diseases and are relatively free from side effects. Studies to scientifically prove the medical efficacy of herbs for target diseases often spend a considerable amount of time and effort in choosing candidate herbs and in performing experiments to measure changes of marker genes when treating herbs. A computational approach to recommend herbs for treating diseases might be helpful to promote efficiency in the early stage of such studies. Although several databases related to traditional Chinese medicine have been already developed, there is no specialized Web tool yet recommending herbs to treat diseases based on disease-related genes. Therefore, we developed a novel search engine, HerDing, focused on retrieving candidate herb-related information with user search terms (a list of genes, a disease name, a chemical name or an herb name). HerDing was built by integrating public databases and by applying a text-mining method. The HerDing website is free and open to all users, and there is no login requirement. Database URL: http://combio.gist.ac.kr/herding PMID:26980517
HerDing: herb recommendation system to treat diseases using genes and chemicals.

PubMed

Choi, Wonjun; Choi, Chan-Hun; Kim, Young Ran; Kim, Seon-Jong; Na, Chang-Su; Lee, Hyunju

2016-01-01

In recent years, herbs have been researched for new drug candidates because they have a long empirical history of treating diseases and are relatively free from side effects. Studies to scientifically prove the medical efficacy of herbs for target diseases often spend a considerable amount of time and effort in choosing candidate herbs and in performing experiments to measure changes of marker genes when treating herbs. A computational approach to recommend herbs for treating diseases might be helpful to promote efficiency in the early stage of such studies. Although several databases related to traditional Chinese medicine have been already developed, there is no specialized Web tool yet recommending herbs to treat diseases based on disease-related genes. Therefore, we developed a novel search engine, HerDing, focused on retrieving candidate herb-related information with user search terms (a list of genes, a disease name, a chemical name or an herb name). HerDing was built by integrating public databases and by applying a text-mining method. The HerDing website is free and open to all users, and there is no login requirement. Database URL: http://combio.gist.ac.kr/herding. © The Author(s) 2016. Published by Oxford University Press.
Specialized medical search-engines are no better than general search-engines in sourcing consumer information about androgen deficiency.

PubMed

Ilic, D; Bessell, T L; Silagy, C A; Green, S

2003-03-01

The Internet provides consumers with access to online health information; however, identifying relevant and valid information can be problematic. Our objectives were firstly to investigate the efficiency of search-engines, and then to assess the quality of online information pertaining to androgen deficiency in the ageing male (ADAM). Keyword searches were performed on nine search-engines (four general and five medical) to identify website information regarding ADAM. Search-engine efficiency was compared by percentage of relevant websites obtained via each search-engine. The quality of information published on each website was assessed using the DISCERN rating tool. Of 4927 websites searched, 47 (1.44%) and 10 (0.60%) relevant websites were identified by general and medical search-engines respectively. The overall quality of online information on ADAM was poor. The quality of websites retrieved using medical search-engines did not differ significantly from those retrieved by general search-engines. Despite the poor quality of online information relating to ADAM, it is evident that medical search-engines are no better than general search-engines in sourcing consumer information relevant to ADAM.
A Large-Sample Test of a Semi-Automated Clavicle Search Engine to Assist Skeletal Identification by Radiograph Comparison.

PubMed

D'Alonzo, Susan S; Guyomarc'h, Pierre; Byrd, John E; Stephan, Carl N

2017-01-01

In 2014, a morphometric capability to search chest radiograph databases by quantified clavicle shape was published to assist skeletal identification. Here, we extend the validation tests conducted by increasing the search universe 18-fold, from 409 to 7361 individuals to determine whether there is any associated decrease in performance under these more challenging circumstances. The number of trials and analysts were also increased, respectively, from 17 to 30 skeletons, and two to four examiners. Elliptical Fourier analysis was conducted on clavicles from each skeleton by each analyst (shadowgrams trimmed from scratch in every instance) and compared to the search universe. Correctly matching individuals were found in shortlists of 10% of the sample 70% of the time. This rate is similar to, although slightly lower than, rates previously found for much smaller samples (80%). Accuracy and reliability are thereby maintained, even when the comparison system is challenged by much larger search universes. © 2016 American Academy of Forensic Sciences.
TokSearch: A search engine for fusion experimental data

DOE PAGES

Sammuli, Brian S.; Barr, Jayson L.; Eidietis, Nicholas W.; ...

2018-04-01

At a typical fusion research site, experimental data is stored using archive technologies that deal with each discharge as an independent set of data. These technologies (e.g. MDSplus or HDF5) are typically supplemented with a database that aggregates metadata for multiple shots to allow for efficient querying of certain predefined quantities. Often, however, a researcher will need to extract information from the archives, possibly for many shots, that is not available in the metadata store or otherwise indexed for quick retrieval. To address this need, a new search tool called TokSearch has been added to the General Atomics TokSys controlmore » design and analysis suite [1]. This tool provides the ability to rapidly perform arbitrary, parallelized queries of archived tokamak shot data (both raw and analyzed) over large numbers of shots. The TokSearch query API borrows concepts from SQL, and users can choose to implement queries in either MatlabTM or Python.« less
TokSearch: A search engine for fusion experimental data

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sammuli, Brian S.; Barr, Jayson L.; Eidietis, Nicholas W.

At a typical fusion research site, experimental data is stored using archive technologies that deal with each discharge as an independent set of data. These technologies (e.g. MDSplus or HDF5) are typically supplemented with a database that aggregates metadata for multiple shots to allow for efficient querying of certain predefined quantities. Often, however, a researcher will need to extract information from the archives, possibly for many shots, that is not available in the metadata store or otherwise indexed for quick retrieval. To address this need, a new search tool called TokSearch has been added to the General Atomics TokSys controlmore » design and analysis suite [1]. This tool provides the ability to rapidly perform arbitrary, parallelized queries of archived tokamak shot data (both raw and analyzed) over large numbers of shots. The TokSearch query API borrows concepts from SQL, and users can choose to implement queries in either MatlabTM or Python.« less
DECADE Web Portal: Integrating MaGa, EarthChem and GVP Will Further Our Knowledge on Earth Degassing

NASA Astrophysics Data System (ADS)

Cardellini, C.; Frigeri, A.; Lehnert, K. A.; Ash, J.; McCormick, B.; Chiodini, G.; Fischer, T. P.; Cottrell, E.

2014-12-01

The release of gases from the Earth's interior to the exosphere takes place in both volcanic and non-volcanic areas of the planet. Fully understanding this complex process requires the integration of geochemical, petrological and volcanological data. At present, major online data repositories relevant to studies of degassing are not linked and interoperable. We are developing interoperability between three of those, which will support more powerful synoptic studies of degassing. The three data systems that will make their data accessible via the DECADE portal are: (1) the Smithsonian Institution's Global Volcanism Program database (GVP) of volcanic activity data, (2) EarthChem databases for geochemical and geochronological data of rocks and melt inclusions, and (3) the MaGa database (Mapping Gas emissions) which contains compositional and flux data of gases released at volcanic and non-volcanic degassing sites. These databases are developed and maintained by institutions or groups of experts in a specific field, and data are archived in formats specific to these databases. In the framework of the Deep Earth Carbon Degassing (DECADE) initiative of the Deep Carbon Observatory (DCO), we are developing a web portal that will create a powerful search engine of these databases from a single entry point. The portal will return comprehensive multi-component datasets, based on the search criteria selected by the user. For example, a single geographic or temporal search will return data relating to compositions of emitted gases and erupted products, the age of the erupted products, and coincident activity at the volcano. The development of this level of capability for the DECADE Portal requires complete synergy between these databases, including availability of standard-based web services (WMS, WFS) at all data systems. Data and metadata can thus be extracted from each system without interfering with each database's local schema or being replicated to achieve integration at the DECADE web portal. The DECADE portal will enable new synoptic perspectives on the Earth degassing process. Other data systems can be easily plugged in using the existing framework. Our vision is to explore Earth degassing related datasets over previously unexplored spatial or temporal ranges.
Devices for Self-Monitoring Sedentary Time or Physical Activity: A Scoping Review.

PubMed

Sanders, James P; Loveday, Adam; Pearson, Natalie; Edwardson, Charlotte; Yates, Thomas; Biddle, Stuart J H; Esliger, Dale W

2016-05-04

It is well documented that meeting the guideline levels (150 minutes per week) of moderate-to-vigorous physical activity (PA) is protective against chronic disease. Conversely, emerging evidence indicates the deleterious effects of prolonged sitting. Therefore, there is a need to change both behaviors. Self-monitoring of behavior is one of the most robust behavior-change techniques available. The growing number of technologies in the consumer electronics sector provides a unique opportunity for individuals to self-monitor their behavior. The aim of this study is to review the characteristics and measurement properties of currently available self-monitoring devices for sedentary time and/or PA. To identify technologies, four scientific databases were systematically searched using key terms related to behavior, measurement, and population. Articles published through October 2015 were identified. To identify technologies from the consumer electronic sector, systematic searches of three Internet search engines were also performed through to October 1, 2015. The initial database searches identified 46 devices and the Internet search engines identified 100 devices yielding a total of 146 technologies. Of these, 64 were further removed because they were currently unavailable for purchase or there was no evidence that they were designed for, had been used in, or could readily be modified for self-monitoring purposes. The remaining 82 technologies were included in this review (73 devices self-monitored PA, 9 devices self-monitored sedentary time). Of the 82 devices included, this review identified no published articles in which these devices were used for the purpose of self-monitoring PA and/or sedentary behavior; however, a number of technologies were found via Internet searches that matched the criteria for self-monitoring and provided immediate feedback on PA (ActiGraph Link, Microsoft Band, and Garmin Vivofit) and sedentary time (activPAL VT, the Lumo Back, and Darma). There are a large number of devices that self-monitor PA; however, there is a greater need for the development of tools to self-monitor sedentary time. The novelty of these devices means they have yet to be used in behavior change interventions, although the growing field of wearable technology may facilitate this to change.

Devices for Self-Monitoring Sedentary Time or Physical Activity: A Scoping Review

PubMed Central

Loveday, Adam; Pearson, Natalie; Edwardson, Charlotte; Yates, Thomas; Biddle, Stuart JH; Esliger, Dale W

2016-01-01

Background It is well documented that meeting the guideline levels (150 minutes per week) of moderate-to-vigorous physical activity (PA) is protective against chronic disease. Conversely, emerging evidence indicates the deleterious effects of prolonged sitting. Therefore, there is a need to change both behaviors. Self-monitoring of behavior is one of the most robust behavior-change techniques available. The growing number of technologies in the consumer electronics sector provides a unique opportunity for individuals to self-monitor their behavior. Objective The aim of this study is to review the characteristics and measurement properties of currently available self-monitoring devices for sedentary time and/or PA. Methods To identify technologies, four scientific databases were systematically searched using key terms related to behavior, measurement, and population. Articles published through October 2015 were identified. To identify technologies from the consumer electronic sector, systematic searches of three Internet search engines were also performed through to October 1, 2015. Results The initial database searches identified 46 devices and the Internet search engines identified 100 devices yielding a total of 146 technologies. Of these, 64 were further removed because they were currently unavailable for purchase or there was no evidence that they were designed for, had been used in, or could readily be modified for self-monitoring purposes. The remaining 82 technologies were included in this review (73 devices self-monitored PA, 9 devices self-monitored sedentary time). Of the 82 devices included, this review identified no published articles in which these devices were used for the purpose of self-monitoring PA and/or sedentary behavior; however, a number of technologies were found via Internet searches that matched the criteria for self-monitoring and provided immediate feedback on PA (ActiGraph Link, Microsoft Band, and Garmin Vivofit) and sedentary time (activPAL VT, the Lumo Back, and Darma). Conclusions There are a large number of devices that self-monitor PA; however, there is a greater need for the development of tools to self-monitor sedentary time. The novelty of these devices means they have yet to be used in behavior change interventions, although the growing field of wearable technology may facilitate this to change. PMID:27145905
Mitigation of Fluorosis - A Review

PubMed Central

Dodamani, Arun S.; Jadhav, Harish C.; Naik, Rahul G.; Deshmukh, Manjiri A.

2015-01-01

Fluoride is required for normal development and growth of the body. It is found in plentiful quantity in environment and fluoride content in drinking water is largest contributor to the daily fluoride intake. The behaviour of fluoride ions in the human organism can be regarded as that of “double-edged sword”. Fluoride is beneficial in small amounts but toxic in large amounts. Excessive consumption of fluorides in various forms leads to development of fluorosis. Fluorosis is major health problem in 24 countries, including India, which lies in the geographical fluoride belt. Various technologies are being used to remove fluoride from water but still the problem has not been rooted out. The purpose of this paper is to review the available treatment modalities for fluorosis, available technologies for fluoride removal from water and ongoing fluorosis mitigation programs based on literature survey. Medline was the primary database used in the literature search. Other databases included: PubMed, Web of Science, Google Scholar, WHO, Ebscohost, Science Direct, Google Search Engine, etc. PMID:26266235
Making the MagIC (Magnetics Information Consortium) Web Application Accessible to New Users and Useful to Experts

NASA Astrophysics Data System (ADS)

Minnett, R.; Koppers, A.; Jarboe, N.; Tauxe, L.; Constable, C.; Jonestrask, L.

2017-12-01

Challenges are faced by both new and experienced users interested in contributing their data to community repositories, in data discovery, or engaged in potentially transformative science. The Magnetics Information Consortium (https://earthref.org/MagIC) has recently simplified its data model and developed a new containerized web application to reduce the friction in contributing, exploring, and combining valuable and complex datasets for the paleo-, geo-, and rock magnetic scientific community. The new data model more closely reflects the hierarchical workflow in paleomagnetic experiments to enable adequate annotation of scientific results and ensure reproducibility. The new open-source (https://github.com/earthref/MagIC) application includes an upload tool that is integrated with the data model to provide early data validation feedback and ease the friction of contributing and updating datasets. The search interface provides a powerful full text search of contributions indexed by ElasticSearch and a wide array of filters, including specific geographic and geological timescale filtering, to support both novice users exploring the database and experts interested in compiling new datasets with specific criteria across thousands of studies and millions of measurements. The datasets are not large, but they are complex, with many results from evolving experimental and analytical approaches. These data are also extremely valuable due to the cost in collecting or creating physical samples and the, often, destructive nature of the experiments. MagIC is heavily invested in encouraging young scientists as well as established labs to cultivate workflows that facilitate contributing their data in a consistent format. This eLightning presentation includes a live demonstration of the MagIC web application, developed as a configurable container hosting an isomorphic Meteor JavaScript application, MongoDB database, and ElasticSearch search engine. Visitors can explore the MagIC Database through maps and image or plot galleries or search and filter the raw measurements and their derived hierarchy of analytical interpretations.
A search engine to access PubMed monolingual subsets: proof of concept and evaluation in French.

PubMed

Griffon, Nicolas; Schuers, Matthieu; Soualmia, Lina Fatima; Grosjean, Julien; Kerdelhué, Gaétan; Kergourlay, Ivan; Dahamna, Badisse; Darmoni, Stéfan Jacques

2014-12-01

PubMed contains numerous articles in languages other than English. However, existing solutions to access these articles in the language in which they were written remain unconvincing. The aim of this study was to propose a practical search engine, called Multilingual PubMed, which will permit access to a PubMed subset in 1 language and to evaluate the precision and coverage for the French version (Multilingual PubMed-French). To create this tool, translations of MeSH were enriched (eg, adding synonyms and translations in French) and integrated into a terminology portal. PubMed subsets in several European languages were also added to our database using a dedicated parser. The response time for the generic semantic search engine was evaluated for simple queries. BabelMeSH, Multilingual PubMed-French, and 3 different PubMed strategies were compared by searching for literature in French. Precision and coverage were measured for 20 randomly selected queries. The results were evaluated as relevant to title and abstract, the evaluator being blind to search strategy. More than 650,000 PubMed citations in French were integrated into the Multilingual PubMed-French information system. The response times were all below the threshold defined for usability (2 seconds). Two search strategies (Multilingual PubMed-French and 1 PubMed strategy) showed high precision (0.93 and 0.97, respectively), but coverage was 4 times higher for Multilingual PubMed-French. It is now possible to freely access biomedical literature using a practical search tool in French. This tool will be of particular interest for health professionals and other end users who do not read or query sufficiently in English. The information system is theoretically well suited to expand the approach to other European languages, such as German, Spanish, Norwegian, and Portuguese.
A Search Engine to Access PubMed Monolingual Subsets: Proof of Concept and Evaluation in French

PubMed Central

Schuers, Matthieu; Soualmia, Lina Fatima; Grosjean, Julien; Kerdelhué, Gaétan; Kergourlay, Ivan; Dahamna, Badisse; Darmoni, Stéfan Jacques

2014-01-01

Background PubMed contains numerous articles in languages other than English. However, existing solutions to access these articles in the language in which they were written remain unconvincing. Objective The aim of this study was to propose a practical search engine, called Multilingual PubMed, which will permit access to a PubMed subset in 1 language and to evaluate the precision and coverage for the French version (Multilingual PubMed-French). Methods To create this tool, translations of MeSH were enriched (eg, adding synonyms and translations in French) and integrated into a terminology portal. PubMed subsets in several European languages were also added to our database using a dedicated parser. The response time for the generic semantic search engine was evaluated for simple queries. BabelMeSH, Multilingual PubMed-French, and 3 different PubMed strategies were compared by searching for literature in French. Precision and coverage were measured for 20 randomly selected queries. The results were evaluated as relevant to title and abstract, the evaluator being blind to search strategy. Results More than 650,000 PubMed citations in French were integrated into the Multilingual PubMed-French information system. The response times were all below the threshold defined for usability (2 seconds). Two search strategies (Multilingual PubMed-French and 1 PubMed strategy) showed high precision (0.93 and 0.97, respectively), but coverage was 4 times higher for Multilingual PubMed-French. Conclusions It is now possible to freely access biomedical literature using a practical search tool in French. This tool will be of particular interest for health professionals and other end users who do not read or query sufficiently in English. The information system is theoretically well suited to expand the approach to other European languages, such as German, Spanish, Norwegian, and Portuguese. PMID:25448528
Do Seasons Have an Influence on the Incidence of Depression? The Use of an Internet Search Engine Query Data as a Proxy of Human Affect

PubMed Central

Yang, Albert C.; Huang, Norden E.; Peng, Chung-Kang; Tsai, Shih-Jen

2010-01-01

Background Seasonal depression has generated considerable clinical interest in recent years. Despite a common belief that people in higher latitudes are more vulnerable to low mood during the winter, it has never been demonstrated that human's moods are subject to seasonal change on a global scale. The aim of this study was to investigate large-scale seasonal patterns of depression using Internet search query data as a signature and proxy of human affect. Methodology/Principal Findings Our study was based on a publicly available search engine database, Google Insights for Search, which provides time series data of weekly search trends from January 1, 2004 to June 30, 2009. We applied an empirical mode decomposition method to isolate seasonal components of health-related search trends of depression in 54 geographic areas worldwide. We identified a seasonal trend of depression that was opposite between the northern and southern hemispheres; this trend was significantly correlated with seasonal oscillations of temperature (USA: r = −0.872, p<0.001; Australia: r = −0.656, p<0.001). Based on analyses of search trends over 54 geological locations worldwide, we found that the degree of correlation between searching for depression and temperature was latitude-dependent (northern hemisphere: r = −0.686; p<0.001; southern hemisphere: r = 0.871; p<0.0001). Conclusions/Significance Our findings indicate that Internet searches for depression from people in higher latitudes are more vulnerable to seasonal change, whereas this phenomenon is obscured in tropical areas. This phenomenon exists universally across countries, regardless of language. This study provides novel, Internet-based evidence for the epidemiology of seasonal depression. PMID:21060851
Using SQL Databases for Sequence Similarity Searching and Analysis.

PubMed

Pearson, William R; Mackey, Aaron J

2017-09-13

Relational databases can integrate diverse types of information and manage large sets of similarity search results, greatly simplifying genome-scale analyses. By focusing on taxonomic subsets of sequences, relational databases can reduce the size and redundancy of sequence libraries and improve the statistical significance of homologs. In addition, by loading similarity search results into a relational database, it becomes possible to explore and summarize the relationships between all of the proteins in an organism and those in other biological kingdoms. This unit describes how to use relational databases to improve the efficiency of sequence similarity searching and demonstrates various large-scale genomic analyses of homology-related data. It also describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. The unit also introduces search_demo, a database that stores sequence similarity search results. The search_demo database is then used to explore the evolutionary relationships between E. coli proteins and proteins in other organisms in a large-scale comparative genomic analysis. © 2017 by John Wiley & Sons, Inc. Copyright © 2017 John Wiley & Sons, Inc.
Native Health Research Database

MedlinePlus

... Indian Health Board) Welcome to the Native Health Database. Please enter your search terms. Basic Search Advanced ... To learn more about searching the Native Health Database, click here. Tutorial Video The NHD has made ...
Rule-based deduplication of article records from bibliographic databases.

PubMed

Jiang, Yu; Lin, Can; Meng, Weiyi; Yu, Clement; Cohen, Aaron M; Smalheiser, Neil R

2014-01-01

We recently designed and deployed a metasearch engine, Metta, that sends queries and retrieves search results from five leading biomedical databases: PubMed, EMBASE, CINAHL, PsycINFO and the Cochrane Central Register of Controlled Trials. Because many articles are indexed in more than one of these databases, it is desirable to deduplicate the retrieved article records. This is not a trivial problem because data fields contain a lot of missing and erroneous entries, and because certain types of information are recorded differently (and inconsistently) in the different databases. The present report describes our rule-based method for deduplicating article records across databases and includes an open-source script module that can be deployed freely. Metta was designed to satisfy the particular needs of people who are writing systematic reviews in evidence-based medicine. These users want the highest possible recall in retrieval, so it is important to err on the side of not deduplicating any records that refer to distinct articles, and it is important to perform deduplication online in real time. Our deduplication module is designed with these constraints in mind. Articles that share the same publication year are compared sequentially on parameters including PubMed ID number, digital object identifier, journal name, article title and author list, using text approximation techniques. In a review of Metta searches carried out by public users, we found that the deduplication module was more effective at identifying duplicates than EndNote without making any erroneous assignments.
Rule-based deduplication of article records from bibliographic databases

PubMed Central

Jiang, Yu; Lin, Can; Meng, Weiyi; Yu, Clement; Cohen, Aaron M.; Smalheiser, Neil R.

2014-01-01

We recently designed and deployed a metasearch engine, Metta, that sends queries and retrieves search results from five leading biomedical databases: PubMed, EMBASE, CINAHL, PsycINFO and the Cochrane Central Register of Controlled Trials. Because many articles are indexed in more than one of these databases, it is desirable to deduplicate the retrieved article records. This is not a trivial problem because data fields contain a lot of missing and erroneous entries, and because certain types of information are recorded differently (and inconsistently) in the different databases. The present report describes our rule-based method for deduplicating article records across databases and includes an open-source script module that can be deployed freely. Metta was designed to satisfy the particular needs of people who are writing systematic reviews in evidence-based medicine. These users want the highest possible recall in retrieval, so it is important to err on the side of not deduplicating any records that refer to distinct articles, and it is important to perform deduplication online in real time. Our deduplication module is designed with these constraints in mind. Articles that share the same publication year are compared sequentially on parameters including PubMed ID number, digital object identifier, journal name, article title and author list, using text approximation techniques. In a review of Metta searches carried out by public users, we found that the deduplication module was more effective at identifying duplicates than EndNote without making any erroneous assignments. PMID:24434031
Start Your Engines: Surfing with Search Engines for Kids.

ERIC Educational Resources Information Center

Byerly, Greg; Brodie, Carolyn S.

1999-01-01

Suggests that to be an effective educator and user of the Web it is essential to know the basics about search engines. Presents tips for using search engines. Describes several search engines for children and young adults, as well as some general filtered search engines for children. (AEF)
AphidBase: A centralized bioinformatic resource for annotation of the pea aphid genome

PubMed Central

Legeai, Fabrice; Shigenobu, Shuji; Gauthier, Jean-Pierre; Colbourne, John; Rispe, Claude; Collin, Olivier; Richards, Stephen; Wilson, Alex C. C.; Tagu, Denis

2015-01-01

AphidBase is a centralized bioinformatic resource that was developed to facilitate community annotation of the pea aphid genome by the International Aphid Genomics Consortium (IAGC). The AphidBase Information System designed to organize and distribute genomic data and annotations for a large international community was constructed using open source software tools from the Generic Model Organism Database (GMOD). The system includes Apollo and GBrowse utilities as well as a wiki, blast search capabilities and a full text search engine. AphidBase strongly supported community cooperation and coordination in the curation of gene models during community annotation of the pea aphid genome. AphidBase can be accessed at http://www.aphidbase.com. PMID:20482635
Library Instruction and Online Database Searching.

ERIC Educational Resources Information Center

Mercado, Heidi

1999-01-01

Reviews changes in online database searching in academic libraries. Topics include librarians conducting all searches; the advent of end-user searching and the need for user instruction; compact disk technology; online public catalogs; the Internet; full text databases; electronic information literacy; user education and the remote library user;…
Cryogenic Information Center

NASA Technical Reports Server (NTRS)

Mohling, Robert A.; Marquardt, Eric D.; Fusilier, Fred C.; Fesmire, James E.

2003-01-01

The Cryogenic Information Center (CIC) is a not-for-profit corporation dedicated to preserving and distributing cryogenic information to government, industry, and academia. The heart of the CIC is a uniform source of cryogenic data including analyses, design, materials and processes, and test information traceable back to the Cryogenic Data Center of the former National Bureau of Standards. The electronic database is a national treasure containing over 146,000 specific bibliographic citations of cryogenic literature and thermophysical property data dating back to 1829. A new technical/bibliographic inquiry service can perform searches and technical analyses. The Cryogenic Material Properties (CMP) Program consists of computer codes using empirical equations to determine thermophysical material properties with emphasis on the 4-300K range. CMP's objective is to develop a user-friendly standard material property database using the best available data so government and industry can conduct more accurate analyses. The CIC serves to benefit researchers, engineers, and technologists in cryogenics and cryogenic engineering, whether they are new or experienced in the field.
HOWDY: an integrated database system for human genome research

PubMed Central

Hirakawa, Mika

2002-01-01

HOWDY is an integrated database system for accessing and analyzing human genomic information (http://www-alis.tokyo.jst.go.jp/HOWDY/). HOWDY stores information about relationships between genetic objects and the data extracted from a number of databases. HOWDY consists of an Internet accessible user interface that allows thorough searching of the human genomic databases using the gene symbols and their aliases. It also permits flexible editing of the sequence data. The database can be searched using simple words and the search can be restricted to a specific cytogenetic location. Linear maps displaying markers and genes on contig sequences are available, from which an object can be chosen. Any search starting point identifies all the information matching the query. HOWDY provides a convenient search environment of human genomic data for scientists unsure which database is most appropriate for their search. PMID:11752279
Choosing an Optimal Database for Protein Identification from Tandem Mass Spectrometry Data.

PubMed

Kumar, Dhirendra; Yadav, Amit Kumar; Dash, Debasis

2017-01-01

Database searching is the preferred method for protein identification from digital spectra of mass to charge ratios (m/z) detected for protein samples through mass spectrometers. The search database is one of the major influencing factors in discovering proteins present in the sample and thus in deriving biological conclusions. In most cases the choice of search database is arbitrary. Here we describe common search databases used in proteomic studies and their impact on final list of identified proteins. We also elaborate upon factors like composition and size of the search database that can influence the protein identification process. In conclusion, we suggest that choice of the database depends on the type of inferences to be derived from proteomics data. However, making additional efforts to build a compact and concise database for a targeted question should generally be rewarding in achieving confident protein identifications.
Towards computational improvement of DNA database indexing and short DNA query searching.

PubMed

Stojanov, Done; Koceski, Sašo; Mileva, Aleksandra; Koceska, Nataša; Bande, Cveta Martinovska

2014-09-03

In order to facilitate and speed up the search of massive DNA databases, the database is indexed at the beginning, employing a mapping function. By searching through the indexed data structure, exact query hits can be identified. If the database is searched against an annotated DNA query, such as a known promoter consensus sequence, then the starting locations and the number of potential genes can be determined. This is particularly relevant if unannotated DNA sequences have to be functionally annotated. However, indexing a massive DNA database and searching an indexed data structure with millions of entries is a time-demanding process. In this paper, we propose a fast DNA database indexing and searching approach, identifying all query hits in the database, without having to examine all entries in the indexed data structure, limiting the maximum length of a query that can be searched against the database. By applying the proposed indexing equation, the whole human genome could be indexed in 10 hours on a personal computer, under the assumption that there is enough RAM to store the indexed data structure. Analysing the methodology proposed by Reneker, we observed that hits at starting positions [Formula: see text] are not reported, if the database is searched against a query shorter than [Formula: see text] nucleotides, such that [Formula: see text] is the length of the DNA database words being mapped and [Formula: see text] is the length of the query. A solution of this drawback is also presented.
footprintDB: a database of transcription factors with annotated cis elements and binding interfaces.

PubMed

Sebastian, Alvaro; Contreras-Moreira, Bruno

2014-01-15

Traditional and high-throughput techniques for determining transcription factor (TF) binding specificities are generating large volumes of data of uneven quality, which are scattered across individual databases. FootprintDB integrates some of the most comprehensive freely available libraries of curated DNA binding sites and systematically annotates the binding interfaces of the corresponding TFs. The first release contains 2422 unique TF sequences, 10 112 DNA binding sites and 3662 DNA motifs. A survey of the included data sources, organisms and TF families was performed together with proprietary database TRANSFAC, finding that footprintDB has a similar coverage of multicellular organisms, while also containing bacterial regulatory data. A search engine has been designed that drives the prediction of DNA motifs for input TFs, or conversely of TF sequences that might recognize input regulatory sequences, by comparison with database entries. Such predictions can also be extended to a single proteome chosen by the user, and results are ranked in terms of interface similarity. Benchmark experiments with bacterial, plant and human data were performed to measure the predictive power of footprintDB searches, which were able to correctly recover 10, 55 and 90% of the tested sequences, respectively. Correctly predicted TFs had a higher interface similarity than the average, confirming its diagnostic value. Web site implemented in PHP,Perl, MySQL and Apache. Freely available from http://floresta.eead.csic.es/footprintdb.
Rapid identification of anonymous subjects in large criminal databases: problems and solutions in IAFIS III/FBI subject searches

NASA Astrophysics Data System (ADS)

Kutzleb, C. D.

1997-02-01

The high incidence of recidivism (repeat offenders) in the criminal population makes the use of the IAFIS III/FBI criminal database an important tool in law enforcement. The problems and solutions employed by IAFIS III/FBI criminal subject searches are discussed for the following topics: (1) subject search selectivity and reliability; (2) the difficulty and limitations of identifying subjects whose anonymity may be a prime objective; (3) database size, search workload, and search response time; (4) techniques and advantages of normalizing the variability in an individual's name and identifying features into identifiable and discrete categories; and (5) the use of database demographics to estimate the likelihood of a match between a search subject and database subjects.
Adding a visualization feature to web search engines: it's time.

PubMed

Wong, Pak Chung

2008-01-01

It's widely recognized that all Web search engines today are almost identical in presentation layout and behavior. In fact, the same presentation approach has been applied to depicting search engine results pages (SERPs) since the first Web search engine launched in 1993. In this Visualization Viewpoints article, I propose to add a visualization feature to Web search engines and suggest that the new addition can improve search engines' performance and capabilities, which in turn lead to better Web search technology.

What is lost when searching only one literature database for articles relevant to injury prevention and safety promotion?

PubMed

Lawrence, D W

2008-12-01

To assess what is lost if only one literature database is searched for articles relevant to injury prevention and safety promotion (IPSP) topics. Serial textword (keyword, free-text) searches using multiple synonym terms for five key IPSP topics (bicycle-related brain injuries, ethanol-impaired driving, house fires, road rage, and suicidal behaviors among adolescents) were conducted in four of the bibliographic databases that are most used by IPSP professionals: EMBASE, MEDLINE, PsycINFO, and Web of Science. Through a systematic procedure, an inventory of articles on each topic in each database was conducted to identify the total unduplicated count of all articles on each topic, the number of articles unique to each database, and the articles available if only one database is searched. No single database included all of the relevant articles on any topic, and the database with the broadest coverage differed by topic. A search of only one literature database will return 16.7-81.5% (median 43.4%) of the available articles on any of five key IPSP topics. Each database contributed unique articles to the total bibliography for each topic. A literature search performed in only one database will, on average, lead to a loss of more than half of the available literature on a topic.
neXtA5: accelerating annotation of articles via automated approaches in neXtProt.

PubMed

Mottin, Luc; Gobeill, Julien; Pasche, Emilie; Michel, Pierre-André; Cusin, Isabelle; Gaudet, Pascale; Ruch, Patrick

2016-01-01

The rapid increase in the number of published articles poses a challenge for curated databases to remain up-to-date. To help the scientific community and database curators deal with this issue, we have developed an application, neXtA5, which prioritizes the literature for specific curation requirements. Our system, neXtA5, is a curation service composed of three main elements. The first component is a named-entity recognition module, which annotates MEDLINE over some predefined axes. This report focuses on three axes: Diseases, the Molecular Function and Biological Process sub-ontologies of the Gene Ontology (GO). The automatic annotations are then stored in a local database, BioMed, for each annotation axis. Additional entities such as species and chemical compounds are also identified. The second component is an existing search engine, which retrieves the most relevant MEDLINE records for any given query. The third component uses the content of BioMed to generate an axis-specific ranking, which takes into account the density of named-entities as stored in the Biomed database. The two ranked lists are ultimately merged using a linear combination, which has been specifically tuned to support the annotation of each axis. The fine-tuning of the coefficients is formally reported for each axis-driven search. Compared with PubMed, which is the system used by most curators, the improvement is the following: +231% for Diseases, +236% for Molecular Functions and +3153% for Biological Process when measuring the precision of the top-returned PMID (P0 or mean reciprocal rank). The current search methods significantly improve the search effectiveness of curators for three important curation axes. Further experiments are being performed to extend the curation types, in particular protein-protein interactions, which require specific relationship extraction capabilities. In parallel, user-friendly interfaces powered with a set of JSON web services are currently being implemented into the neXtProt annotation pipeline.Available on: http://babar.unige.ch:8082/neXtA5Database URL: http://babar.unige.ch:8082/neXtA5/fetcher.jsp. © The Author(s) 2016. Published by Oxford University Press.
neXtA5: accelerating annotation of articles via automated approaches in neXtProt

PubMed Central

Mottin, Luc; Gobeill, Julien; Pasche, Emilie; Michel, Pierre-André; Cusin, Isabelle; Gaudet, Pascale; Ruch, Patrick

2016-01-01

The rapid increase in the number of published articles poses a challenge for curated databases to remain up-to-date. To help the scientific community and database curators deal with this issue, we have developed an application, neXtA5, which prioritizes the literature for specific curation requirements. Our system, neXtA5, is a curation service composed of three main elements. The first component is a named-entity recognition module, which annotates MEDLINE over some predefined axes. This report focuses on three axes: Diseases, the Molecular Function and Biological Process sub-ontologies of the Gene Ontology (GO). The automatic annotations are then stored in a local database, BioMed, for each annotation axis. Additional entities such as species and chemical compounds are also identified. The second component is an existing search engine, which retrieves the most relevant MEDLINE records for any given query. The third component uses the content of BioMed to generate an axis-specific ranking, which takes into account the density of named-entities as stored in the Biomed database. The two ranked lists are ultimately merged using a linear combination, which has been specifically tuned to support the annotation of each axis. The fine-tuning of the coefficients is formally reported for each axis-driven search. Compared with PubMed, which is the system used by most curators, the improvement is the following: +231% for Diseases, +236% for Molecular Functions and +3153% for Biological Process when measuring the precision of the top-returned PMID (P0 or mean reciprocal rank). The current search methods significantly improve the search effectiveness of curators for three important curation axes. Further experiments are being performed to extend the curation types, in particular protein–protein interactions, which require specific relationship extraction capabilities. In parallel, user-friendly interfaces powered with a set of JSON web services are currently being implemented into the neXtProt annotation pipeline. Available on: http://babar.unige.ch:8082/neXtA5 Database URL: http://babar.unige.ch:8082/neXtA5/fetcher.jsp PMID:27374119
Meta Search Engines.

ERIC Educational Resources Information Center

Garman, Nancy

1999-01-01

Describes common options and features to consider in evaluating which meta search engine will best meet a searcher's needs. Discusses number and names of engines searched; other sources and specialty engines; search queries; other search options; and results options. (AEF)
Reducing Information Overload in Large Seismic Data Sets

DOE Office of Scientific and Technical Information (OSTI.GOV)

HAMPTON,JEFFERY W.; YOUNG,CHRISTOPHER J.; MERCHANT,BION J.

2000-08-02

Event catalogs for seismic data can become very large. Furthermore, as researchers collect multiple catalogs and reconcile them into a single catalog that is stored in a relational database, the reconciled set becomes even larger. The sheer number of these events makes searching for relevant events to compare with events of interest problematic. Information overload in this form can lead to the data sets being under-utilized and/or used incorrectly or inconsistently. Thus, efforts have been initiated to research techniques and strategies for helping researchers to make better use of large data sets. In this paper, the authors present their effortsmore » to do so in two ways: (1) the Event Search Engine, which is a waveform correlation tool and (2) some content analysis tools, which area combination of custom-built and commercial off-the-shelf tools for accessing, managing, and querying seismic data stored in a relational database. The current Event Search Engine is based on a hierarchical clustering tool known as the dendrogram tool, which is written as a MatSeis graphical user interface. The dendrogram tool allows the user to build dendrogram diagrams for a set of waveforms by controlling phase windowing, down-sampling, filtering, enveloping, and the clustering method (e.g. single linkage, complete linkage, flexible method). It also allows the clustering to be based on two or more stations simultaneously, which is important to bridge gaps in the sparsely recorded event sets anticipated in such a large reconciled event set. Current efforts are focusing on tools to help the researcher winnow the clusters defined using the dendrogram tool down to the minimum optimal identification set. This will become critical as the number of reference events in the reconciled event set continually grows. The dendrogram tool is part of the MatSeis analysis package, which is available on the Nuclear Explosion Monitoring Research and Engineering Program Web Site. As part of the research into how to winnow the reference events in these large reconciled event sets, additional database query approaches have been developed to provide windows into these datasets. These custom built content analysis tools help identify dataset characteristics that can potentially aid in providing a basis for comparing similar reference events in these large reconciled event sets. Once these characteristics can be identified, algorithms can be developed to create and add to the reduced set of events used by the Event Search Engine. These content analysis tools have already been useful in providing information on station coverage of the referenced events and basic statistical, information on events in the research datasets. The tools can also provide researchers with a quick way to find interesting and useful events within the research datasets. The tools could also be used as a means to review reference event datasets as part of a dataset delivery verification process. There has also been an effort to explore the usefulness of commercially available web-based software to help with this problem. The advantages of using off-the-shelf software applications, such as Oracle's WebDB, to manipulate, customize and manage research data are being investigated. These types of applications are being examined to provide access to large integrated data sets for regional seismic research in Asia. All of these software tools would provide the researcher with unprecedented power without having to learn the intricacies and complexities of relational database systems.« less
Helping Students Choose Tools To Search the Web.

ERIC Educational Resources Information Center

Cohen, Laura B.; Jacobson, Trudi E.

2000-01-01

Describes areas where faculty members can aid students in making intelligent use of the Web in their research. Differentiates between subject directories and search engines. Describes an engine's three components: spider, index, and search engine. Outlines two misconceptions: that Yahoo! is a search engine and that search engines contain all the…
Comparison Study of Overlap among 21 Scientific Databases in Searching Pesticide Information.

ERIC Educational Resources Information Center

Meyer, Daniel E.; And Others

1983-01-01

Evaluates overlapping coverage of 21 scientific databases used in 10 online pesticide searches in an attempt to identify minimum number of databases needed to generate 90 percent of unique, relevant citations for given search. Comparison of searches combined under given pesticide usage (herbicide, fungicide, insecticide) is discussed. Nine…
Grooker, KartOO, Addict-o-Matic and More: Really Different Search Engines

ERIC Educational Resources Information Center

Descy, Don E.

2009-01-01

There are hundreds of unique search engines in the United States and thousands of unique search engines around the world. If people get into search engines designed just to search particular web sites, the number is in the hundreds of thousands. This article looks at: (1) clustering search engines, such as KartOO (www.kartoo.com) and Grokker…
Content based information retrieval in forensic image databases.

PubMed

Geradts, Zeno; Bijhold, Jurrien

2002-03-01

This paper gives an overview of the various available image databases and ways of searching these databases on image contents. The developments in research groups of searching in image databases is evaluated and compared with the forensic databases that exist. Forensic image databases of fingerprints, faces, shoeprints, handwriting, cartridge cases, drugs tablets, and tool marks are described. The developments in these fields appear to be valuable for forensic databases, especially that of the framework in MPEG-7, where the searching in image databases is standardized. In the future, the combination of the databases (also DNA-databases) and possibilities to combine these can result in stronger forensic evidence.
Google Scholar is not enough to be used alone for systematic reviews.

PubMed

Giustini, Dean; Boulos, Maged N Kamel

2013-01-01

Google Scholar (GS) has been noted for its ability to search broadly for important references in the literature. Gehanno et al. recently examined GS in their study: 'Is Google scholar enough to be used alone for systematic reviews?' In this paper, we revisit this important question, and some of Gehanno et al.'s other findings in evaluating the academic search engine. The authors searched for a recent systematic review (SR) of comparable size to run search tests similar to those in Gehanno et al. We selected Chou et al. (2013) contacting the authors for a list of publications they found in their SR on social media in health. We queried GS for each of those 506 titles (in quotes "), one by one. When GS failed to retrieve a paper, or produced too many results, we used the allintitle: command to find papers with the same title. Google Scholar produced records for ~95% of the papers cited by Chou et al. (n=476/506). A few of the 30 papers that were not in GS were later retrieved via PubMed and even regular Google Search. But due to its different structure, we could not run searches in GS that were originally performed by Chou et al. in PubMed, Web of Science, Scopus and PsycINFO®. Identifying 506 papers in GS was an inefficient process, especially for papers using similar search terms. Has Google Scholar improved enough to be used alone in searching for systematic reviews? No. GS' constantly-changing content, algorithms and database structure make it a poor choice for systematic reviews. Looking for papers when you know their titles is a far different issue from discovering them initially. Further research is needed to determine when and how (and for what purposes) GS can be used alone. Google should provide details about GS' database coverage and improve its interface (e.g., with semantic search filters, stored searching, etc.). Perhaps then it will be an appropriate choice for systematic reviews.
Searching for religion and mental health studies required health, social science, and grey literature databases.

PubMed

Wright, Judy M; Cottrell, David J; Mir, Ghazala

2014-07-01

To determine the optimal databases to search for studies of faith-sensitive interventions for treating depression. We examined 23 health, social science, religious, and grey literature databases searched for an evidence synthesis. Databases were prioritized by yield of (1) search results, (2) potentially relevant references identified during screening, (3) included references contained in the synthesis, and (4) included references that were available in the database. We assessed the impact of databases beyond MEDLINE, EMBASE, and PsycINFO by their ability to supply studies identifying new themes and issues. We identified pragmatic workload factors that influence database selection. PsycINFO was the best performing database within all priority lists. ArabPsyNet, CINAHL, Dissertations and Theses, EMBASE, Global Health, Health Management Information Consortium, MEDLINE, PsycINFO, and Sociological Abstracts were essential for our searches to retrieve the included references. Citation tracking activities and the personal library of one of the research teams made significant contributions of unique, relevant references. Religion studies databases (Am Theo Lib Assoc, FRANCIS) did not provide unique, relevant references. Literature searches for reviews and evidence syntheses of religion and health studies should include social science, grey literature, non-Western databases, personal libraries, and citation tracking activities. Copyright © 2014 Elsevier Inc. All rights reserved.
MetaSEEk: a content-based metasearch engine for images

NASA Astrophysics Data System (ADS)

Beigi, Mandis; Benitez, Ana B.; Chang, Shih-Fu

1997-12-01

Search engines are the most powerful resources for finding information on the rapidly expanding World Wide Web (WWW). Finding the desired search engines and learning how to use them, however, can be very time consuming. The integration of such search tools enables the users to access information across the world in a transparent and efficient manner. These systems are called meta-search engines. The recent emergence of visual information retrieval (VIR) search engines on the web is leading to the same efficiency problem. This paper describes and evaluates MetaSEEk, a content-based meta-search engine used for finding images on the Web based on their visual information. MetaSEEk is designed to intelligently select and interface with multiple on-line image search engines by ranking their performance for different classes of user queries. User feedback is also integrated in the ranking refinement. We compare MetaSEEk with a base line version of meta-search engine, which does not use the past performance of the different search engines in recommending target search engines for future queries.
[Advanced online search techniques and dedicated search engines for physicians].

PubMed

Nahum, Yoav

2008-02-01

In recent years search engines have become an essential tool in the work of physicians. This article will review advanced search techniques from the world of information specialists, as well as some advanced search engine operators that may help physicians improve their online search capabilities, and maximize the yield of their searches. This article also reviews popular dedicated scientific and biomedical literature search engines.
Molecule database framework: a framework for creating database applications with chemical structure search capability

PubMed Central

2013-01-01

Background Research in organic chemistry generates samples of novel chemicals together with their properties and other related data. The involved scientists must be able to store this data and search it by chemical structure. There are commercial solutions for common needs like chemical registration systems or electronic lab notebooks. However for specific requirements of in-house databases and processes no such solutions exist. Another issue is that commercial solutions have the risk of vendor lock-in and may require an expensive license of a proprietary relational database management system. To speed up and simplify the development for applications that require chemical structure search capabilities, I have developed Molecule Database Framework. The framework abstracts the storing and searching of chemical structures into method calls. Therefore software developers do not require extensive knowledge about chemistry and the underlying database cartridge. This decreases application development time. Results Molecule Database Framework is written in Java and I created it by integrating existing free and open-source tools and frameworks. The core functionality includes: • Support for multi-component compounds (mixtures) • Import and export of SD-files • Optional security (authorization) For chemical structure searching Molecule Database Framework leverages the capabilities of the Bingo Cartridge for PostgreSQL and provides type-safe searching, caching, transactions and optional method level security. Molecule Database Framework supports multi-component chemical compounds (mixtures). Furthermore the design of entity classes and the reasoning behind it are explained. By means of a simple web application I describe how the framework could be used. I then benchmarked this example application to create some basic performance expectations for chemical structure searches and import and export of SD-files. Conclusions By using a simple web application it was shown that Molecule Database Framework successfully abstracts chemical structure searches and SD-File import and export to simple method calls. The framework offers good search performance on a standard laptop without any database tuning. This is also due to the fact that chemical structure searches are paged and cached. Molecule Database Framework is available for download on the projects web page on bitbucket: https://bitbucket.org/kienerj/moleculedatabaseframework. PMID:24325762
Molecule database framework: a framework for creating database applications with chemical structure search capability.

PubMed

Kiener, Joos

2013-12-11

Research in organic chemistry generates samples of novel chemicals together with their properties and other related data. The involved scientists must be able to store this data and search it by chemical structure. There are commercial solutions for common needs like chemical registration systems or electronic lab notebooks. However for specific requirements of in-house databases and processes no such solutions exist. Another issue is that commercial solutions have the risk of vendor lock-in and may require an expensive license of a proprietary relational database management system. To speed up and simplify the development for applications that require chemical structure search capabilities, I have developed Molecule Database Framework. The framework abstracts the storing and searching of chemical structures into method calls. Therefore software developers do not require extensive knowledge about chemistry and the underlying database cartridge. This decreases application development time. Molecule Database Framework is written in Java and I created it by integrating existing free and open-source tools and frameworks. The core functionality includes:•Support for multi-component compounds (mixtures)•Import and export of SD-files•Optional security (authorization)For chemical structure searching Molecule Database Framework leverages the capabilities of the Bingo Cartridge for PostgreSQL and provides type-safe searching, caching, transactions and optional method level security. Molecule Database Framework supports multi-component chemical compounds (mixtures).Furthermore the design of entity classes and the reasoning behind it are explained. By means of a simple web application I describe how the framework could be used. I then benchmarked this example application to create some basic performance expectations for chemical structure searches and import and export of SD-files. By using a simple web application it was shown that Molecule Database Framework successfully abstracts chemical structure searches and SD-File import and export to simple method calls. The framework offers good search performance on a standard laptop without any database tuning. This is also due to the fact that chemical structure searches are paged and cached. Molecule Database Framework is available for download on the projects web page on bitbucket: https://bitbucket.org/kienerj/moleculedatabaseframework.
Search Engines: Gateway to a New ``Panopticon''?

NASA Astrophysics Data System (ADS)

Kosta, Eleni; Kalloniatis, Christos; Mitrou, Lilian; Kavakli, Evangelia

Nowadays, Internet users are depending on various search engines in order to be able to find requested information on the Web. Although most users feel that they are and remain anonymous when they place their search queries, reality proves otherwise. The increasing importance of search engines for the location of the desired information on the Internet usually leads to considerable inroads into the privacy of users. The scope of this paper is to study the main privacy issues with regard to search engines, such as the anonymisation of search logs and their retention period, and to examine the applicability of the European data protection legislation to non-EU search engine providers. Ixquick, a privacy-friendly meta search engine will be presented as an alternative to privacy intrusive existing practices of search engines.
TRENDS: A flight test relational database user's guide and reference manual

NASA Technical Reports Server (NTRS)

Bondi, M. J.; Bjorkman, W. S.; Cross, J. L.

1994-01-01

This report is designed to be a user's guide and reference manual for users intending to access rotocraft test data via TRENDS, the relational database system which was developed as a tool for the aeronautical engineer with no programming background. This report has been written to assist novice and experienced TRENDS users. TRENDS is a complete system for retrieving, searching, and analyzing both numerical and narrative data, and for displaying time history and statistical data in graphical and numerical formats. This manual provides a 'guided tour' and a 'user's guide' for the new and intermediate-skilled users. Examples for the use of each menu item within TRENDS is provided in the Menu Reference section of the manual, including full coverage for TIMEHIST, one of the key tools. This manual is written around the XV-15 Tilt Rotor database, but does include an appendix on the UH-60 Blackhawk database. This user's guide and reference manual establishes a referrable source for the research community and augments NASA TM-101025, TRENDS: The Aeronautical Post-Test, Database Management System, Jan. 1990, written by the same authors.
Effectiveness of training intervention to improve medical student's information literacy skills.

PubMed

Abdekhoda, Mohammadhiwa; Dehnad, Afsaneh; Yousefi, Mahmood

2016-12-01

This study aimed to assess the efficiency of delivering a 4-month course of "effective literature search" among medical postgraduate students for improving information literacy skills. This was a cross-sectional study in which 90 postgraduate students were randomly selected and participated in 12 training sessions. Effective search strategies were presented and the students' attitude and competency concerning online search were measured by a pre- and post-questionnaires and skill tests. Data were analyzed by SPSS version 16 using t-test. There was a significant improvement (p=0.00), in student's attitude. The mean (standard deviation [SD]) was 2.9 (0.8) before intervention versus the mean (SD) 3.9 (0.7) after intervention. Students' familiarity with medical resources and databases improved significantly. The data showed a significant increase (p=0.03), in students' competency score concerning search strategy design and conducting a search. The mean (SD) was 2.04 (0.7) before intervention versus the mean (SD) 3.07 (0.8) after intervention. Also, students' ability in applying search and meta search engine improved significantly. This study clearly acknowledges that the training intervention provides considerable opportunity to improve medical student's information literacy skills.
CellAtlasSearch: a scalable search engine for single cells.

PubMed

Srivastava, Divyanshu; Iyer, Arvind; Kumar, Vibhor; Sengupta, Debarka

2018-05-21

Owing to the advent of high throughput single cell transcriptomics, past few years have seen exponential growth in production of gene expression data. Recently efforts have been made by various research groups to homogenize and store single cell expression from a large number of studies. The true value of this ever increasing data deluge can be unlocked by making it searchable. To this end, we propose CellAtlasSearch, a novel search architecture for high dimensional expression data, which is massively parallel as well as light-weight, thus infinitely scalable. In CellAtlasSearch, we use a Graphical Processing Unit (GPU) friendly version of Locality Sensitive Hashing (LSH) for unmatched speedup in data processing and query. Currently, CellAtlasSearch features over 300 000 reference expression profiles including both bulk and single-cell data. It enables the user query individual single cell transcriptomes and finds matching samples from the database along with necessary meta information. CellAtlasSearch aims to assist researchers and clinicians in characterizing unannotated single cells. It also facilitates noise free, low dimensional representation of single-cell expression profiles by projecting them on a wide variety of reference samples. The web-server is accessible at: http://www.cellatlassearch.com.
Brute-Force Approach for Mass Spectrometry-Based Variant Peptide Identification in Proteogenomics without Personalized Genomic Data

NASA Astrophysics Data System (ADS)

Ivanov, Mark V.; Lobas, Anna A.; Levitsky, Lev I.; Moshkovskii, Sergei A.; Gorshkov, Mikhail V.

2018-02-01

In a proteogenomic approach based on tandem mass spectrometry analysis of proteolytic peptide mixtures, customized exome or RNA-seq databases are employed for identifying protein sequence variants. However, the problem of variant peptide identification without personalized genomic data is important for a variety of applications. Following the recent proposal by Chick et al. (Nat. Biotechnol. 33, 743-749, 2015) on the feasibility of such variant peptide search, we evaluated two available approaches based on the previously suggested "open" search and the "brute-force" strategy. To improve the efficiency of these approaches, we propose an algorithm for exclusion of false variant identifications from the search results involving analysis of modifications mimicking single amino acid substitutions. Also, we propose a de novo based scoring scheme for assessment of identified point mutations. In the scheme, the search engine analyzes y-type fragment ions in MS/MS spectra to confirm the location of the mutation in the variant peptide sequence.

Strategies to explore functional genomics data sets in NCBI's GEO database.

PubMed

Wilhite, Stephen E; Barrett, Tanya

2012-01-01

The Gene Expression Omnibus (GEO) database is a major repository that stores high-throughput functional genomics data sets that are generated using both microarray-based and sequence-based technologies. Data sets are submitted to GEO primarily by researchers who are publishing their results in journals that require original data to be made freely available for review and analysis. In addition to serving as a public archive for these data, GEO has a suite of tools that allow users to identify, analyze, and visualize data relevant to their specific interests. These tools include sample comparison applications, gene expression profile charts, data set clusters, genome browser tracks, and a powerful search engine that enables users to construct complex queries.
US EPA record of decision review for landfills: Sanitary landfill (740-G), Savannah River Site

DOE Office of Scientific and Technical Information (OSTI.GOV)

Not Available

1993-06-01

This report presents the results of a review of the US Environmental Protection Agency (EPA) Record of Decision System (RODS) database search conducted to identify Superfund landfill sites where a Record of Decision (ROD) has been prepared by EPA, the States or the US Army Corps of Engineers describing the selected remedy at the site. ROD abstracts from the database were reviewed to identify site information including site type, contaminants of concern, components of the selected remedy, and cleanup goals. Only RODs from landfill sites were evaluated so that the results of the analysis can be used to support themore » remedy selection process for the Sanitary Landfill at the Savannah River Site (SRS).« less
Strategies to Explore Functional Genomics Data Sets in NCBI’s GEO Database

PubMed Central

Wilhite, Stephen E.; Barrett, Tanya

2012-01-01

The Gene Expression Omnibus (GEO) database is a major repository that stores high-throughput functional genomics data sets that are generated using both microarray-based and sequence-based technologies. Data sets are submitted to GEO primarily by researchers who are publishing their results in journals that require original data to be made freely available for review and analysis. In addition to serving as a public archive for these data, GEO has a suite of tools that allow users to identify, analyze and visualize data relevant to their specific interests. These tools include sample comparison applications, gene expression profile charts, data set clusters, genome browser tracks, and a powerful search engine that enables users to construct complex queries. PMID:22130872
The Theory of Planned Behaviour Applied to Search Engines as a Learning Tool

ERIC Educational Resources Information Center

Liaw, Shu-Sheng

2004-01-01

Search engines have been developed for helping learners to seek online information. Based on theory of planned behaviour approach, this research intends to investigate the behaviour of using search engines as a learning tool. After factor analysis, the results suggest that perceived satisfaction of search engine, search engines as an information…
Evidence of the factors that influence the utilisation of Kangaroo Mother Care by parents with low-birth-weight infants in low- and middle-income countries (LMICs): a scoping review protocol.

PubMed

Mathias, Christina T; Mianda, Solange; Ginindza, Themba G

2018-04-05

The Sustainable Development Goal (SDG) 3 emphasises on reducing neonatal deaths caused by low birth weight (LBW) complications by the implementation and utilisation of Kangaroo Mother Care (KMC) in low- and middle-income countries (LMICs). Despite the empirical evidence of KMC optimising low-birth-weight infants' (LBWIs') survival, its advantages and the LMICs implementing the service, studies have shown that LBW infant deaths occurring in LMICs are largely contributing to global child mortality. The aim of this scoping review is to map out the literature on barriers, challenges and facilitators of KMC utilisation by parents with LBWIs. This scoping review will use Endnote X7 reference management software to manage articles. The review search strategy will use SCIELO and LILACS databases. Other databases will be used via EBSCOHost search engine and these are Academic search complete, CINAHL with full text, Education source, Health source: Nursing/Academic Edition, Medline with full text and Medline. We will also use Google Scholar, JSTOR, Open grey search engines and reference lists. A two-phase search mapping out process will be done. In phase 1, one reviewer will perform the title screening and removal of duplicates. Two reviewers will do a parallel abstract screening according to eligibility criteria. Phase 2 will involve the reading of full articles and exclusion of articles, in accordance with the eligibility criteria. Data extraction from the articles will be done by two reviewers independently and parallel to the data extraction form. The data quality assessment of the eligible studies will be done using the Mixed Method Appraisal Tool (MMAT). The extraction of the synthesised results and thematic content analysis of the studies will be done by NVIVO version 10. We expect to find studies on barriers, challenges and facilitating factors of KMC utilisation by parents with LBWIs in LMICs. The review outcomes will guide future research and practice and inform policy. The findings will be disseminated in print, electronic and conference presentations related to maternal child and neonatal health.
SS-Wrapper: a package of wrapper applications for similarity searches on Linux clusters.

PubMed

Wang, Chunlin; Lefkowitz, Elliot J

2004-10-28

Large-scale sequence comparison is a powerful tool for biological inference in modern molecular biology. Comparing new sequences to those in annotated databases is a useful source of functional and structural information about these sequences. Using software such as the basic local alignment search tool (BLAST) or HMMPFAM to identify statistically significant matches between newly sequenced segments of genetic material and those in databases is an important task for most molecular biologists. Searching algorithms are intrinsically slow and data-intensive, especially in light of the rapid growth of biological sequence databases due to the emergence of high throughput DNA sequencing techniques. Thus, traditional bioinformatics tools are impractical on PCs and even on dedicated UNIX servers. To take advantage of larger databases and more reliable methods, high performance computation becomes necessary. We describe the implementation of SS-Wrapper (Similarity Search Wrapper), a package of wrapper applications that can parallelize similarity search applications on a Linux cluster. Our wrapper utilizes a query segmentation-search (QS-search) approach to parallelize sequence database search applications. It takes into consideration load balancing between each node on the cluster to maximize resource usage. QS-search is designed to wrap many different search tools, such as BLAST and HMMPFAM using the same interface. This implementation does not alter the original program, so newly obtained programs and program updates should be accommodated easily. Benchmark experiments using QS-search to optimize BLAST and HMMPFAM showed that QS-search accelerated the performance of these programs almost linearly in proportion to the number of CPUs used. We have also implemented a wrapper that utilizes a database segmentation approach (DS-BLAST) that provides a complementary solution for BLAST searches when the database is too large to fit into the memory of a single node. Used together, QS-search and DS-BLAST provide a flexible solution to adapt sequential similarity searching applications in high performance computing environments. Their ease of use and their ability to wrap a variety of database search programs provide an analytical architecture to assist both the seasoned bioinformaticist and the wet-bench biologist.
SS-Wrapper: a package of wrapper applications for similarity searches on Linux clusters

PubMed Central

Wang, Chunlin; Lefkowitz, Elliot J

2004-01-01

Background Large-scale sequence comparison is a powerful tool for biological inference in modern molecular biology. Comparing new sequences to those in annotated databases is a useful source of functional and structural information about these sequences. Using software such as the basic local alignment search tool (BLAST) or HMMPFAM to identify statistically significant matches between newly sequenced segments of genetic material and those in databases is an important task for most molecular biologists. Searching algorithms are intrinsically slow and data-intensive, especially in light of the rapid growth of biological sequence databases due to the emergence of high throughput DNA sequencing techniques. Thus, traditional bioinformatics tools are impractical on PCs and even on dedicated UNIX servers. To take advantage of larger databases and more reliable methods, high performance computation becomes necessary. Results We describe the implementation of SS-Wrapper (Similarity Search Wrapper), a package of wrapper applications that can parallelize similarity search applications on a Linux cluster. Our wrapper utilizes a query segmentation-search (QS-search) approach to parallelize sequence database search applications. It takes into consideration load balancing between each node on the cluster to maximize resource usage. QS-search is designed to wrap many different search tools, such as BLAST and HMMPFAM using the same interface. This implementation does not alter the original program, so newly obtained programs and program updates should be accommodated easily. Benchmark experiments using QS-search to optimize BLAST and HMMPFAM showed that QS-search accelerated the performance of these programs almost linearly in proportion to the number of CPUs used. We have also implemented a wrapper that utilizes a database segmentation approach (DS-BLAST) that provides a complementary solution for BLAST searches when the database is too large to fit into the memory of a single node. Conclusions Used together, QS-search and DS-BLAST provide a flexible solution to adapt sequential similarity searching applications in high performance computing environments. Their ease of use and their ability to wrap a variety of database search programs provide an analytical architecture to assist both the seasoned bioinformaticist and the wet-bench biologist. PMID:15511296
Large-scale feature searches of collections of medical imagery

NASA Astrophysics Data System (ADS)

Hedgcock, Marcus W.; Karshat, Walter B.; Levitt, Tod S.; Vosky, D. N.

1993-09-01

Large scale feature searches of accumulated collections of medical imagery are required for multiple purposes, including clinical studies, administrative planning, epidemiology, teaching, quality improvement, and research. To perform a feature search of large collections of medical imagery, one can either search text descriptors of the imagery in the collection (usually the interpretation), or (if the imagery is in digital format) the imagery itself. At our institution, text interpretations of medical imagery are all available in our VA Hospital Information System. These are downloaded daily into an off-line computer. The text descriptors of most medical imagery are usually formatted as free text, and so require a user friendly database search tool to make searches quick and easy for any user to design and execute. We are tailoring such a database search tool (Liveview), developed by one of the authors (Karshat). To further facilitate search construction, we are constructing (from our accumulated interpretation data) a dictionary of medical and radiological terms and synonyms. If the imagery database is digital, the imagery which the search discovers is easily retrieved from the computer archive. We describe our database search user interface, with examples, and compare the efficacy of computer assisted imagery searches from a clinical text database with manual searches. Our initial work on direct feature searches of digital medical imagery is outlined.
Citation searches are more sensitive than keyword searches to identify studies using specific measurement instruments.

PubMed

Linder, Suzanne K; Kamath, Geetanjali R; Pratt, Gregory F; Saraykar, Smita S; Volk, Robert J

2015-04-01

To compare the effectiveness of two search methods in identifying studies that used the Control Preferences Scale (CPS), a health care decision-making instrument commonly used in clinical settings. We searched the literature using two methods: (1) keyword searching using variations of "Control Preferences Scale" and (2) cited reference searching using two seminal CPS publications. We searched three bibliographic databases [PubMed, Scopus, and Web of Science (WOS)] and one full-text database (Google Scholar). We report precision and sensitivity as measures of effectiveness. Keyword searches in bibliographic databases yielded high average precision (90%) but low average sensitivity (16%). PubMed was the most precise, followed closely by Scopus and WOS. The Google Scholar keyword search had low precision (54%) but provided the highest sensitivity (70%). Cited reference searches in all databases yielded moderate sensitivity (45-54%), but precision ranged from 35% to 75% with Scopus being the most precise. Cited reference searches were more sensitive than keyword searches, making it a more comprehensive strategy to identify all studies that use a particular instrument. Keyword searches provide a quick way of finding some but not all relevant articles. Goals, time, and resources should dictate the combination of which methods and databases are used. Copyright © 2015 Elsevier Inc. All rights reserved.
Citation searches are more sensitive than keyword searches to identify studies using specific measurement instruments

PubMed Central

Linder, Suzanne K.; Kamath, Geetanjali R.; Pratt, Gregory F.; Saraykar, Smita S.; Volk, Robert J.

2015-01-01

Objective To compare the effectiveness of two search methods in identifying studies that used the Control Preferences Scale (CPS), a healthcare decision-making instrument commonly used in clinical settings. Study Design & Setting We searched the literature using two methods: 1) keyword searching using variations of “control preferences scale” and 2) cited reference searching using two seminal CPS publications. We searched three bibliographic databases [PubMed, Scopus, Web of Science (WOS)] and one full-text database (Google Scholar). We report precision and sensitivity as measures of effectiveness. Results Keyword searches in bibliographic databases yielded high average precision (90%), but low average sensitivity (16%). PubMed was the most precise, followed closely by Scopus and WOS. The Google Scholar keyword search had low precision (54%) but provided the highest sensitivity (70%). Cited reference searches in all databases yielded moderate sensitivity (45–54%), but precision ranged from 35–75% with Scopus being the most precise. Conclusion Cited reference searches were more sensitive than keyword searches, making it a more comprehensive strategy to identify all studies that use a particular instrument. Keyword searches provide a quick way of finding some but not all relevant articles. Goals, time and resources should dictate the combination of which methods and databases are used. PMID:25554521
HONselect: multilingual assistant search engine operated by a concept-based interface system to decentralized heterogeneous sources.

PubMed

Boyer, C; Baujard, V; Scherrer, J R

2001-01-01

Any new user to the Internet will think that to retrieve the relevant document is an easy task especially with the wealth of sources available on this medium, but this is not the case. Even experienced users have difficulty formulating the right query for making the most of a search tool in order to efficiently obtain an accurate result. The goal of this work is to reduce the time and the energy necessary in searching and locating medical and health information. To reach this goal we have developed HONselect [1]. The aim of HONselect is not only to improve efficiency in retrieving documents but to respond to an increased need for obtaining a selection of relevant and accurate documents from a breadth of various knowledge databases including scientific bibliographical references, clinical trials, daily news, multimedia illustrations, conferences, forum, Web sites, clinical cases, and others. The authors based their approach on the knowledge representation using the National Library of Medicine's Medical Subject Headings (NLM, MeSH) vocabulary and classification [2,3]. The innovation is to propose a multilingual "one-stop searching" (one Web interface to databases currently in English, French and German) with full navigational and connectivity capabilities. The user may choose from a given selection of related terms the one that best suit his search, navigate in the term's hierarchical tree, and access directly to a selection of documents from high quality knowledge suppliers such as the MEDLINE database, the NLM's ClinicalTrials.gov server, the NewsPage's daily news, the HON's media gallery, conference listings and MedHunt's Web sites [4, 5, 6, 7, 8, 9]. HONselect, developed by HON, a non-profit organisation [10], is a free online available multilingual tool based on the MeSH thesaurus to index, select, retrieve and display accurate, up to date, high-level and quality documents.
Creating a FIESTA (Framework for Integrated Earth Science and Technology Applications) with MagIC

NASA Astrophysics Data System (ADS)

Minnett, R.; Koppers, A. A. P.; Jarboe, N.; Tauxe, L.; Constable, C.

2017-12-01

The Magnetics Information Consortium (https://earthref.org/MagIC) has recently developed a containerized web application to considerably reduce the friction in contributing, exploring and combining valuable and complex datasets for the paleo-, geo- and rock magnetic scientific community. The data produced in this scientific domain are inherently hierarchical and the communities evolving approaches to this scientific workflow, from sampling to taking measurements to multiple levels of interpretations, require a large and flexible data model to adequately annotate the results and ensure reproducibility. Historically, contributing such detail in a consistent format has been prohibitively time consuming and often resulted in only publishing the highly derived interpretations. The new open-source (https://github.com/earthref/MagIC) application provides a flexible upload tool integrated with the data model to easily create a validated contribution and a powerful search interface for discovering datasets and combining them to enable transformative science. MagIC is hosted at EarthRef.org along with several interdisciplinary geoscience databases. A FIESTA (Framework for Integrated Earth Science and Technology Applications) is being created by generalizing MagIC's web application for reuse in other domains. The application relies on a single configuration document that describes the routing, data model, component settings and external services integrations. The container hosts an isomorphic Meteor JavaScript application, MongoDB database and ElasticSearch search engine. Multiple containers can be configured as microservices to serve portions of the application or rely on externally hosted MongoDB, ElasticSearch, or third-party services to efficiently scale computational demands. FIESTA is particularly well suited for many Earth Science disciplines with its flexible data model, mapping, account management, upload tool to private workspaces, reference metadata, image galleries, full text searches and detailed filters. EarthRef's Seamount Catalog of bathymetry and morphology data, EarthRef's Geochemical Earth Reference Model (GERM) databases, and Oregon State University's Marine and Geology Repository (http://osu-mgr.org) will benefit from custom adaptations of FIESTA.
VIEWCACHE: An incremental pointer-based access method for autonomous interoperable databases

NASA Technical Reports Server (NTRS)

Roussopoulos, N.; Sellis, Timos

1992-01-01

One of biggest problems facing NASA today is to provide scientists efficient access to a large number of distributed databases. Our pointer-based incremental database access method, VIEWCACHE, provides such an interface for accessing distributed data sets and directories. VIEWCACHE allows database browsing and search performing inter-database cross-referencing with no actual data movement between database sites. This organization and processing is especially suitable for managing Astrophysics databases which are physically distributed all over the world. Once the search is complete, the set of collected pointers pointing to the desired data are cached. VIEWCACHE includes spatial access methods for accessing image data sets, which provide much easier query formulation by referring directly to the image and very efficient search for objects contained within a two-dimensional window. We will develop and optimize a VIEWCACHE External Gateway Access to database management systems to facilitate distributed database search.
Bat-Inspired Algorithm Based Query Expansion for Medical Web Information Retrieval.

PubMed

Khennak, Ilyes; Drias, Habiba

2017-02-01

With the increasing amount of medical data available on the Web, looking for health information has become one of the most widely searched topics on the Internet. Patients and people of several backgrounds are now using Web search engines to acquire medical information, including information about a specific disease, medical treatment or professional advice. Nonetheless, due to a lack of medical knowledge, many laypeople have difficulties in forming appropriate queries to articulate their inquiries, which deem their search queries to be imprecise due the use of unclear keywords. The use of these ambiguous and vague queries to describe the patients' needs has resulted in a failure of Web search engines to retrieve accurate and relevant information. One of the most natural and promising method to overcome this drawback is Query Expansion. In this paper, an original approach based on Bat Algorithm is proposed to improve the retrieval effectiveness of query expansion in medical field. In contrast to the existing literature, the proposed approach uses Bat Algorithm to find the best expanded query among a set of expanded query candidates, while maintaining low computational complexity. Moreover, this new approach allows the determination of the length of the expanded query empirically. Numerical results on MEDLINE, the on-line medical information database, show that the proposed approach is more effective and efficient compared to the baseline.
Measuring situation awareness in emergency settings: a systematic review of tools and outcomes

PubMed Central

Cooper, Simon; Porter, Joanne; Peach, Linda

2014-01-01

Background Nontechnical skills have an impact on health care outcomes and improve patient safety. Situation awareness is core with the view that an understanding of the environment will influence decision-making and performance. This paper reviews and describes indirect and direct measures of situation awareness applicable for emergency settings. Methods Electronic databases and search engines were searched from 1980 to 2010, including CINAHL, Ovid Medline, Pro-Quest, Cochrane, and the search engine, Google Scholar. Access strategies included keyword, author, and journal searches. Publications identified were assessed for relevance, and analyzed and synthesized using Oxford evidence levels and the Critical Appraisal Skills Programme guidelines in order to assess their quality and rigor. Results One hundred and thirteen papers were initially identified, and reduced to 55 following title and abstract review. The final selection included 14 papers drawn from the fields of emergency medicine, intensive care, anesthetics, and surgery. Ten of these discussed four general nontechnical skill measures (including situation awareness) and four incorporated the Situation Awareness Global Assessment Technique. Conclusion A range of direct and indirect techniques for measuring situation awareness is available. In the medical literature, indirect approaches are the most common, with situation awareness measured as part of a nontechnical skills assessment. In simulation-based studies, situation awareness in emergencies tends to be suboptimal, indicating the need for improved training techniques to enhance awareness and improve decision-making. PMID:27147872
Evaluation of Bar, Barnase, and Barstar recombinant proteins expressed in genetically engineered Brassica juncea (Indian mustard) for potential risks of food allergy using bioinformatics and literature searches.

PubMed

Siruguri, Vasanthi; Bharatraj, Dinesh Kumar; Vankudavath, Raju Naik; Mendu, Vishnu Vardhana Rao; Gupta, Vibha; Goodman, Richard E

2015-09-01

The potential allergenicity of Bar, Barnase, and Barstar recombinant proteins expressed in genetically engineered mustard for pollination control in plant breeding was evaluated for regulatory review. To evaluate the potential allergenicity of the Bar, Barnase and Barstar proteins amino acid sequence comparisons were made to those of known and putative allergens, and search for published evidence to the sources of the genes using the AllergenOnline.org database. Initial comparisons in 2012 were performed with version 12 by methods recommended by the Codex Alimentarius Commission and the Indian Council of Medical Research, Government of India. Searches were repeated with version 15 in 2015. A literature search was performed using PubMed to identify reports of allergy associated with the sources of the three transgenes. Potential open reading frames at the DNA insertion site were evaluated for matches to allergens. No significant sequence identity matches were identified with Bar, Barnase or Barstar proteins or potential fusion peptides at the genomic-insert junctions compared to known allergens. No references were identified that associated the sources of the genes with allergy. Based on these results we conclude that the Bar, Barnase and Barstar proteins are unlikely to present any significant risk of food allergy to consumers. Copyright © 2015 Elsevier Ltd. All rights reserved.
Automatically finding relevant citations for clinical guideline development.

PubMed

Bui, Duy Duc An; Jonnalagadda, Siddhartha; Del Fiol, Guilherme

2015-10-01

Literature database search is a crucial step in the development of clinical practice guidelines and systematic reviews. In the age of information technology, the process of literature search is still conducted manually, therefore it is costly, slow and subject to human errors. In this research, we sought to improve the traditional search approach using innovative query expansion and citation ranking approaches. We developed a citation retrieval system composed of query expansion and citation ranking methods. The methods are unsupervised and easily integrated over the PubMed search engine. To validate the system, we developed a gold standard consisting of citations that were systematically searched and screened to support the development of cardiovascular clinical practice guidelines. The expansion and ranking methods were evaluated separately and compared with baseline approaches. Compared with the baseline PubMed expansion, the query expansion algorithm improved recall (80.2% vs. 51.5%) with small loss on precision (0.4% vs. 0.6%). The algorithm could find all citations used to support a larger number of guideline recommendations than the baseline approach (64.5% vs. 37.2%, p<0.001). In addition, the citation ranking approach performed better than PubMed's "most recent" ranking (average precision +6.5%, recall@k +21.1%, p<0.001), PubMed's rank by "relevance" (average precision +6.1%, recall@k +14.8%, p<0.001), and the machine learning classifier that identifies scientifically sound studies from MEDLINE citations (average precision +4.9%, recall@k +4.2%, p<0.001). Our unsupervised query expansion and ranking techniques are more flexible and effective than PubMed's default search engine behavior and the machine learning classifier. Automated citation finding is promising to augment the traditional literature search. Copyright © 2015 Elsevier Inc. All rights reserved.
Evaluation of Federated Searching Options for the School Library

ERIC Educational Resources Information Center

Abercrombie, Sarah E.

2008-01-01

Three hosted federated search tools, Follett One Search, Gale PowerSearch Plus, and WebFeat Express, were configured and implemented in a school library. Databases from five vendors and the OPAC were systematically searched. Federated search results were compared with each other and to the results of the same searches in the database's native…
Expert searching in public health

PubMed Central

Alpi, Kristine M.

2005-01-01

Objective: The article explores the characteristics of public health information needs and the resources available to address those needs that distinguish it as an area of searching requiring particular expertise. Methods: Public health searching activities from reference questions and literature search requests at a large, urban health department library were reviewed to identify the challenges in finding relevant public health information. Results: The terminology of the information request frequently differed from the vocabularies available in the databases. Searches required the use of multiple databases and/or Web resources with diverse interfaces. Issues of the scope and features of the databases relevant to the search questions were considered. Conclusion: Expert searching in public health differs from other types of expert searching in the subject breadth and technical demands of the databases to be searched, the fluidity and lack of standardization of the vocabulary, and the relative scarcity of high-quality investigations at the appropriate level of geographic specificity. Health sciences librarians require a broad exposure to databases, gray literature, and public health terminology to perform as expert searchers in public health. PMID:15685281
Online Patent Searching: The Realities.

ERIC Educational Resources Information Center

Kaback, Stuart M.

1983-01-01

Considers patent subject searching capabilities of major online databases, noting patent claims, "deep-indexed" files, test searches, retrieval of related references, multi-database searching, improvements needed in indexing of chemical structures, full text searching, improvements needed in handling numerical data, and augmenting a…

Vitamin D and its relationship with breast cancer: an evidence based practice paper.

PubMed

Obaidi, Jawad; Musallam, Eyad; Al-Ghzawi, Hamzah Mohammad; Azzeghaiby, Saleh Nasser; Alzoghaibi, Ibrahim Nasir

2014-09-27

In oncology research fields, vitamin D has emerged as the most fruitful issue. The previous decade witnessed intensive efforts in connecting vitamin D with risk reduction and progression of various epithelial cancers, especially, breast cancer. To evaluate the relationship between vitamin D levels and breast cancer. A comprehensive search of several electronic databases was conducted in Pub Med, MEDLINE, CINAHL, in addition to, web search engine "Google" for abstracts, in order to determine the relationship between vitamin D and breast cancer. It was found that an increased serum level of vitamin D is associated with decreased risk of breast cancer. It was concluded that vitamin D plays a significant role in protection of breast cancer.
Using "Reader's Guide to Periodical Literature" on CD-Rom To Teach Database Searching to High School Students.

ERIC Educational Resources Information Center

Kern, Joanne F.

The lack of opportunity for high school sophomores to learn database searching was addressed by the implementation of a computerized magazine article search program. "Reader's Guide to Periodical Literature" on CD-ROM was used to train students in database searching during the time they were assigned to the library to do research papers…
Spiders and Worms and Crawlers, Oh My: Searching on the World Wide Web.

ERIC Educational Resources Information Center

Eagan, Ann; Bender, Laura

Searching on the world wide web can be confusing. A myriad of search engines exist, often with little or no documentation, and many of these search engines work differently from the standard search engines people are accustomed to using. Intended for librarians, this paper defines search engines, directories, spiders, and robots, and covers basics…
Dynamics of a macroscopic model characterizing mutualism of search engines and web sites

NASA Astrophysics Data System (ADS)

Wang, Yuanshi; Wu, Hong

2006-05-01

We present a model to describe the mutualism relationship between search engines and web sites. In the model, search engines and web sites benefit from each other while the search engines are derived products of the web sites and cannot survive independently. Our goal is to show strategies for the search engines to survive in the internet market. From mathematical analysis of the model, we show that mutualism does not always result in survival. We show various conditions under which the search engines would tend to extinction, persist or grow explosively. Then by the conditions, we deduce a series of strategies for the search engines to survive in the internet market. We present conditions under which the initial number of consumers of the search engines has little contribution to their persistence, which is in agreement with the results in previous works. Furthermore, we show novel conditions under which the initial value plays an important role in the persistence of the search engines and deduce new strategies. We also give suggestions for the web sites to cooperate with the search engines in order to form a win-win situation.
Application of kernel functions for accurate similarity search in large chemical databases.

PubMed

Wang, Xiaohong; Huan, Jun; Smalter, Aaron; Lushington, Gerald H

2010-04-29

Similarity search in chemical structure databases is an important problem with many applications in chemical genomics, drug design, and efficient chemical probe screening among others. It is widely believed that structure based methods provide an efficient way to do the query. Recently various graph kernel functions have been designed to capture the intrinsic similarity of graphs. Though successful in constructing accurate predictive and classification models, graph kernel functions can not be applied to large chemical compound database due to the high computational complexity and the difficulties in indexing similarity search for large databases. To bridge graph kernel function and similarity search in chemical databases, we applied a novel kernel-based similarity measurement, developed in our team, to measure similarity of graph represented chemicals. In our method, we utilize a hash table to support new graph kernel function definition, efficient storage and fast search. We have applied our method, named G-hash, to large chemical databases. Our results show that the G-hash method achieves state-of-the-art performance for k-nearest neighbor (k-NN) classification. Moreover, the similarity measurement and the index structure is scalable to large chemical databases with smaller indexing size, and faster query processing time as compared to state-of-the-art indexing methods such as Daylight fingerprints, C-tree and GraphGrep. Efficient similarity query processing method for large chemical databases is challenging since we need to balance running time efficiency and similarity search accuracy. Our previous similarity search method, G-hash, provides a new way to perform similarity search in chemical databases. Experimental study validates the utility of G-hash in chemical databases.
The impact of mHealth interventions on health systems: a systematic review protocol.

PubMed

Fortuin, Jill; Salie, Faatiema; Abdullahi, Leila H; Douglas, Tania S

2016-11-25

Mobile health (mHealth) has been described as a health enabling tool that impacts positively on the health system in terms of improved access, quality and cost of health care. The proposed systematic review will examine the impact of mHealth on health systems by assessing access, quality and cost of health care as indicators. The systematic review will include literature from various sources including published and unpublished/grey literature. The databases to be searched include: PubMed, Cochrane Library, Google Scholar, NHS Health Technology Assessment Database and Web of Science. The reference lists of studies will be screened and conference proceedings searched for additional eligible reports. Literature to be included will have mHealth as the primary intervention. Two authors will independently screen the search output, select studies and extract data; discrepancies will be resolved by consensus and discussion with the assistance of the third author. The systematic review will inform policy makers, investors, health professionals, technologists and engineers about the impact of mHealth in strengthening the health system. In particular, it will focus on three metrics to determine whether mHealth strengthens the health system, namely quality of, access to and cost of health care services. Systematic review registration: PROSPERO CRD42015026070.
Use of speech-to-text technology for documentation by healthcare providers.

PubMed

Ajami, Sima

2016-01-01

Medical records are a critical component of a patient's treatment. However, documentation of patient-related information is considered a secondary activity in the provision of healthcare services, often leading to incomplete medical records and patient data of low quality. Advances in information technology (IT) in the health system and registration of information in electronic health records (EHR) using speechto- text conversion software have facilitated service delivery. This narrative review is a literature search with the help of libraries, books, conference proceedings, databases of Science Direct, PubMed, Proquest, Springer, SID (Scientific Information Database), and search engines such as Yahoo, and Google. I used the following keywords and their combinations: speech recognition, automatic report documentation, voice to text software, healthcare, information, and voice recognition. Due to lack of knowledge of other languages, I searched all texts in English or Persian with no time limits. Of a total of 70, only 42 articles were selected. Speech-to-text conversion technology offers opportunities to improve the documentation process of medical records, reduce cost and time of recording information, enhance the quality of documentation, improve the quality of services provided to patients, and support healthcare providers in legal matters. Healthcare providers should recognize the impact of this technology on service delivery.
Risk factors, incidence, consequences and prevention strategies for falls and fall-injury within older indigenous populations: a systematic review.

PubMed

Lukaszyk, Caroline; Harvey, Lara; Sherrington, Cathie; Keay, Lisa; Tiedemann, Anne; Coombes, Julieann; Clemson, Lindy; Ivers, Rebecca

2016-12-01

To examine the risk factors, incidence, consequences and existing prevention strategies for falls and fall-related injury in older indigenous people. Relevant literature was identified through searching 14 electronic databases, a range of institutional websites, online search engines and government databases, using search terms pertaining to indigenous status, injury and ageing. Thirteen studies from Australia, the United States, Central America and Canada were identified. Few studies reported on fall rates but two reported that around 30% of indigenous people aged 45 years and above experienced at least one fall during the past year. The most common hospitalised fall injuries among older indigenous people were hip fracture and head injury. Risk factors significantly associated with falls within indigenous populations included poor mobility, a history of stroke, epilepsy, head injury, poor hearing and urinary incontinence. No formally evaluated, indigenous-specific fall prevention interventions were identified. Falls are a significant and growing health issue for older indigenous people worldwide that can lead to severe health consequences and even death. No fully-evaluated, indigenous-specific fall prevention programs were identified. Implications for Public Health: Research into fall patterns and fall-related injury among indigenous people is necessary for the development of appropriate fall prevention interventions. © 2016 Public Health Association of Australia.
Using relational databases for improved sequence similarity searching and large-scale genomic analyses.

PubMed

Mackey, Aaron J; Pearson, William R

2004-10-01

Relational databases are designed to integrate diverse types of information and manage large sets of search results, greatly simplifying genome-scale analyses. Relational databases are essential for management and analysis of large-scale sequence analyses, and can also be used to improve the statistical significance of similarity searches by focusing on subsets of sequence libraries most likely to contain homologs. This unit describes using relational databases to improve the efficiency of sequence similarity searching and to demonstrate various large-scale genomic analyses of homology-related data. This unit describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. These include basic use of the database to generate a novel sequence library subset, how to extend and use seqdb_demo for the storage of sequence similarity search results and making use of various kinds of stored search results to address aspects of comparative genomic analysis.
DNA profiles, computer searches, and the Fourth Amendment.

PubMed

Kimel, Catherine W

2013-01-01

Pursuant to federal statutes and to laws in all fifty states, the United States government has assembled a database containing the DNA profiles of over eleven million citizens. Without judicial authorization, the government searches each of these profiles one-hundred thousand times every day, seeking to link database subjects to crimes they are not suspected of committing. Yet, courts and scholars that have addressed DNA databasing have focused their attention almost exclusively on the constitutionality of the government's seizure of the biological samples from which the profiles are generated. This Note fills a gap in the scholarship by examining the Fourth Amendment problems that arise when the government searches its vast DNA database. This Note argues that each attempt to match two DNA profiles constitutes a Fourth Amendment search because each attempted match infringes upon database subjects' expectations of privacy in their biological relationships and physical movements. The Note further argues that database searches are unreasonable as they are currently conducted, and it suggests an adaptation of computer-search procedures to remedy the constitutional deficiency.
ESTree db: a Tool for Peach Functional Genomics

PubMed Central

Lazzari, Barbara; Caprera, Andrea; Vecchietti, Alberto; Stella, Alessandra; Milanesi, Luciano; Pozzi, Carlo

2005-01-01

Background The ESTree db represents a collection of Prunus persica expressed sequenced tags (ESTs) and is intended as a resource for peach functional genomics. A total of 6,155 successful EST sequences were obtained from four in-house prepared cDNA libraries from Prunus persica mesocarps at different developmental stages. Another 12,475 peach EST sequences were downloaded from public databases and added to the ESTree db. An automated pipeline was prepared to process EST sequences using public software integrated by in-house developed Perl scripts and data were collected in a MySQL database. A php-based web interface was developed to query the database. Results The ESTree db version as of April 2005 encompasses 18,630 sequences representing eight libraries. Contig assembly was performed with CAP3. Putative single nucleotide polymorphism (SNP) detection was performed with the AutoSNP program and a search engine was implemented to retrieve results. All the sequences and all the contig consensus sequences were annotated both with blastx against the GenBank nr db and with GOblet against the viridiplantae section of the Gene Ontology db. Links to NiceZyme (Expasy) and to the KEGG metabolic pathways were provided. A local BLAST utility is available. A text search utility allows querying and browsing the database. Statistics were provided on Gene Ontology occurrences to assign sequences to Gene Ontology categories. Conclusion The resulting database is a comprehensive resource of data and links related to peach EST sequences. The Sequence Report and Contig Report pages work as the web interface core structures, giving quick access to data related to each sequence/contig. PMID:16351742
ESTree db: a tool for peach functional genomics.

PubMed

Lazzari, Barbara; Caprera, Andrea; Vecchietti, Alberto; Stella, Alessandra; Milanesi, Luciano; Pozzi, Carlo

2005-12-01

The ESTree db http://www.itb.cnr.it/estree/ represents a collection of Prunus persica expressed sequenced tags (ESTs) and is intended as a resource for peach functional genomics. A total of 6,155 successful EST sequences were obtained from four in-house prepared cDNA libraries from Prunus persica mesocarps at different developmental stages. Another 12,475 peach EST sequences were downloaded from public databases and added to the ESTree db. An automated pipeline was prepared to process EST sequences using public software integrated by in-house developed Perl scripts and data were collected in a MySQL database. A php-based web interface was developed to query the database. The ESTree db version as of April 2005 encompasses 18,630 sequences representing eight libraries. Contig assembly was performed with CAP3. Putative single nucleotide polymorphism (SNP) detection was performed with the AutoSNP program and a search engine was implemented to retrieve results. All the sequences and all the contig consensus sequences were annotated both with blastx against the GenBank nr db and with GOblet against the viridiplantae section of the Gene Ontology db. Links to NiceZyme (Expasy) and to the KEGG metabolic pathways were provided. A local BLAST utility is available. A text search utility allows querying and browsing the database. Statistics were provided on Gene Ontology occurrences to assign sequences to Gene Ontology categories. The resulting database is a comprehensive resource of data and links related to peach EST sequences. The Sequence Report and Contig Report pages work as the web interface core structures, giving quick access to data related to each sequence/contig.
Conceptualizing belonging.

PubMed

Mahar, Alyson L; Cobigo, Virginie; Stuart, Heather

2013-06-01

To develop a transdisciplinary conceptualization of social belonging that could be used to guide measurement approaches aimed at evaluating the effectiveness of community-based programs for people with disabilities. We conducted a narrative, scoping review of peer reviewed English language literature published between 1990 and July 2011 using multiple databases, with "sense of belonging" as a key search term. The search engine ranked articles for relevance to the search strategy. Articles were searched in order until theoretical saturation was reached. We augmented this search strategy by reviewing reference lists of relevant papers. Theoretical saturation was reached after 40 articles; 22 of which were qualitative accounts. We identified five intersecting themes: subjectivity; groundedness to an external referent; reciprocity; dynamism and self-determination. We define a sense of belonging as a subjective feeling of value and respect derived from a reciprocal relationship to an external referent that is built on a foundation of shared experiences, beliefs or personal characteristics. These feelings of external connectedness are grounded to the context or referent group, to whom one chooses, wants and feels permission to belong. This dynamic phenomenon may be either hindered or promoted by complex interactions between environmental and personal factors.
Utilization of a radiology-centric search engine.

PubMed

Sharpe, Richard E; Sharpe, Megan; Siegel, Eliot; Siddiqui, Khan

2010-04-01

Internet-based search engines have become a significant component of medical practice. Physicians increasingly rely on information available from search engines as a means to improve patient care, provide better education, and enhance research. Specialized search engines have emerged to more efficiently meet the needs of physicians. Details about the ways in which radiologists utilize search engines have not been documented. The authors categorized every 25th search query in a radiology-centric vertical search engine by radiologic subspecialty, imaging modality, geographic location of access, time of day, use of abbreviations, misspellings, and search language. Musculoskeletal and neurologic imagings were the most frequently searched subspecialties. The least frequently searched were breast imaging, pediatric imaging, and nuclear medicine. Magnetic resonance imaging and computed tomography were the most frequently searched modalities. A majority of searches were initiated in North America, but all continents were represented. Searches occurred 24 h/day in converted local times, with a majority occurring during the normal business day. Misspellings and abbreviations were common. Almost all searches were performed in English. Search engine utilization trends are likely to mirror trends in diagnostic imaging in the region from which searches originate. Internet searching appears to function as a real-time clinical decision-making tool, a research tool, and an educational resource. A more thorough understanding of search utilization patterns can be obtained by analyzing phrases as actually entered as well as the geographic location and time of origination. This knowledge may contribute to the development of more efficient and personalized search engines.
Chapter 51: How to Build a Simple Cone Search Service Using a Local Database

NASA Astrophysics Data System (ADS)

Kent, B. R.; Greene, G. R.

The cone search service protocol will be examined from the server side in this chapter. A simple cone search service will be setup and configured locally using MySQL. Data will be read into a table, and the Java JDBC will be used to connect to the database. Readers will understand the VO cone search specification and how to use it to query a database on their local systems and return an XML/VOTable file based on an input of RA/DEC coordinates and a search radius. The cone search in this example will be deployed as a Java servlet. The resulting cone search can be tested with a verification service. This basic setup can be used with other languages and relational databases.
muBLASTP: database-indexed protein sequence search on multicore CPUs.

PubMed

Zhang, Jing; Misra, Sanchit; Wang, Hao; Feng, Wu-Chun

2016-11-04

The Basic Local Alignment Search Tool (BLAST) is a fundamental program in the life sciences that searches databases for sequences that are most similar to a query sequence. Currently, the BLAST algorithm utilizes a query-indexed approach. Although many approaches suggest that sequence search with a database index can achieve much higher throughput (e.g., BLAT, SSAHA, and CAFE), they cannot deliver the same level of sensitivity as the query-indexed BLAST, i.e., NCBI BLAST, or they can only support nucleotide sequence search, e.g., MegaBLAST. Due to different challenges and characteristics between query indexing and database indexing, the existing techniques for query-indexed search cannot be used into database indexed search. muBLASTP, a novel database-indexed BLAST for protein sequence search, delivers identical hits returned to NCBI BLAST. On Intel Haswell multicore CPUs, for a single query, the single-threaded muBLASTP achieves up to a 4.41-fold speedup for alignment stages, and up to a 1.75-fold end-to-end speedup over single-threaded NCBI BLAST. For a batch of queries, the multithreaded muBLASTP achieves up to a 5.7-fold speedups for alignment stages, and up to a 4.56-fold end-to-end speedup over multithreaded NCBI BLAST. With a newly designed index structure for protein database and associated optimizations in BLASTP algorithm, we re-factored BLASTP algorithm for modern multicore processors that achieves much higher throughput with acceptable memory footprint for the database index.
An Exploratory Survey of Student Perspectives Regarding Search Engines

ERIC Educational Resources Information Center

Alshare, Khaled; Miller, Don; Wenger, James

2005-01-01

This study explored college students' perceptions regarding their use of search engines. The main objective was to determine how frequently students used various search engines, whether advanced search features were used, and how many search engines were used. Various factors that might influence student responses were examined. Results showed…
The Use of Web Search Engines in Information Science Research.

ERIC Educational Resources Information Center

Bar-Ilan, Judit

2004-01-01

Reviews the literature on the use of Web search engines in information science research, including: ways users interact with Web search engines; social aspects of searching; structure and dynamic nature of the Web; link analysis; other bibliometric applications; characterizing information on the Web; search engine evaluation and improvement; and…
Characterizing Interdisciplinarity of Researchers and Research Topics Using Web Search Engines

PubMed Central

Sayama, Hiroki; Akaishi, Jin

2012-01-01

Researchers' networks have been subject to active modeling and analysis. Earlier literature mostly focused on citation or co-authorship networks reconstructed from annotated scientific publication databases, which have several limitations. Recently, general-purpose web search engines have also been utilized to collect information about social networks. Here we reconstructed, using web search engines, a network representing the relatedness of researchers to their peers as well as to various research topics. Relatedness between researchers and research topics was characterized by visibility boost—increase of a researcher's visibility by focusing on a particular topic. It was observed that researchers who had high visibility boosts by the same research topic tended to be close to each other in their network. We calculated correlations between visibility boosts by research topics and researchers' interdisciplinarity at the individual level (diversity of topics related to the researcher) and at the social level (his/her centrality in the researchers' network). We found that visibility boosts by certain research topics were positively correlated with researchers' individual-level interdisciplinarity despite their negative correlations with the general popularity of researchers. It was also found that visibility boosts by network-related topics had positive correlations with researchers' social-level interdisciplinarity. Research topics' correlations with researchers' individual- and social-level interdisciplinarities were found to be nearly independent from each other. These findings suggest that the notion of “interdisciplinarity" of a researcher should be understood as a multi-dimensional concept that should be evaluated using multiple assessment means. PMID:22719935
Characterizing interdisciplinarity of researchers and research topics using web search engines.

PubMed

Sayama, Hiroki; Akaishi, Jin

2012-01-01

Researchers' networks have been subject to active modeling and analysis. Earlier literature mostly focused on citation or co-authorship networks reconstructed from annotated scientific publication databases, which have several limitations. Recently, general-purpose web search engines have also been utilized to collect information about social networks. Here we reconstructed, using web search engines, a network representing the relatedness of researchers to their peers as well as to various research topics. Relatedness between researchers and research topics was characterized by visibility boost-increase of a researcher's visibility by focusing on a particular topic. It was observed that researchers who had high visibility boosts by the same research topic tended to be close to each other in their network. We calculated correlations between visibility boosts by research topics and researchers' interdisciplinarity at the individual level (diversity of topics related to the researcher) and at the social level (his/her centrality in the researchers' network). We found that visibility boosts by certain research topics were positively correlated with researchers' individual-level interdisciplinarity despite their negative correlations with the general popularity of researchers. It was also found that visibility boosts by network-related topics had positive correlations with researchers' social-level interdisciplinarity. Research topics' correlations with researchers' individual- and social-level interdisciplinarities were found to be nearly independent from each other. These findings suggest that the notion of "interdisciplinarity" of a researcher should be understood as a multi-dimensional concept that should be evaluated using multiple assessment means.

Using Internet Search Engines to Obtain Medical Information: A Comparative Study

PubMed Central

Wang, Liupu; Wang, Juexin; Wang, Michael; Li, Yong; Liang, Yanchun

2012-01-01

Background The Internet has become one of the most important means to obtain health and medical information. It is often the first step in checking for basic information about a disease and its treatment. The search results are often useful to general users. Various search engines such as Google, Yahoo!, Bing, and Ask.com can play an important role in obtaining medical information for both medical professionals and lay people. However, the usability and effectiveness of various search engines for medical information have not been comprehensively compared and evaluated. Objective To compare major Internet search engines in their usability of obtaining medical and health information. Methods We applied usability testing as a software engineering technique and a standard industry practice to compare the four major search engines (Google, Yahoo!, Bing, and Ask.com) in obtaining health and medical information. For this purpose, we searched the keyword breast cancer in Google, Yahoo!, Bing, and Ask.com and saved the results of the top 200 links from each search engine. We combined nonredundant links from the four search engines and gave them to volunteer users in an alphabetical order. The volunteer users evaluated the websites and scored each website from 0 to 10 (lowest to highest) based on the usefulness of the content relevant to breast cancer. A medical expert identified six well-known websites related to breast cancer in advance as standards. We also used five keywords associated with breast cancer defined in the latest release of Systematized Nomenclature of Medicine-Clinical Terms (SNOMED CT) and analyzed their occurrence in the websites. Results Each search engine provided rich information related to breast cancer in the search results. All six standard websites were among the top 30 in search results of all four search engines. Google had the best search validity (in terms of whether a website could be opened), followed by Bing, Ask.com, and Yahoo!. The search results highly overlapped between the search engines, and the overlap between any two search engines was about half or more. On the other hand, each search engine emphasized various types of content differently. In terms of user satisfaction analysis, volunteer users scored Bing the highest for its usefulness, followed by Yahoo!, Google, and Ask.com. Conclusions Google, Yahoo!, Bing, and Ask.com are by and large effective search engines for helping lay users get health and medical information. Nevertheless, the current ranking methods have some pitfalls and there is room for improvement to help users get more accurate and useful information. We suggest that search engine users explore multiple search engines to search different types of health information and medical knowledge for their own needs and get a professional consultation if necessary. PMID:22672889
Using Internet search engines to obtain medical information: a comparative study.

PubMed

Wang, Liupu; Wang, Juexin; Wang, Michael; Li, Yong; Liang, Yanchun; Xu, Dong

2012-05-16

The Internet has become one of the most important means to obtain health and medical information. It is often the first step in checking for basic information about a disease and its treatment. The search results are often useful to general users. Various search engines such as Google, Yahoo!, Bing, and Ask.com can play an important role in obtaining medical information for both medical professionals and lay people. However, the usability and effectiveness of various search engines for medical information have not been comprehensively compared and evaluated. To compare major Internet search engines in their usability of obtaining medical and health information. We applied usability testing as a software engineering technique and a standard industry practice to compare the four major search engines (Google, Yahoo!, Bing, and Ask.com) in obtaining health and medical information. For this purpose, we searched the keyword breast cancer in Google, Yahoo!, Bing, and Ask.com and saved the results of the top 200 links from each search engine. We combined nonredundant links from the four search engines and gave them to volunteer users in an alphabetical order. The volunteer users evaluated the websites and scored each website from 0 to 10 (lowest to highest) based on the usefulness of the content relevant to breast cancer. A medical expert identified six well-known websites related to breast cancer in advance as standards. We also used five keywords associated with breast cancer defined in the latest release of Systematized Nomenclature of Medicine-Clinical Terms (SNOMED CT) and analyzed their occurrence in the websites. Each search engine provided rich information related to breast cancer in the search results. All six standard websites were among the top 30 in search results of all four search engines. Google had the best search validity (in terms of whether a website could be opened), followed by Bing, Ask.com, and Yahoo!. The search results highly overlapped between the search engines, and the overlap between any two search engines was about half or more. On the other hand, each search engine emphasized various types of content differently. In terms of user satisfaction analysis, volunteer users scored Bing the highest for its usefulness, followed by Yahoo!, Google, and Ask.com. Google, Yahoo!, Bing, and Ask.com are by and large effective search engines for helping lay users get health and medical information. Nevertheless, the current ranking methods have some pitfalls and there is room for improvement to help users get more accurate and useful information. We suggest that search engine users explore multiple search engines to search different types of health information and medical knowledge for their own needs and get a professional consultation if necessary.
NASA Indexing Benchmarks: Evaluating Text Search Engines

NASA Technical Reports Server (NTRS)

Esler, Sandra L.; Nelson, Michael L.

1997-01-01

The current proliferation of on-line information resources underscores the requirement for the ability to index collections of information and search and retrieve them in a convenient manner. This study develops criteria for analytically comparing the index and search engines and presents results for a number of freely available search engines. A product of this research is a toolkit capable of automatically indexing, searching, and extracting performance statistics from each of the focused search engines. This toolkit is highly configurable and has the ability to run these benchmark tests against other engines as well. Results demonstrate that the tested search engines can be grouped into two levels. Level one engines are efficient on small to medium sized data collections, but show weaknesses when used for collections 100MB or larger. Level two search engines are recommended for data collections up to and beyond 100MB.
Alternative Databases for Anthropology Searching.

ERIC Educational Resources Information Center

Brody, Fern; Lambert, Maureen

1984-01-01

Examines online search results of sample questions in several databases covering linguistics, cultural anthropology, and physical anthropology in order to determine if and where any overlap in results might occur, and which files have greatest number of relevant hits. Search results by database are given for each subject area. (EJS)
Searching for a New Way to Reach Patrons: A Search Engine Optimization Pilot Project at Binghamton University Libraries

ERIC Educational Resources Information Center

Rushton, Erin E.; Kelehan, Martha Daisy; Strong, Marcy A.

2008-01-01

Search engine use is one of the most popular online activities. According to a recent OCLC report, nearly all students start their electronic research using a search engine instead of the library Web site. Instead of viewing search engines as competition, however, librarians at Binghamton University Libraries decided to employ search engine…
When is a search not a search? A comparison of searching the AMED complementary health database via EBSCOhost, OVID and DIALOG.

PubMed

Younger, Paula; Boddy, Kate

2009-06-01

The researchers involved in this study work at Exeter Health library and at the Complementary Medicine Unit, Peninsula School of Medicine and Dentistry (PCMD). Within this collaborative environment it is possible to access the electronic resources of three institutions. This includes access to AMED and other databases using different interfaces. The aim of this study was to investigate whether searching different interfaces to the AMED allied health and complementary medicine database produced the same results when using identical search terms. The following Internet-based AMED interfaces were searched: DIALOG DataStar; EBSCOhost and OVID SP_UI01.00.02. Search results from all three databases were saved in an endnote database to facilitate analysis. A checklist was also compiled comparing interface features. In our initial search, DIALOG returned 29 hits, OVID 14 and Ebsco 8. If we assume that DIALOG returned 100% of potential hits, OVID initially returned only 48% of hits and EBSCOhost only 28%. In our search, a researcher using the Ebsco interface to carry out a simple search on AMED would miss over 70% of possible search hits. Subsequent EBSCOhost searches on different subjects failed to find between 21 and 86% of the hits retrieved using the same keywords via DIALOG DataStar. In two cases, the simple EBSCOhost search failed to find any of the results found via DIALOG DataStar. Depending on the interface, the number of hits retrieved from the same database with the same simple search can vary dramatically. Some simple searches fail to retrieve a substantial percentage of citations. This may result in an uninformed literature review, research funding application or treatment intervention. In addition to ensuring that keywords, spelling and medical subject headings (MeSH) accurately reflect the nature of the search, database users should include wildcards and truncation and adapt their search strategy substantially to retrieve the maximum number of appropriate citations possible. Librarians should be aware of these differences when making purchasing decisions, carrying out literature searches and planning user education.
Teen smoking cessation help via the Internet: a survey of search engines.

PubMed

Edwards, Christine C; Elliott, Sean P; Conway, Terry L; Woodruff, Susan I

2003-07-01

The objective of this study was to assess Web sites related to teen smoking cessation on the Internet. Seven Internet search engines were searched using the keywords teen quit smoking. The top 20 hits from each search engine were reviewed and categorized. The keywords teen quit smoking produced between 35 and 400,000 hits depending on the search engine. Of 140 potential hits, 62% were active, unique sites; 85% were listed by only one search engine; and 40% focused on cessation. Findings suggest that legitimate on-line smoking cessation help for teens is constrained by search engine choice and the amount of time teens spend looking through potential sites. Resource listings should be updated regularly. Smoking cessation Web sites need to be picked up on multiple search engine searches. Further evaluation of smoking cessation Web sites need to be conducted to identify the most effective help for teens.
Multi-mode acquisition (MMA): An MS/MS acquisition strategy for maximizing selectivity, specificity and sensitivity of DIA product ion spectra.

PubMed

Williams, Brad J; Ciavarini, Steve J; Devlin, Curt; Cohn, Steven M; Xie, Rong; Vissers, Johannes P C; Martin, LeRoy B; Caswell, Allen; Langridge, James I; Geromanos, Scott J

2016-08-01

In proteomics studies, it is generally accepted that depth of coverage and dynamic range is limited in data-directed acquisitions. The serial nature of the method limits both sensitivity and the number of precursor ions that can be sampled. To that end, a number of data-independent acquisition (DIA) strategies have been introduced with these methods, for the most part, immune to the sampling issue; nevertheless, some do have other limitations with respect to sensitivity. The major limitation with DIA approaches is interference, i.e., MS/MS spectra are highly chimeric and often incapable of being identified using conventional database search engines. Utilizing each available dimension of separation prior to ion detection, we present a new multi-mode acquisition (MMA) strategy multiplexing both narrowband and wideband DIA acquisitions in a single analytical workflow. The iterative nature of the MMA workflow limits the adverse effects of interference with minimal loss in sensitivity. Qualitative identification can be performed by selected ion chromatograms or conventional database search strategies. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Risk Factors and Protective Factors for Lower-Extremity Running Injuries A Systematic Review.

PubMed

Gijon-Nogueron, Gabriel; Fernandez-Villarejo, Marina

2015-11-01

A review of the scientific literature was performed 1) to identify studies describing the most common running injuries and their relation to the risk factors that produce them and 2) to search for potential and specific protective factors. Spanish and English biomedical search engines and databases (MEDLINE/PubMed, Database Enfermería Fisioterapia Podología [ENFISPO], Cochrane Library, and Cumulative Index to Nursing and Allied Health Literature) were queried (February 1 to November 30, 2013). A critical reading and assessment was then performed by the Critical Appraisal Skills Programme Spanish tool. In total, 276 abstracts that contained the selected key words were found. Of those, 25 identified and analyzed articles were included in the results. Injuries result from inadequate interaction between the runner's biomechanics and external factors. This leads to an excessive accumulation of impact peak forces in certain structures that tends to cause overuse injuries. The main reasons are inadequate muscle stabilization and pronation. These vary depending on the runner's foot strike pattern, foot arch morphology, and sex. Specific measures of modification and control through running footwear are proposed.
[Development of domain specific search engines].

PubMed

Takai, T; Tokunaga, M; Maeda, K; Kaminuma, T

2000-01-01

As cyber space exploding in a pace that nobody has ever imagined, it becomes very important to search cyber space efficiently and effectively. One solution to this problem is search engines. Already a lot of commercial search engines have been put on the market. However these search engines respond with such cumbersome results that domain specific experts can not tolerate. Using a dedicate hardware and a commercial software called OpenText, we have tried to develop several domain specific search engines. These engines are for our institute's Web contents, drugs, chemical safety, endocrine disruptors, and emergent response for chemical hazard. These engines have been on our Web site for testing.
VIEWCACHE: An incremental pointer-based access method for autonomous interoperable databases

NASA Technical Reports Server (NTRS)

Roussopoulos, N.; Sellis, Timos

1993-01-01

One of the biggest problems facing NASA today is to provide scientists efficient access to a large number of distributed databases. Our pointer-based incremental data base access method, VIEWCACHE, provides such an interface for accessing distributed datasets and directories. VIEWCACHE allows database browsing and search performing inter-database cross-referencing with no actual data movement between database sites. This organization and processing is especially suitable for managing Astrophysics databases which are physically distributed all over the world. Once the search is complete, the set of collected pointers pointing to the desired data are cached. VIEWCACHE includes spatial access methods for accessing image datasets, which provide much easier query formulation by referring directly to the image and very efficient search for objects contained within a two-dimensional window. We will develop and optimize a VIEWCACHE External Gateway Access to database management systems to facilitate database search.
A unified architecture for biomedical search engines based on semantic web technologies.

PubMed

Jalali, Vahid; Matash Borujerdi, Mohammad Reza

2011-04-01

There is a huge growth in the volume of published biomedical research in recent years. Many medical search engines are designed and developed to address the over growing information needs of biomedical experts and curators. Significant progress has been made in utilizing the knowledge embedded in medical ontologies and controlled vocabularies to assist these engines. However, the lack of common architecture for utilized ontologies and overall retrieval process, hampers evaluating different search engines and interoperability between them under unified conditions. In this paper, a unified architecture for medical search engines is introduced. Proposed model contains standard schemas declared in semantic web languages for ontologies and documents used by search engines. Unified models for annotation and retrieval processes are other parts of introduced architecture. A sample search engine is also designed and implemented based on the proposed architecture in this paper. The search engine is evaluated using two test collections and results are reported in terms of precision vs. recall and mean average precision for different approaches used by this search engine.
A Literature Review of Indexing and Searching Techniques Implementation in Educational Search Engines

ERIC Educational Resources Information Center

El Guemmat, Kamal; Ouahabi, Sara

2018-01-01

The objective of this article is to analyze the searching and indexing techniques of educational search engines' implementation while treating future challenges. Educational search engines could greatly help in the effectiveness of e-learning if used correctly. However, these engines have several gaps which influence the performance of e-learning…
Drexel at TREC 2014 Federated Web Search Track

DTIC Science & Technology

2014-11-01

of its input RS results. 1. INTRODUCTION Federated Web Search is the task of searching multiple search engines simultaneously and combining their...or distributed properly[5]. The goal of RS is then, for a given query, to select only the most promising search engines from all those available. Most...result pages of 149 search engines . 4000 queries are used in building the sample set. As a part of the Vertical Selection task, search engines are
Database Searching by Managers.

ERIC Educational Resources Information Center

Arnold, Stephen E.

Managers and executives need the easy and quick access to business and management information that online databases can provide, but many have difficulty articulating their search needs to an intermediary. One possible solution would be to encourage managers and their immediate support staff members to search textual databases directly as they now…
Constructing Effective Search Strategies for Electronic Searching.

ERIC Educational Resources Information Center

Flanagan, Lynn; Parente, Sharon Campbell

Electronic databases have grown tremendously in both number and popularity since their development during the 1960s. Access to electronic databases in academic libraries was originally offered primarily through mediated search services by trained librarians; however, the advent of CD-ROM and end-user interfaces for online databases has shifted the…
Subject searching of monographs online in the medical literature.

PubMed

Brahmi, F A

1988-01-01

Searching by subject for monographic information online in the medical literature is a challenging task. The NLM database of choice is CATLINE. Other NLM databases of interest are BIOTHICSLINE, CANCERLIT, HEALTH, POPLINE, and TOXLINE. Ten BRS databases are also discussed. Of these, Books in Print, Bookinfo, and OCLC are explored further. The databases are compared as to number of total records and number and percentage of monographs. Three topics were searched on CROSS to compare hits on BBIP, BOOK, and OCLC. The same searches were run on CATLINE. The parameters of time coverage and language were equalized and the resulting citations were compared and analyzed for duplication and uniqueness. With the input of CATLINE tapes into OCLC, OCLC has become the database of choice for searching by subject for medical monographs.
mTM-align: a server for fast protein structure database search and multiple protein structure alignment.

PubMed

Dong, Runze; Pan, Shuo; Peng, Zhenling; Zhang, Yang; Yang, Jianyi

2018-05-21

With the rapid increase of the number of protein structures in the Protein Data Bank, it becomes urgent to develop algorithms for efficient protein structure comparisons. In this article, we present the mTM-align server, which consists of two closely related modules: one for structure database search and the other for multiple structure alignment. The database search is speeded up based on a heuristic algorithm and a hierarchical organization of the structures in the database. The multiple structure alignment is performed using the recently developed algorithm mTM-align. Benchmark tests demonstrate that our algorithms outperform other peering methods for both modules, in terms of speed and accuracy. One of the unique features for the server is the interplay between database search and multiple structure alignment. The server provides service not only for performing fast database search, but also for making accurate multiple structure alignment with the structures found by the search. For the database search, it takes about 2-5 min for a structure of a medium size (∼300 residues). For the multiple structure alignment, it takes a few seconds for ∼10 structures of medium sizes. The server is freely available at: http://yanglab.nankai.edu.cn/mTM-align/.
A Search Relevance Algorithm for Weather Effects Products

DTIC Science & Technology

2006-12-29

accessed) are often search engines [4] [5]. This suggests that people are navigating the internet by searching and not through the traditional...geographic location. Unlike traditional search engines a Federated Search Engine does not scour all the data available and return matches. Instead...gold standard in search engines . However, its ranking system is based, largely, on a measure of interconnectedness. A page that is referenced more
Search Fermilab Plant Database

Science.gov Websites

Select the characteristics of the plant you want to find below and click the Search button. To see Plants to see all the prairie plants in the database. Click Search All Plants at Fermilab to search for reflects observations at Fermilab. If you need a more sophisticated search, try the Advanced Search. Search

Reducing process delays for real-time earthquake parameter estimation - An application of KD tree to large databases for Earthquake Early Warning

NASA Astrophysics Data System (ADS)

Yin, Lucy; Andrews, Jennifer; Heaton, Thomas

2018-05-01

Earthquake parameter estimations using nearest neighbor searching among a large database of observations can lead to reliable prediction results. However, in the real-time application of Earthquake Early Warning (EEW) systems, the accurate prediction using a large database is penalized by a significant delay in the processing time. We propose to use a multidimensional binary search tree (KD tree) data structure to organize large seismic databases to reduce the processing time in nearest neighbor search for predictions. We evaluated the performance of KD tree on the Gutenberg Algorithm, a database-searching algorithm for EEW. We constructed an offline test to predict peak ground motions using a database with feature sets of waveform filter-bank characteristics, and compare the results with the observed seismic parameters. We concluded that large database provides more accurate predictions of the ground motion information, such as peak ground acceleration, velocity, and displacement (PGA, PGV, PGD), than source parameters, such as hypocenter distance. Application of the KD tree search to organize the database reduced the average searching process by 85% time cost of the exhaustive method, allowing the method to be feasible for real-time implementation. The algorithm is straightforward and the results will reduce the overall time of warning delivery for EEW.
StemTextSearch: Stem cell gene database with evidence from abstracts.

PubMed

Chen, Chou-Cheng; Ho, Chung-Liang

2017-05-01

Previous studies have used many methods to find biomarkers in stem cells, including text mining, experimental data and image storage. However, no text-mining methods have yet been developed which can identify whether a gene plays a positive or negative role in stem cells. StemTextSearch identifies the role of a gene in stem cells by using a text-mining method to find combinations of gene regulation, stem-cell regulation and cell processes in the same sentences of biomedical abstracts. The dataset includes 5797 genes, with 1534 genes having positive roles in stem cells, 1335 genes having negative roles, 1654 genes with both positive and negative roles, and 1274 with an uncertain role. The precision of gene role in StemTextSearch is 0.66, and the recall is 0.78. StemTextSearch is a web-based engine with queries that specify (i) gene, (ii) category of stem cell, (iii) gene role, (iv) gene regulation, (v) cell process, (vi) stem-cell regulation, and (vii) species. StemTextSearch is available through http://bio.yungyun.com.tw/StemTextSearch.aspx. Copyright © 2017. Published by Elsevier Inc.
Effectiveness of training intervention to improve medical student’s information literacy skills

PubMed Central

2016-01-01

This study aimed to assess the efficiency of delivering a 4-month course of “effective literature search” among medical postgraduate students for improving information literacy skills. This was a cross-sectional study in which 90 postgraduate students were randomly selected and participated in 12 training sessions. Effective search strategies were presented and the students’ attitude and competency concerning online search were measured by a pre- and post-questionnaires and skill tests. Data were analyzed by SPSS version 16 using t-test. There was a significant improvement (p=0.00), in student’s attitude. The mean (standard deviation [SD]) was 2.9 (0.8) before intervention versus the mean (SD) 3.9 (0.7) after intervention. Students’ familiarity with medical resources and databases improved significantly. The data showed a significant increase (p=0.03), in students’ competency score concerning search strategy design and conducting a search. The mean (SD) was 2.04 (0.7) before intervention versus the mean (SD) 3.07 (0.8) after intervention. Also, students’ ability in applying search and meta search engine improved significantly. This study clearly acknowledges that the training intervention provides considerable opportunity to improve medical student’s information literacy skills. PMID:27907985
Searching for grey literature for systematic reviews: challenges and benefits.

PubMed

Mahood, Quenby; Van Eerd, Dwayne; Irvin, Emma

2014-09-01

There is ongoing interest in including grey literature in systematic reviews. Including grey literature can broaden the scope to more relevant studies, thereby providing a more complete view of available evidence. Searching for grey literature can be challenging despite greater access through the Internet, search engines and online bibliographic databases. There are a number of publications that list sources for finding grey literature in systematic reviews. However, there is scant information about how searches for grey literature are executed and how it is included in the review process. This level of detail is important to ensure that reviews follow explicit methodology to be systematic, transparent and reproducible. The purpose of this paper is to provide a detailed account of one systematic review team's experience in searching for grey literature and including it throughout the review. We provide a brief overview of grey literature before describing our search and review approach. We also discuss the benefits and challenges of including grey literature in our systematic review, as well as the strengths and limitations to our approach. Detailed information about incorporating grey literature in reviews is important in advancing methodology as review teams adapt and build upon the approaches described. Copyright © 2013 John Wiley & Sons, Ltd.
RadSearch: a RIS/PACS integrated query tool

NASA Astrophysics Data System (ADS)

Tsao, Sinchai; Documet, Jorge; Moin, Paymann; Wang, Kevin; Liu, Brent J.

2008-03-01

Radiology Information Systems (RIS) contain a wealth of information that can be used for research, education, and practice management. However, the sheer amount of information available makes querying specific data difficult and time consuming. Previous work has shown that a clinical RIS database and its RIS text reports can be extracted, duplicated and indexed for searches while complying with HIPAA and IRB requirements. This project's intent is to provide a software tool, the RadSearch Toolkit, to allow intelligent indexing and parsing of RIS reports for easy yet powerful searches. In addition, the project aims to seamlessly query and retrieve associated images from the Picture Archiving and Communication System (PACS) in situations where an integrated RIS/PACS is in place - even subselecting individual series, such as in an MRI study. RadSearch's application of simple text parsing techniques to index text-based radiology reports will allow the search engine to quickly return relevant results. This powerful combination will be useful in both private practice and academic settings; administrators can easily obtain complex practice management information such as referral patterns; researchers can conduct retrospective studies with specific, multiple criteria; teaching institutions can quickly and effectively create thorough teaching files.
Interactive searching of facial image databases

NASA Astrophysics Data System (ADS)

Nicholls, Robert A.; Shepherd, John W.; Shepherd, Jean

1995-09-01

A set of psychological facial descriptors has been devised to enable computerized searching of criminal photograph albums. The descriptors have been used to encode image databased of up to twelve thousand images. Using a system called FACES, the databases are searched by translating a witness' verbal description into corresponding facial descriptors. Trials of FACES have shown that this coding scheme is more productive and efficient than searching traditional photograph albums. An alternative method of searching the encoded database using a genetic algorithm is currenly being tested. The genetic search method does not require the witness to verbalize a description of the target but merely to indicate a degree of similarity between the target and a limited selection of images from the database. The major drawback of FACES is that is requires a manual encoding of images. Research is being undertaken to automate the process, however, it will require an algorithm which can predict human descriptive values. Alternatives to human derived coding schemes exist using statistical classifications of images. Since databases encoded using statistical classifiers do not have an obvious direct mapping to human derived descriptors, a search method which does not require the entry of human descriptors is required. A genetic search algorithm is being tested for such a purpose.
Current Searching Methodology and Retrieval Issues: An Assessment

DTIC Science & Technology

2008-03-01

searching that are used by search engines are discussed. They are: full text searching, i.e., the searching of unstructured data, and metadata searching...also found among search engines ; however, it is the popularity of full text searching that has changed the road map to information access. The...other hand, information seekers’ willingness, or lack of, to learn the multiple search engines ’ capabilities may diminish their search results
DTS: Building custom, intelligent schedulers

NASA Technical Reports Server (NTRS)

Hansson, Othar; Mayer, Andrew

1994-01-01

DTS is a decision-theoretic scheduler, built on top of a flexible toolkit -- this paper focuses on how the toolkit might be reused in future NASA mission schedulers. The toolkit includes a user-customizable scheduling interface, and a 'Just-For-You' optimization engine. The customizable interface is built on two metaphors: objects and dynamic graphs. Objects help to structure problem specifications and related data, while dynamic graphs simplify the specification of graphical schedule editors (such as Gantt charts). The interface can be used with any 'back-end' scheduler, through dynamically-loaded code, interprocess communication, or a shared database. The 'Just-For-You' optimization engine includes user-specific utility functions, automatically compiled heuristic evaluations, and a postprocessing facility for enforcing scheduling policies. The optimization engine is based on BPS, the Bayesian Problem-Solver (1,2), which introduced a similar approach to solving single-agent and adversarial graph search problems.
Searching Harvard Business Review Online. . . Lessons in Searching a Full Text Database.

ERIC Educational Resources Information Center

Tenopir, Carol

1985-01-01

This article examines the Harvard Business Review Online (HBRO) database (bibliographic description fields, abstracts, extracted information, full text, subject descriptors) and reports on 31 sample HBRO searches conducted in Bibliographic Retrieval Services to test differences between searching full text and searching bibliographic record. Sample…
Bibliography on propulsion airframe integration technologies for high-speed civil transport applications, 1980-1991

NASA Technical Reports Server (NTRS)

Anderson, David J.; Mizukami, Masashi

1993-01-01

NASA has initiated the High Speed Research (HSR) program with the goal to develop technologies for a new generation, economically viable, environmentally acceptable, supersonic transport (SST) called the High Speed Civil Transport (HSCT). A significant part of this effort is expected to be in multidisciplinary systems integration, such as in propulsion airframe integration (PAI). In order to assimilate the knowledge database on PAI for SST type aircraft, a bibliography on this subject was compiled. The bibliography with over 1200 entries, full abstracts, and indexes. Related topics are also covered, such as the following: engine inlets, engine cycles, nozzles, existing supersonic cruise aircraft, noise issues, computational fluid dynamics, aerodynamics, and external interference. All identified documents from 1980 through early 1991 are included; this covers the latter part of the NASA Supersonic Cruise Research (SCR) program and the beginnings of the HSR program. In addition, some pre-1980 documents of significant merit or reference value are also included. The references were retrieved via a computerized literature search using the NASA RECON database system.
In search of the emotional face: anger versus happiness superiority in visual search.

PubMed

Savage, Ruth A; Lipp, Ottmar V; Craig, Belinda M; Becker, Stefanie I; Horstmann, Gernot

2013-08-01

Previous research has provided inconsistent results regarding visual search for emotional faces, yielding evidence for either anger superiority (i.e., more efficient search for angry faces) or happiness superiority effects (i.e., more efficient search for happy faces), suggesting that these results do not reflect on emotional expression, but on emotion (un-)related low-level perceptual features. The present study investigated possible factors mediating anger/happiness superiority effects; specifically search strategy (fixed vs. variable target search; Experiment 1), stimulus choice (Nimstim database vs. Ekman & Friesen database; Experiments 1 and 2), and emotional intensity (Experiment 3 and 3a). Angry faces were found faster than happy faces regardless of search strategy using faces from the Nimstim database (Experiment 1). By contrast, a happiness superiority effect was evident in Experiment 2 when using faces from the Ekman and Friesen database. Experiment 3 employed angry, happy, and exuberant expressions (Nimstim database) and yielded anger and happiness superiority effects, respectively, highlighting the importance of the choice of stimulus materials. Ratings of the stimulus materials collected in Experiment 3a indicate that differences in perceived emotional intensity, pleasantness, or arousal do not account for differences in search efficiency. Across three studies, the current investigation indicates that prior reports of anger or happiness superiority effects in visual search are likely to reflect on low-level visual features associated with the stimulus materials used, rather than on emotion. PsycINFO Database Record (c) 2013 APA, all rights reserved.
A user-friendly phytoremediation database: creating the searchable database, the users, and the broader implications.

PubMed

Famulari, Stevie; Witz, Kyla

2015-01-01

Designers, students, teachers, gardeners, farmers, landscape architects, architects, engineers, homeowners, and others have uses for the practice of phytoremediation. This research looks at the creation of a phytoremediation database which is designed for ease of use for a non-scientific user, as well as for students in an educational setting ( http://www.steviefamulari.net/phytoremediation ). During 2012, Environmental Artist & Professor of Landscape Architecture Stevie Famulari, with assistance from Kyla Witz, a landscape architecture student, created an online searchable database designed for high public accessibility. The database is a record of research of plant species that aid in the uptake of contaminants, including metals, organic materials, biodiesels & oils, and radionuclides. The database consists of multiple interconnected indexes categorized into common and scientific plant name, contaminant name, and contaminant type. It includes photographs, hardiness zones, specific plant qualities, full citations to the original research, and other relevant information intended to aid those designing with phytoremediation search for potential plants which may be used to address their site's need. The objective of the terminology section is to remove uncertainty for more inexperienced users, and to clarify terms for a more user-friendly experience. Implications of the work, including education and ease of browsing, as well as use of the database in teaching, are discussed.
Searching for 'Unknown Unknowns'

NASA Technical Reports Server (NTRS)

Parsons, Vickie S.

2005-01-01

The NASA Engineering and Safety Center (NESC) was established to improve safety through engineering excellence within NASA programs and projects. As part of this goal, methods are being investigated to enable the NESC to become proactive in identifying areas that may be precursors to future problems. The goal is to find unknown indicators of future problems, not to duplicate the program-specific trending efforts. The data that is critical for detecting these indicators exist in a plethora of dissimilar non-conformance and other databases (without a common format or taxonomy). In fact, much of the data is unstructured text. However, one common database is not required if the right standards and electronic tools are employed. Electronic data mining is a particularly promising tool for this effort into unsupervised learning of common factors. This work in progress began with a systematic evaluation of available data mining software packages, based on documented decision techniques using weighted criteria. The four packages, which were perceived to have the most promise for NASA applications, are being benchmarked and evaluated by independent contractors. Preliminary recommendations for "best practices" in data mining and trending are provided. Final results and recommendations should be available in the Fall 2005. This critical first step in identifying "unknown unknowns" before they become problems is applicable to any set of engineering or programmatic data.
Genetic Testing Registry

MedlinePlus

... Splign Vector Alignment Search Tool (VAST) All Data & Software Resources... Domains & Structures BioSystems Cn3D Conserved Domain Database (CDD) Conserved Domain Search Service (CD Search) Structure (Molecular Modeling Database) Vector Alignment ...
Foraging patterns in online searches.

PubMed

Wang, Xiangwen; Pleimling, Michel

2017-03-01

Nowadays online searches are undeniably the most common form of information gathering, as witnessed by billions of clicks generated each day on search engines. In this work we describe online searches as foraging processes that take place on the semi-infinite line. Using a variety of quantities like probability distributions and complementary cumulative distribution functions of step length and waiting time as well as mean square displacements and entropies, we analyze three different click-through logs that contain the detailed information of millions of queries submitted to search engines. Notable differences between the different logs reveal an increased efficiency of the search engines. In the language of foraging, the newer logs indicate that online searches overwhelmingly yield local searches (i.e., on one page of links provided by the search engines), whereas for the older logs the foraging processes are a combination of local searches and relocation phases that are power law distributed. Our investigation of click logs of search engines therefore highlights the presence of intermittent search processes (where phases of local explorations are separated by power law distributed relocation jumps) in online searches. It follows that good search engines enable the users to find the information they are looking for through a local exploration of a single page with search results, whereas for poor search engine users are often forced to do a broader exploration of different pages.
Foraging patterns in online searches

NASA Astrophysics Data System (ADS)

Wang, Xiangwen; Pleimling, Michel

2017-03-01

Nowadays online searches are undeniably the most common form of information gathering, as witnessed by billions of clicks generated each day on search engines. In this work we describe online searches as foraging processes that take place on the semi-infinite line. Using a variety of quantities like probability distributions and complementary cumulative distribution functions of step length and waiting time as well as mean square displacements and entropies, we analyze three different click-through logs that contain the detailed information of millions of queries submitted to search engines. Notable differences between the different logs reveal an increased efficiency of the search engines. In the language of foraging, the newer logs indicate that online searches overwhelmingly yield local searches (i.e., on one page of links provided by the search engines), whereas for the older logs the foraging processes are a combination of local searches and relocation phases that are power law distributed. Our investigation of click logs of search engines therefore highlights the presence of intermittent search processes (where phases of local explorations are separated by power law distributed relocation jumps) in online searches. It follows that good search engines enable the users to find the information they are looking for through a local exploration of a single page with search results, whereas for poor search engine users are often forced to do a broader exploration of different pages.
Database Search Strategies & Tips. Reprints from the Best of "ONLINE" [and]"DATABASE."

ERIC Educational Resources Information Center

Online, Inc., Weston, CT.

Reprints of 17 articles presenting strategies and tips for searching databases online appear in this collection, which is one in a series of volumes of reprints from "ONLINE" and "DATABASE" magazines. Edited for information professionals who use electronically distributed databases, these articles address such topics as: (1)…
BIOMedical Search Engine Framework: Lightweight and customized implementation of domain-specific biomedical search engines.

PubMed

Jácome, Alberto G; Fdez-Riverola, Florentino; Lourenço, Anália

2016-07-01

Text mining and semantic analysis approaches can be applied to the construction of biomedical domain-specific search engines and provide an attractive alternative to create personalized and enhanced search experiences. Therefore, this work introduces the new open-source BIOMedical Search Engine Framework for the fast and lightweight development of domain-specific search engines. The rationale behind this framework is to incorporate core features typically available in search engine frameworks with flexible and extensible technologies to retrieve biomedical documents, annotate meaningful domain concepts, and develop highly customized Web search interfaces. The BIOMedical Search Engine Framework integrates taggers for major biomedical concepts, such as diseases, drugs, genes, proteins, compounds and organisms, and enables the use of domain-specific controlled vocabulary. Technologies from the Typesafe Reactive Platform, the AngularJS JavaScript framework and the Bootstrap HTML/CSS framework support the customization of the domain-oriented search application. Moreover, the RESTful API of the BIOMedical Search Engine Framework allows the integration of the search engine into existing systems or a complete web interface personalization. The construction of the Smart Drug Search is described as proof-of-concept of the BIOMedical Search Engine Framework. This public search engine catalogs scientific literature about antimicrobial resistance, microbial virulence and topics alike. The keyword-based queries of the users are transformed into concepts and search results are presented and ranked accordingly. The semantic graph view portraits all the concepts found in the results, and the researcher may look into the relevance of different concepts, the strength of direct relations, and non-trivial, indirect relations. The number of occurrences of the concept shows its importance to the query, and the frequency of concept co-occurrence is indicative of biological relations meaningful to that particular scope of research. Conversely, indirect concept associations, i.e. concepts related by other intermediary concepts, can be useful to integrate information from different studies and look into non-trivial relations. The BIOMedical Search Engine Framework supports the development of domain-specific search engines. The key strengths of the framework are modularity and extensibilityin terms of software design, the use of open-source consolidated Web technologies, and the ability to integrate any number of biomedical text mining tools and information resources. Currently, the Smart Drug Search keeps over 1,186,000 documents, containing more than 11,854,000 annotations for 77,200 different concepts. The Smart Drug Search is publicly accessible at http://sing.ei.uvigo.es/sds/. The BIOMedical Search Engine Framework is freely available for non-commercial use at https://github.com/agjacome/biomsef. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
A concept ideation framework for medical device design.

PubMed

Hagedorn, Thomas J; Grosse, Ian R; Krishnamurty, Sundar

2015-06-01

Medical device design is a challenging process, often requiring collaboration between medical and engineering domain experts. This collaboration can be best institutionalized through systematic knowledge transfer between the two domains coupled with effective knowledge management throughout the design innovation process. Toward this goal, we present the development of a semantic framework for medical device design that unifies a large medical ontology with detailed engineering functional models along with the repository of design innovation information contained in the US Patent Database. As part of our development, existing medical, engineering, and patent document ontologies were modified and interlinked to create a comprehensive medical device innovation and design tool with appropriate properties and semantic relations to facilitate knowledge capture, enrich existing knowledge, and enable effective knowledge reuse for different scenarios. The result is a Concept Ideation Framework for Medical Device Design (CIFMeDD). Key features of the resulting framework include function-based searching and automated inter-domain reasoning to uniquely enable identification of functionally similar procedures, tools, and inventions from multiple domains based on simple semantic searches. The significance and usefulness of the resulting framework for aiding in conceptual design and innovation in the medical realm are explored via two case studies examining medical device design problems. Copyright © 2015 Elsevier Inc. All rights reserved.
Assigning statistical significance to proteotypic peptides via database searches

PubMed Central

Alves, Gelio; Ogurtsov, Aleksey Y.; Yu, Yi-Kuo

2011-01-01

Querying MS/MS spectra against a database containing only proteotypic peptides reduces data analysis time due to reduction of database size. Despite the speed advantage, this search strategy is challenged by issues of statistical significance and coverage. The former requires separating systematically significant identifications from less confident identifications, while the latter arises when the underlying peptide is not present, due to single amino acid polymorphisms (SAPs) or post-translational modifications (PTMs), in the proteotypic peptide libraries searched. To address both issues simultaneously, we have extended RAId’s knowledge database to include proteotypic information, utilized RAId’s statistical strategy to assign statistical significance to proteotypic peptides, and modified RAId’s programs to allow for consideration of proteotypic information during database searches. The extended database alleviates the coverage problem since all annotated modifications, even those occurred within proteotypic peptides, may be considered. Taking into account the likelihoods of observation, the statistical strategy of RAId provides accurate E-value assignments regardless whether a candidate peptide is proteotypic or not. The advantage of including proteotypic information is evidenced by its superior retrieval performance when compared to regular database searches. PMID:21055489

A fuzzy-match search engine for physician directories.

PubMed

Rastegar-Mojarad, Majid; Kadolph, Christopher; Ye, Zhan; Wall, Daniel; Murali, Narayana; Lin, Simon

2014-11-04

A search engine to find physicians' information is a basic but crucial function of a health care provider's website. Inefficient search engines, which return no results or incorrect results, can lead to patient frustration and potential customer loss. A search engine that can handle misspellings and spelling variations of names is needed, as the United States (US) has culturally, racially, and ethnically diverse names. The Marshfield Clinic website provides a search engine for users to search for physicians' names. The current search engine provides an auto-completion function, but it requires an exact match. We observed that 26% of all searches yielded no results. The goal was to design a fuzzy-match algorithm to aid users in finding physicians easier and faster. Instead of an exact match search, we used a fuzzy algorithm to find similar matches for searched terms. In the algorithm, we solved three types of search engine failures: "Typographic", "Phonetic spelling variation", and "Nickname". To solve these mismatches, we used a customized Levenshtein distance calculation that incorporated Soundex coding and a lookup table of nicknames derived from US census data. Using the "Challenge Data Set of Marshfield Physician Names," we evaluated the accuracy of fuzzy-match engine-top ten (90%) and compared it with exact match (0%), Soundex (24%), Levenshtein distance (59%), and fuzzy-match engine-top one (71%). We designed, created a reference implementation, and evaluated a fuzzy-match search engine for physician directories. The open-source code is available at the codeplex website and a reference implementation is available for demonstration at the datamarsh website.
National Center for Biotechnology Information

MedlinePlus

... Splign Vector Alignment Search Tool (VAST) All Data & Software Resources... Domains & Structures BioSystems Cn3D Conserved Domain Database (CDD) Conserved Domain Search Service (CD Search) Structure (Molecular Modeling Database) Vector Alignment ...
Information sources for obesity prevention policy research: a review of systematic reviews.

PubMed

Hanneke, Rosie; Young, Sabrina K

2017-08-08

Systematic identification of evidence in health policy can be time-consuming and challenging. This study examines three questions pertaining to systematic reviews on obesity prevention policy, in order to identify the most efficient search methods: (1) What percentage of the primary studies selected for inclusion in the reviews originated in scholarly as opposed to gray literature? (2) How much of the primary scholarly literature in this topic area is indexed in PubMed/MEDLINE? (3) Which databases index the greatest number of primary studies not indexed in PubMed, and are these databases searched consistently across systematic reviews? We identified systematic reviews on obesity prevention policy and explored their search methods and citations. We determined the percentage of scholarly vs. gray literature cited, the most frequently cited journals, and whether each primary study was indexed in PubMed. We searched 21 databases for all primary study articles not indexed in PubMed to determine which database(s) indexed the highest number of these relevant articles. In total, 21 systematic reviews were identified. Ten of the 21 systematic reviews reported searching gray literature, and 12 reviews ultimately included gray literature in their analyses. Scholarly articles accounted for 577 of the 649 total primary study papers. Of these, 495 (76%) were indexed in PubMed. Google Scholar retrieved the highest number of the remaining 82 non-PubMed scholarly articles, followed by Scopus and EconLit. The Journal of the American Dietetic Association was the most-cited journal. Researchers can maximize search efficiency by searching a small yet targeted selection of both scholarly and gray literature resources. A highly sensitive search of PubMed and those databases that index the greatest number of relevant articles not indexed in PubMed, namely multidisciplinary and economics databases, could save considerable time and effort. When combined with a gray literature search and additional search methods, including cited reference searching and consulting with experts, this approach could help maintain broad retrieval of relevant studies while improving search efficiency. Findings also have implications for designing specialized databases for public health research.
Locality in Search Engine Queries and Its Implications for Caching

DTIC Science & Technology

2001-05-01

in the question of whether caching might be effective for search engines as well. They study two real search engine traces by examining query...locality and its implications for caching. The two search engines studied are Vivisimo and Excite. Their trace analysis results show that queries have
Transterm—extended search facilities and improved integration with other databases

PubMed Central

Jacobs, Grant H.; Stockwell, Peter A.; Tate, Warren P.; Brown, Chris M.

2006-01-01

Transterm has now been publicly available for >10 years. Major changes have been made since its last description in this database issue in 2002. The current database provides data for key regions of mRNA sequences, a curated database of mRNA motifs and tools to allow users to investigate their own motifs or mRNA sequences. The key mRNA regions database is derived computationally from Genbank. It contains 3′ and 5′ flanking regions, the initiation and termination signal context and coding sequence for annotated CDS features from Genbank and RefSeq. The database is non-redundant, enabling summary files and statistics to be prepared for each species. Advances include providing extended search facilities, the database may now be searched by BLAST in addition to regular expressions (patterns) allowing users to search for motifs such as known miRNA sequences, and the inclusion of RefSeq data. The database contains >40 motifs or structural patterns important for translational control. In this release, patterns from UTRsite and Rfam are also incorporated with cross-referencing. Users may search their sequence data with Transterm or user-defined patterns. The system is accessible at . PMID:16381889
Variability of patient spine education by Internet search engine.

PubMed

Ghobrial, George M; Mehdi, Angud; Maltenfort, Mitchell; Sharan, Ashwini D; Harrop, James S

2014-03-01

Patients are increasingly reliant upon the Internet as a primary source of medical information. The educational experience varies by search engine, search term, and changes daily. There are no tools for critical evaluation of spinal surgery websites. To highlight the variability between common search engines for the same search terms. To detect bias, by prevalence of specific kinds of websites for certain spinal disorders. Demonstrate a simple scoring system of spinal disorder website for patient use, to maximize the quality of information exposed to the patient. Ten common search terms were used to query three of the most common search engines. The top fifty results of each query were tabulated. A negative binomial regression was performed to highlight the variation across each search engine. Google was more likely than Bing and Yahoo search engines to return hospital ads (P=0.002) and more likely to return scholarly sites of peer-reviewed lite (P=0.003). Educational web sites, surgical group sites, and online web communities had a significantly higher likelihood of returning on any search, regardless of search engine, or search string (P=0.007). Likewise, professional websites, including hospital run, industry sponsored, legal, and peer-reviewed web pages were less likely to be found on a search overall, regardless of engine and search string (P=0.078). The Internet is a rapidly growing body of medical information which can serve as a useful tool for patient education. High quality information is readily available, provided that the patient uses a consistent, focused metric for evaluating online spine surgery information, as there is a clear variability in the way search engines present information to the patient. Published by Elsevier B.V.
A rank-based Prediction Algorithm of Learning User's Intention

NASA Astrophysics Data System (ADS)

Shen, Jie; Gao, Ying; Chen, Cang; Gong, HaiPing

Internet search has become an important part in people's daily life. People can find many types of information to meet different needs through search engines on the Internet. There are two issues for the current search engines: first, the users should predetermine the types of information they want and then change to the appropriate types of search engine interfaces. Second, most search engines can support multiple kinds of search functions, each function has its own separate search interface. While users need different types of information, they must switch between different interfaces. In practice, most queries are corresponding to various types of information results. These queries can search the relevant results in various search engines, such as query "Palace" contains the websites about the introduction of the National Palace Museum, blog, Wikipedia, some pictures and video information. This paper presents a new aggregative algorithm for all kinds of search results. It can filter and sort the search results by learning three aspects about the query words, search results and search history logs to achieve the purpose of detecting user's intention. Experiments demonstrate that this rank-based method for multi-types of search results is effective. It can meet the user's search needs well, enhance user's satisfaction, provide an effective and rational model for optimizing search engines and improve user's search experience.
Measurement of tag confidence in user generated contents retrieval

NASA Astrophysics Data System (ADS)

Lee, Sihyoung; Min, Hyun-Seok; Lee, Young Bok; Ro, Yong Man

2009-01-01

As online image sharing services are becoming popular, the importance of correctly annotated tags is being emphasized for precise search and retrieval. Tags created by user along with user-generated contents (UGC) are often ambiguous due to the fact that some tags are highly subjective and visually unrelated to the image. They cause unwanted results to users when image search engines rely on tags. In this paper, we propose a method of measuring tag confidence so that one can differentiate confidence tags from noisy tags. The proposed tag confidence is measured from visual semantics of the image. To verify the usefulness of the proposed method, experiments were performed with UGC database from social network sites. Experimental results showed that the image retrieval performance with confidence tags was increased.
Finding research information on the web: how to make the most of Google and other free search tools.

PubMed

Blakeman, Karen

2013-01-01

The Internet and the World Wide Web has had a major impact on the accessibility of research information. The move towards open access and development of institutional repositories has resulted in increasing amounts of information being made available free of charge. Many of these resources are not included in conventional subscription databases and Google is not always the best way to ensure that one is picking up all relevant material on a topic. This article will look at how Google's search engine works, how to use Google more effectively for identifying research information, alternatives to Google and will review some of the specialist tools that have evolved to cope with the diverse forms of information that now exist in electronic form.
Genebanks: a comparison of eight proposed international genetic databases.

PubMed

Austin, Melissa A; Harding, Sarah; McElroy, Courtney

2003-01-01

To identify and compare population-based genetic databases, or "genebanks", that have been proposed in eight international locations between 1998 and 2002. A genebank can be defined as a stored collection of genetic samples in the form of blood or tissue, that can be linked with medical and genealogical or lifestyle information from a specific population, gathered using a process of generalized consent. Genebanks were identified by searching Medline and internet search engines with key words such as "genetic database" and "biobank" and by reviewing literature on previously identified databases such as the deCode project. Collection of genebank characteristics was by an electronic and literature search, augmented by correspondence with informed individuals. The proposed genebanks are located in Iceland, the United Kingdom, Estonia, Latvia, Sweden, Singapore, the Kingdom of Tonga, and Quebec, Canada. Comparisons of the genebanks were based on the following criteria: genebank location and description of purpose, role of government, commercial involvement, consent and confidentiality procedures, opposition to the genebank, and current progress. All of the groups proposing the genebanks plan to search for susceptibility genes for complex diseases while attempting to improve public health and medical care in the region and, in some cases, stimulating the local economy through expansion of the biotechnology sector. While all of the identified plans share these purposes, they differ in many aspects, including funding, subject participation, and organization. The balance of government and commercial involvement in the development of each project varies. Genetic samples and health information will be collected from participants and coded in all of the genebanks, but consent procedures range from presumed consent of the entire eligible population to recruitment of volunteers with informed consent. Issues regarding confidentiality and consent have resulted in opposition to some of the more publicized projects. None of the proposed databases are currently operational and at least one project was terminated due to opposition. Ambitious genebank projects have been proposed in numerous countries and provinces. The characteristics of the projects vary, but all intend to map genes for common diseases and hope to improve the health of the populations involved. The impact of these projects on understanding genetic susceptibility to disease will be increasingly apparent if the projects become operational. The ethical, legal, and social implications of the projects should be carefully considered during their development. Copyright 2003 S. Karger AG, Basel
National Rehabilitation Information Center

MedlinePlus

... search the NARIC website or one of our databases Select a database or search for a webpage A NARIC webpage ... Projects conducting research and/or development (NIDILRR Program Database). Organizations, agencies, and online resources that support people ...
Quantum search of a real unstructured database

NASA Astrophysics Data System (ADS)

Broda, Bogusław

2016-02-01

A simple circuit implementation of the oracle for Grover's quantum search of a real unstructured classical database is proposed. The oracle contains a kind of quantumly accessible classical memory, which stores the database.
Overcoming Species Boundaries in Peptide Identification with Bayesian Information Criterion-driven Error-tolerant Peptide Search (BICEPS)*

PubMed Central

Renard, Bernhard Y.; Xu, Buote; Kirchner, Marc; Zickmann, Franziska; Winter, Dominic; Korten, Simone; Brattig, Norbert W.; Tzur, Amit; Hamprecht, Fred A.; Steen, Hanno

2012-01-01

Currently, the reliable identification of peptides and proteins is only feasible when thoroughly annotated sequence databases are available. Although sequencing capacities continue to grow, many organisms remain without reliable, fully annotated reference genomes required for proteomic analyses. Standard database search algorithms fail to identify peptides that are not exactly contained in a protein database. De novo searches are generally hindered by their restricted reliability, and current error-tolerant search strategies are limited by global, heuristic tradeoffs between database and spectral information. We propose a Bayesian information criterion-driven error-tolerant peptide search (BICEPS) and offer an open source implementation based on this statistical criterion to automatically balance the information of each single spectrum and the database, while limiting the run time. We show that BICEPS performs as well as current database search algorithms when such algorithms are applied to sequenced organisms, whereas BICEPS only uses a remotely related organism database. For instance, we use a chicken instead of a human database corresponding to an evolutionary distance of more than 300 million years (International Chicken Genome Sequencing Consortium (2004) Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature 432, 695–716). We demonstrate the successful application to cross-species proteomics with a 33% increase in the number of identified proteins for a filarial nematode sample of Litomosoides sigmodontis. PMID:22493179
LigandBox: A database for 3D structures of chemical compounds

PubMed Central

Kawabata, Takeshi; Sugihara, Yusuke; Fukunishi, Yoshifumi; Nakamura, Haruki

2013-01-01

A database for the 3D structures of available compounds is essential for the virtual screening by molecular docking. We have developed the LigandBox database (http://ligandbox.protein.osaka-u.ac.jp/ligandbox/) containing four million available compounds, collected from the catalogues of 37 commercial suppliers, and approved drugs and biochemical compounds taken from KEGG_DRUG, KEGG_COMPOUND and PDB databases. Each chemical compound in the database has several 3D conformers with hydrogen atoms and atomic charges, which are ready to be docked into receptors using docking programs. The 3D conformations were generated using our molecular simulation program package, myPresto. Various physical properties, such as aqueous solubility (LogS) and carcinogenicity have also been calculated to characterize the ADME-Tox properties of the compounds. The Web database provides two services for compound searches: a property/chemical ID search and a chemical structure search. The chemical structure search is performed by a descriptor search and a maximum common substructure (MCS) search combination, using our program kcombu. By specifying a query chemical structure, users can find similar compounds among the millions of compounds in the database within a few minutes. Our database is expected to assist a wide range of researchers, in the fields of medical science, chemical biology, and biochemistry, who are seeking to discover active chemical compounds by the virtual screening. PMID:27493549
LigandBox: A database for 3D structures of chemical compounds.

PubMed

Kawabata, Takeshi; Sugihara, Yusuke; Fukunishi, Yoshifumi; Nakamura, Haruki

2013-01-01

A database for the 3D structures of available compounds is essential for the virtual screening by molecular docking. We have developed the LigandBox database (http://ligandbox.protein.osaka-u.ac.jp/ligandbox/) containing four million available compounds, collected from the catalogues of 37 commercial suppliers, and approved drugs and biochemical compounds taken from KEGG_DRUG, KEGG_COMPOUND and PDB databases. Each chemical compound in the database has several 3D conformers with hydrogen atoms and atomic charges, which are ready to be docked into receptors using docking programs. The 3D conformations were generated using our molecular simulation program package, myPresto. Various physical properties, such as aqueous solubility (LogS) and carcinogenicity have also been calculated to characterize the ADME-Tox properties of the compounds. The Web database provides two services for compound searches: a property/chemical ID search and a chemical structure search. The chemical structure search is performed by a descriptor search and a maximum common substructure (MCS) search combination, using our program kcombu. By specifying a query chemical structure, users can find similar compounds among the millions of compounds in the database within a few minutes. Our database is expected to assist a wide range of researchers, in the fields of medical science, chemical biology, and biochemistry, who are seeking to discover active chemical compounds by the virtual screening.
MEGALEX: A megastudy of visual and auditory word recognition.

PubMed

Ferrand, Ludovic; Méot, Alain; Spinelli, Elsa; New, Boris; Pallier, Christophe; Bonin, Patrick; Dufau, Stéphane; Mathôt, Sebastiaan; Grainger, Jonathan

2018-06-01

Using the megastudy approach, we report a new database (MEGALEX) of visual and auditory lexical decision times and accuracy rates for tens of thousands of words. We collected visual lexical decision data for 28,466 French words and the same number of pseudowords, and auditory lexical decision data for 17,876 French words and the same number of pseudowords (synthesized tokens were used for the auditory modality). This constitutes the first large-scale database for auditory lexical decision, and the first database to enable a direct comparison of word recognition in different modalities. Different regression analyses were conducted to illustrate potential ways to exploit this megastudy database. First, we compared the proportions of variance accounted for by five word frequency measures. Second, we conducted item-level regression analyses to examine the relative importance of the lexical variables influencing performance in the different modalities (visual and auditory). Finally, we compared the similarities and differences between the two modalities. All data are freely available on our website ( https://sedufau.shinyapps.io/megalex/ ) and are searchable at www.lexique.org , inside the Open Lexique search engine.
BGD: a database of bat genomes.

PubMed

Fang, Jianfei; Wang, Xuan; Mu, Shuo; Zhang, Shuyi; Dong, Dong

2015-01-01

Bats account for ~20% of mammalian species, and are the only mammals with true powered flight. For the sake of their specialized phenotypic traits, many researches have been devoted to examine the evolution of bats. Until now, some whole genome sequences of bats have been assembled and annotated, however, a uniform resource for the annotated bat genomes is still unavailable. To make the extensive data associated with the bat genomes accessible to the general biological communities, we established a Bat Genome Database (BGD). BGD is an open-access, web-available portal that integrates available data of bat genomes and genes. It hosts data from six bat species, including two megabats and four microbats. Users can query the gene annotations using efficient searching engine, and it offers browsable tracks of bat genomes. Furthermore, an easy-to-use phylogenetic analysis tool was also provided to facilitate online phylogeny study of genes. To the best of our knowledge, BGD is the first database of bat genomes. It will extend our understanding of the bat evolution and be advantageous to the bat sequences analysis. BGD is freely available at: http://donglab.ecnu.edu.cn/databases/BatGenome/.
Modelling and Simulation of Search Engine

NASA Astrophysics Data System (ADS)

Nasution, Mahyuddin K. M.

2017-01-01

The best tool currently used to access information is a search engine. Meanwhile, the information space has its own behaviour. Systematically, an information space needs to be familiarized with mathematics so easily we identify the characteristics associated with it. This paper reveal some characteristics of search engine based on a model of document collection, which are then estimated the impact on the feasibility of information. We reveal some of characteristics of search engine on the lemma and theorem about singleton and doubleton, then computes statistically characteristic as simulating the possibility of using search engine. In this case, Google and Yahoo. There are differences in the behaviour of both search engines, although in theory based on the concept of documents collection.
Searching for evidence or approval? A commentary on database search in systematic reviews and alternative information retrieval methodologies.

PubMed

Delaney, Aogán; Tamás, Peter A

2018-03-01

Despite recognition that database search alone is inadequate even within the health sciences, it appears that reviewers in fields that have adopted systematic review are choosing to rely primarily, or only, on database search for information retrieval. This commentary reminds readers of factors that call into question the appropriateness of default reliance on database searches particularly as systematic review is adapted for use in new and lower consensus fields. It then discusses alternative methods for information retrieval that require development, formalisation, and evaluation. Our goals are to encourage reviewers to reflect critically and transparently on their choice of information retrieval methods and to encourage investment in research on alternatives. Copyright © 2017 John Wiley & Sons, Ltd.
Federated or cached searches: Providing expected performance from multiple invasive species databases

NASA Astrophysics Data System (ADS)

Graham, Jim; Jarnevich, Catherine S.; Simpson, Annie; Newman, Gregory J.; Stohlgren, Thomas J.

2011-06-01

Invasive species are a universal global problem, but the information to identify them, manage them, and prevent invasions is stored around the globe in a variety of formats. The Global Invasive Species Information Network is a consortium of organizations working toward providing seamless access to these disparate databases via the Internet. A distributed network of databases can be created using the Internet and a standard web service protocol. There are two options to provide this integration. First, federated searches are being proposed to allow users to search "deep" web documents such as databases for invasive species. A second method is to create a cache of data from the databases for searching. We compare these two methods, and show that federated searches will not provide the performance and flexibility required from users and a central cache of the datum are required to improve performance.

Federated or cached searches: providing expected performance from multiple invasive species databases

USGS Publications Warehouse

Graham, Jim; Jarnevich, Catherine S.; Simpson, Annie; Newman, Gregory J.; Stohlgren, Thomas J.

2011-01-01

Invasive species are a universal global problem, but the information to identify them, manage them, and prevent invasions is stored around the globe in a variety of formats. The Global Invasive Species Information Network is a consortium of organizations working toward providing seamless access to these disparate databases via the Internet. A distributed network of databases can be created using the Internet and a standard web service protocol. There are two options to provide this integration. First, federated searches are being proposed to allow users to search “deep” web documents such as databases for invasive species. A second method is to create a cache of data from the databases for searching. We compare these two methods, and show that federated searches will not provide the performance and flexibility required from users and a central cache of the datum are required to improve performance.
New generation of the multimedia search engines

NASA Astrophysics Data System (ADS)

Mijes Cruz, Mario Humberto; Soto Aldaco, Andrea; Maldonado Cano, Luis Alejandro; López Rodríguez, Mario; Rodríguez Vázqueza, Manuel Antonio; Amaya Reyes, Laura Mariel; Cano Martínez, Elizabeth; Pérez Rosas, Osvaldo Gerardo; Rodríguez Espejo, Luis; Flores Secundino, Jesús Abimelek; Rivera Martínez, José Luis; García Vázquez, Mireya Saraí; Zamudio Fuentes, Luis Miguel; Sánchez Valenzuela, Juan Carlos; Montoya Obeso, Abraham; Ramírez Acosta, Alejandro Álvaro

2016-09-01

Current search engines are based upon search methods that involve the combination of words (text-based search); which has been efficient until now. However, the Internet's growing demand indicates that there's more diversity on it with each passing day. Text-based searches are becoming limited, as most of the information on the Internet can be found in different types of content denominated multimedia content (images, audio files, video files). Indeed, what needs to be improved in current search engines is: search content, and precision; as well as an accurate display of expected search results by the user. Any search can be more precise if it uses more text parameters, but it doesn't help improve the content or speed of the search itself. One solution is to improve them through the characterization of the content for the search in multimedia files. In this article, an analysis of the new generation multimedia search engines is presented, focusing the needs according to new technologies. Multimedia content has become a central part of the flow of information in our daily life. This reflects the necessity of having multimedia search engines, as well as knowing the real tasks that it must comply. Through this analysis, it is shown that there are not many search engines that can perform content searches. The area of research of multimedia search engines of new generation is a multidisciplinary area that's in constant growth, generating tools that satisfy the different needs of new generation systems.
Design and implementation of a database for Brucella melitensis genome annotation.

PubMed

De Hertogh, Benoît; Lahlimi, Leïla; Lambert, Christophe; Letesson, Jean-Jacques; Depiereux, Eric

2008-03-18

The genome sequences of three Brucella biovars and of some species close to Brucella sp. have become available, leading to new relationship analysis. Moreover, the automatic genome annotation of the pathogenic bacteria Brucella melitensis has been manually corrected by a consortium of experts, leading to 899 modifications of start sites predictions among the 3198 open reading frames (ORFs) examined. This new annotation, coupled with the results of automatic annotation tools of the complete genome sequences of the B. melitensis genome (including BLASTs to 9 genomes close to Brucella), provides numerous data sets related to predicted functions, biochemical properties and phylogenic comparisons. To made these results available, alphaPAGe, a functional auto-updatable database of the corrected sequence genome of B. melitensis, has been built, using the entity-relationship (ER) approach and a multi-purpose database structure. A friendly graphical user interface has been designed, and users can carry out different kinds of information by three levels of queries: (1) the basic search use the classical keywords or sequence identifiers; (2) the original advanced search engine allows to combine (by using logical operators) numerous criteria: (a) keywords (textual comparison) related to the pCDS's function, family domains and cellular localization; (b) physico-chemical characteristics (numerical comparison) such as isoelectric point or molecular weight and structural criteria such as the nucleic length or the number of transmembrane helix (TMH); (c) similarity scores with Escherichia coli and 10 species phylogenetically close to B. melitensis; (3) complex queries can be performed by using a SQL field, which allows all queries respecting the database's structure. The database is publicly available through a Web server at the following url: http://www.fundp.ac.be/urbm/bioinfo/aPAGe.
The Effectiveness of Web Search Engines to Index New Sites from Different Countries

ERIC Educational Resources Information Center

Pirkola, Ari

2009-01-01

Introduction: Investigates how effectively Web search engines index new sites from different countries. The primary interest is whether new sites are indexed equally or whether search engines are biased towards certain countries. If major search engines show biased coverage it can be considered a significant economic and political problem because…
Taming the Information Jungle with WWW Search Engines.

ERIC Educational Resources Information Center

Repman, Judi; And Others

1997-01-01

Because searching the Web with different engines often produces different results, the best strategy is to learn how each engine works. Discusses comparing search engines; qualities to consider (ease of use, relevance of hits, and speed); and six of the most popular search tools (Yahoo, Magellan. InfoSeek, Alta Vista, Lycos, and Excite). Lists…
Internet Databases of the Properties, Enzymatic Reactions, and Metabolism of Small Molecules—Search Options and Applications in Food Science

PubMed Central

Minkiewicz, Piotr; Darewicz, Małgorzata; Iwaniak, Anna; Bucholska, Justyna; Starowicz, Piotr; Czyrko, Emilia

2016-01-01

Internet databases of small molecules, their enzymatic reactions, and metabolism have emerged as useful tools in food science. Database searching is also introduced as part of chemistry or enzymology courses for food technology students. Such resources support the search for information about single compounds and facilitate the introduction of secondary analyses of large datasets. Information can be retrieved from databases by searching for the compound name or structure, annotating with the help of chemical codes or drawn using molecule editing software. Data mining options may be enhanced by navigating through a network of links and cross-links between databases. Exemplary databases reviewed in this article belong to two classes: tools concerning small molecules (including general and specialized databases annotating food components) and tools annotating enzymes and metabolism. Some problems associated with database application are also discussed. Data summarized in computer databases may be used for calculation of daily intake of bioactive compounds, prediction of metabolism of food components, and their biological activity as well as for prediction of interactions between food component and drugs. PMID:27929431
Internet Databases of the Properties, Enzymatic Reactions, and Metabolism of Small Molecules-Search Options and Applications in Food Science.

PubMed

Minkiewicz, Piotr; Darewicz, Małgorzata; Iwaniak, Anna; Bucholska, Justyna; Starowicz, Piotr; Czyrko, Emilia

2016-12-06

Internet databases of small molecules, their enzymatic reactions, and metabolism have emerged as useful tools in food science. Database searching is also introduced as part of chemistry or enzymology courses for food technology students. Such resources support the search for information about single compounds and facilitate the introduction of secondary analyses of large datasets. Information can be retrieved from databases by searching for the compound name or structure, annotating with the help of chemical codes or drawn using molecule editing software. Data mining options may be enhanced by navigating through a network of links and cross-links between databases. Exemplary databases reviewed in this article belong to two classes: tools concerning small molecules (including general and specialized databases annotating food components) and tools annotating enzymes and metabolism. Some problems associated with database application are also discussed. Data summarized in computer databases may be used for calculation of daily intake of bioactive compounds, prediction of metabolism of food components, and their biological activity as well as for prediction of interactions between food component and drugs.
MIRASS: medical informatics research activity support system using information mashup network.

PubMed

Kiah, M L M; Zaidan, B B; Zaidan, A A; Nabi, Mohamed; Ibraheem, Rabiu

2014-04-01

The advancement of information technology has facilitated the automation and feasibility of online information sharing. The second generation of the World Wide Web (Web 2.0) enables the collaboration and sharing of online information through Web-serving applications. Data mashup, which is considered a Web 2.0 platform, plays an important role in information and communication technology applications. However, few ideas have been transformed into education and research domains, particularly in medical informatics. The creation of a friendly environment for medical informatics research requires the removal of certain obstacles in terms of search time, resource credibility, and search result accuracy. This paper considers three glitches that researchers encounter in medical informatics research; these glitches include the quality of papers obtained from scientific search engines (particularly, Web of Science and Science Direct), the quality of articles from the indices of these search engines, and the customizability and flexibility of these search engines. A customizable search engine for trusted resources of medical informatics was developed and implemented through data mashup. Results show that the proposed search engine improves the usability of scientific search engines for medical informatics. Pipe search engine was found to be more efficient than other engines.
Chemical Information in Scirus and BASE (Bielefeld Academic Search Engine)

ERIC Educational Resources Information Center

Bendig, Regina B.

2009-01-01

The author sought to determine to what extent the two search engines, Scirus and BASE (Bielefeld Academic Search Engines), would be useful to first-year university students as the first point of searching for chemical information. Five topics were searched and the first ten records of each search result were evaluated with regard to the type of…
Searching for Controlled Trials of Complementary and Alternative Medicine: A Comparison of 15 Databases

PubMed Central

Cogo, Elise; Sampson, Margaret; Ajiferuke, Isola; Manheimer, Eric; Campbell, Kaitryn; Daniel, Raymond; Moher, David

2011-01-01

This project aims to assess the utility of bibliographic databases beyond the three major ones (MEDLINE, EMBASE and Cochrane CENTRAL) for finding controlled trials of complementary and alternative medicine (CAM). Fifteen databases were searched to identify controlled clinical trials (CCTs) of CAM not also indexed in MEDLINE. Searches were conducted in May 2006 using the revised Cochrane highly sensitive search strategy (HSSS) and the PubMed CAM Subset. Yield of CAM trials per 100 records was determined, and databases were compared over a standardized period (2005). The Acudoc2 RCT, Acubriefs, Index to Chiropractic Literature (ICL) and Hom-Inform databases had the highest concentrations of non-MEDLINE records, with more than 100 non-MEDLINE records per 500. Other productive databases had ratios between 500 and 1500 records to 100 non-MEDLINE records—these were AMED, MANTIS, PsycINFO, CINAHL, Global Health and Alt HealthWatch. Five databases were found to be unproductive: AGRICOLA, CAIRSS, Datadiwan, Herb Research Foundation and IBIDS. Acudoc2 RCT yielded 100 CAM trials in the most recent 100 records screened. Acubriefs, AMED, Hom-Inform, MANTIS, PsycINFO and CINAHL had more than 25 CAM trials per 100 records screened. Global Health, ICL and Alt HealthWatch were below 25 in yield. There were 255 non-MEDLINE trials from eight databases in 2005, with only 10% indexed in more than one database. Yield varied greatly between databases; the most productive databases from both sampling methods were Acubriefs, Acudoc2 RCT, AMED and CINAHL. Low overlap between databases indicates comprehensive CAM literature searches will require multiple databases. PMID:19468052
Searching for controlled trials of complementary and alternative medicine: a comparison of 15 databases.

PubMed

Cogo, Elise; Sampson, Margaret; Ajiferuke, Isola; Manheimer, Eric; Campbell, Kaitryn; Daniel, Raymond; Moher, David

2011-01-01

This project aims to assess the utility of bibliographic databases beyond the three major ones (MEDLINE, EMBASE and Cochrane CENTRAL) for finding controlled trials of complementary and alternative medicine (CAM). Fifteen databases were searched to identify controlled clinical trials (CCTs) of CAM not also indexed in MEDLINE. Searches were conducted in May 2006 using the revised Cochrane highly sensitive search strategy (HSSS) and the PubMed CAM Subset. Yield of CAM trials per 100 records was determined, and databases were compared over a standardized period (2005). The Acudoc2 RCT, Acubriefs, Index to Chiropractic Literature (ICL) and Hom-Inform databases had the highest concentrations of non-MEDLINE records, with more than 100 non-MEDLINE records per 500. Other productive databases had ratios between 500 and 1500 records to 100 non-MEDLINE records-these were AMED, MANTIS, PsycINFO, CINAHL, Global Health and Alt HealthWatch. Five databases were found to be unproductive: AGRICOLA, CAIRSS, Datadiwan, Herb Research Foundation and IBIDS. Acudoc2 RCT yielded 100 CAM trials in the most recent 100 records screened. Acubriefs, AMED, Hom-Inform, MANTIS, PsycINFO and CINAHL had more than 25 CAM trials per 100 records screened. Global Health, ICL and Alt HealthWatch were below 25 in yield. There were 255 non-MEDLINE trials from eight databases in 2005, with only 10% indexed in more than one database. Yield varied greatly between databases; the most productive databases from both sampling methods were Acubriefs, Acudoc2 RCT, AMED and CINAHL. Low overlap between databases indicates comprehensive CAM literature searches will require multiple databases.
End User Information Searching on the Internet: How Do Users Search and What Do They Search For? (SIG USE)

ERIC Educational Resources Information Center

Saracevic, Tefko

2000-01-01

Summarizes a presentation that discussed findings and implications of research projects using an Internet search service and Internet-accessible vendor databases, representing the two sides of public database searching: query formulation and resource utilization. Presenters included: Tefko Saracevic, Amanda Spink, Dietmar Wolfram and Hong Xie.…
Assessment of Metabolome Annotation Quality: A Method for Evaluating the False Discovery Rate of Elemental Composition Searches

PubMed Central

Matsuda, Fumio; Shinbo, Yoko; Oikawa, Akira; Hirai, Masami Yokota; Fiehn, Oliver; Kanaya, Shigehiko; Saito, Kazuki

2009-01-01

Background In metabolomics researches using mass spectrometry (MS), systematic searching of high-resolution mass data against compound databases is often the first step of metabolite annotation to determine elemental compositions possessing similar theoretical mass numbers. However, incorrect hits derived from errors in mass analyses will be included in the results of elemental composition searches. To assess the quality of peak annotation information, a novel methodology for false discovery rates (FDR) evaluation is presented in this study. Based on the FDR analyses, several aspects of an elemental composition search, including setting a threshold, estimating FDR, and the types of elemental composition databases most reliable for searching are discussed. Methodology/Principal Findings The FDR can be determined from one measured value (i.e., the hit rate for search queries) and four parameters determined by Monte Carlo simulation. The results indicate that relatively high FDR values (30–50%) were obtained when searching time-of-flight (TOF)/MS data using the KNApSAcK and KEGG databases. In addition, searches against large all-in-one databases (e.g., PubChem) always produced unacceptable results (FDR >70%). The estimated FDRs suggest that the quality of search results can be improved not only by performing more accurate mass analysis but also by modifying the properties of the compound database. A theoretical analysis indicates that FDR could be improved by using compound database with smaller but higher completeness entries. Conclusions/Significance High accuracy mass analysis, such as Fourier transform (FT)-MS, is needed for reliable annotation (FDR <10%). In addition, a small, customized compound database is preferable for high-quality annotation of metabolome data. PMID:19847304
Comparing Web search engine performance in searching consumer health information: evaluation and recommendations.

PubMed Central

Wu, G; Li, J

1999-01-01

Identifying and accessing reliable, relevant consumer health information rapidly on the Internet may challenge the health sciences librarian and layperson alike. In this study, seven search engines are compared using representative consumer health topics for their content relevancy, system features, and attributes. The paper discusses evaluation criteria; systematically compares relevant results; analyzes performance in terms of the strengths and weaknesses of the search engines; and illustrates effective search engine selection, search formulation, and strategies. PMID:10550031
NREL: Renewable Resource Data Center - Biomass Resource Publications

Science.gov Websites

Marginal Lands in APEC Economies NREL Publications Database For a comprehensive list of other NREL biomass resource publications, explore NREL's Publications Database. When searching the database, search on "
Islamic Extremists Love the Internet

DTIC Science & Technology

2009-04-03

down on the West. Terrorists’ Use of Search Engines In order to find a particular blog, extremists use search engines such as Bloglines...BlogScope, and Technorati to search blog contents. Technorati, which is among the most popular blog search engines , provides current information on...of mid- January 2009 is tracking over 31.78 million blogs with 579.86 million posts.49 Other ways the terrorists use Web search engines are to
Combinatorial Fusion Analysis for Meta Search Information Retrieval

NASA Astrophysics Data System (ADS)

Hsu, D. Frank; Taksa, Isak

Leading commercial search engines are built as single event systems. In response to a particular search query, the search engine returns a single list of ranked search results. To find more relevant results the user must frequently try several other search engines. A meta search engine was developed to enhance the process of multi-engine querying. The meta search engine queries several engines at the same time and fuses individual engine results into a single search results list. The fusion of multiple search results has been shown (mostly experimentally) to be highly effective. However, the question of why and how the fusion should be done still remains largely unanswered. In this chapter, we utilize the combinatorial fusion analysis proposed by Hsu et al. to analyze combination and fusion of multiple sources of information. A rank/score function is used in the design and analysis of our framework. The framework provides a better understanding of the fusion phenomenon in information retrieval. For example, to improve the performance of the combined multiple scoring systems, it is necessary that each of the individual scoring systems has relatively high performance and the individual scoring systems are diverse. Additionally, we illustrate various applications of the framework using two examples from the information retrieval domain.
An assessment of the efficacy of searching in biomedical databases beyond MEDLINE in identifying studies for a systematic review on ward closures as an infection control intervention to control outbreaks.

PubMed

Kwon, Yoojin; Powelson, Susan E; Wong, Holly; Ghali, William A; Conly, John M

2014-11-11

The purpose of our study is to determine the value and efficacy of searching biomedical databases beyond MEDLINE for systematic reviews. We analyzed the results from a systematic review conducted by the authors and others on ward closure as an infection control practice. Ovid MEDLINE including In-Process & Other Non-Indexed Citations, Ovid Embase, CINAHL Plus, LILACS, and IndMED were systematically searched for articles of any study type discussing ward closure, as were bibliographies of selected articles and recent infection control conference abstracts. Search results were tracked, recorded, and analyzed using a relative recall method. The sensitivity of searching in each database was calculated. Two thousand ninety-five unique citations were identified and screened for inclusion in the systematic review: 2,060 from database searching and 35 from hand searching and other sources. Ninety-seven citations were included in the final review. MEDLINE and Embase searches each retrieved 80 of the 97 articles included, only 4 articles from each database were unique. The CINAHL search retrieved 35 included articles, and 4 were unique. The IndMED and LILACS searches did not retrieve any included articles, although 75 of the included articles were indexed in LILACS. The true value of using regional databases, particularly LILACS, may lie with the ability to search in the language spoken in the region. Eight articles were found only through hand searching. Identifying studies for a systematic review where the research is observational is complex. The value each individual study contributes to the review cannot be accurately measured. Consequently, we could not determine the value of results found from searching beyond MEDLINE, Embase, and CINAHL with accuracy. However, hand searching for serendipitous retrieval remains an important aspect due to indexing and keyword challenges inherent in this literature.
ThermoData Engine Database

National Institute of Standards and Technology Data Gateway

SRD 103a NIST ThermoData Engine Database (PC database for purchase) ThermoData Engine is the first product fully implementing all major principles of the concept of dynamic data evaluation formulated at NIST/TRC.
Search Engines on the World Wide Web.

ERIC Educational Resources Information Center

Walster, Dian

1997-01-01

Discusses search engines and provides methods for determining what resources are searched, the quality of the information, and the algorithms used that will improve the use of search engines on the World Wide Web, online public access catalogs, and electronic encyclopedias. Lists strategies for conducting searches and for learning about the latest…

The Mercury System: Embedding Computation into Disk Drives

DTIC Science & Technology

2004-08-20

enabling technologies to build extremely fast data search engines . We do this by moving the search closer to the data, and performing it in hardware...engine searches in parallel across a disk or disk surface 2. System Parallelism: Searching is off-loaded to search engines and main processor can
Should we search Chinese biomedical databases when performing systematic reviews?

PubMed

Cohen, Jérémie F; Korevaar, Daniël A; Wang, Junfeng; Spijker, René; Bossuyt, Patrick M

2015-03-06

Chinese biomedical databases contain a large number of publications available to systematic reviewers, but it is unclear whether they are used for synthesizing the available evidence. We report a case of two systematic reviews on the accuracy of anti-cyclic citrullinated peptide for diagnosing rheumatoid arthritis. In one of these, the authors did not search Chinese databases; in the other, they did. We additionally assessed the extent to which Cochrane reviewers have searched Chinese databases in a systematic overview of the Cochrane Library (inception to 2014). The two diagnostic reviews included a total of 269 unique studies, but only 4 studies were included in both reviews. The first review included five studies published in the Chinese language (out of 151) while the second included 114 (out of 118). The summary accuracy estimates from the two reviews were comparable. Only 243 of the published 8,680 Cochrane reviews (less than 3%) searched one or more of the five major Chinese databases. These Chinese databases index about 2,500 journals, of which less than 6% are also indexed in MEDLINE. All 243 Cochrane reviews evaluated an intervention, 179 (74%) had at least one author with a Chinese affiliation; 118 (49%) addressed a topic in complementary or alternative medicine. Although searching Chinese databases may lead to the identification of a large amount of additional clinical evidence, Cochrane reviewers have rarely included them in their search strategy. We encourage future initiatives to evaluate more systematically the relevance of searching Chinese databases, as well as collaborative efforts to allow better incorporation of Chinese resources in systematic reviews.
NBIC: Search Ballast Report Database

Science.gov Websites

Smithsonian Environmental Research Center Logo US Coast Guard Logo Submit BW Report | Search NBIC Database developed an online database that can be queried through our website. Data are accessible for all coastal Lakes, have been incorporated into the NBIC database as of August 2004. Information on data availability
Using the TIGR gene index databases for biological discovery.

PubMed

Lee, Yuandan; Quackenbush, John

2003-11-01

The TIGR Gene Index web pages provide access to analyses of ESTs and gene sequences for nearly 60 species, as well as a number of resources derived from these. Each species-specific database is presented using a common format with a homepage. A variety of methods exist that allow users to search each species-specific database. Methods implemented currently include nucleotide or protein sequence queries using WU-BLAST, text-based searches using various sequence identifiers, searches by gene, tissue and library name, and searches using functional classes through Gene Ontology assignments. This protocol provides guidance for using the Gene Index Databases to extract information.
A bibliography of IRIS-related publications, 2000-2011

NASA Astrophysics Data System (ADS)

Muco, B.

2012-12-01

Citations and acknowledgements in scientific journals can be an indicator of the role an organization has on the research of that field. Since its formation and incorporation in May 1984, the IRIS Consortium (Incorporated Research Institutions for Seismology) is mentioned more and more as a valuable source of data, instruments and programs in the literature of earth sciences. As a large organization with more than 100 member domestic institutes and about 40 international affiliates, obviously IRIS has a direct impact on the earth sciences through all its programs, projects, workshops, symposia, and news¬letters and as a lively forum for exchanging ideas. In order to maintain support from National Science Foundation (NSF) and the research community, it is important to document the continued use of IRIS facilities in basic research programs. IRIS maintains a database of articles that are based on the use of IRIS facilities or which reference use of IRIS data and resources. Articles in this database have been either been provided to IRIS by the authors or selected through an annual search of a number of prominent journals. A text version of the full bibliographic database is available on the IRIS website and a version in EndNote format is also provided. To provide a more complete bibliography and a consistent evaluation of temporal tends in publications, a special annual search began in 2000 which focused on a subset of key seismology and Earth science journals: Bulletin of Seismological Society of America, Journal of Geophysical Research, Seismological Research Letters, Geophysical Research Letters, Earth and Planetary Science Letters, Physics of the Earth and Planetary Interiors, Tectonophysics, Geophysical Journal International, Nature, Science, Geology and EOS. Using different search engines as Scirus, ScienceDirect, GeoRef, OCLC First Search, EASI Search, NASA Abstract Service etc. for online journals and publishers' databases, we searched for key words (IRIS, GSN, DMS, PASSCAL, USArray etc) in titles, abstracts and text. Most of the selections found by this method were confirmed by reading through online texts or original journals. This bibliography of peer-reviewed articles (excluding abstracts) identified in these key journals for 2000-2011 includes approximately 1800 entries. As for American Geophysical Union (AGU) transaction, the bibliography of IRIS-related abstracts for the abovementioned period includes approximately 1400 abstracts. This study is a clear indicator of making intensive use by the seismological community of the resources that IRIS provides and of the paramount importance this organization has in advancement of seismological research worldwide.
[Biomedical information on the internet using search engines. A one-year trial].

PubMed

Corrao, Salvatore; Leone, Francesco; Arnone, Sabrina

2004-01-01

The internet is a communication medium and content distributor that provide information in the general sense but it could be of great utility regarding as the search and retrieval of biomedical information. Search engines represent a great deal to rapidly find information on the net. However, we do not know whether general search engines and meta-search ones are reliable in order to find useful and validated biomedical information. The aim of our study was to verify the reproducibility of a search by key-words (pediatric or evidence) using 9 international search engines and 1 meta-search engine at the baseline and after a one year period. We analysed the first 20 citations as output of each searching. We evaluated the formal quality of Web-sites and their domain extensions. Moreover, we compared the output of each search at the start of this study and after a one year period and we considered as a criterion of reliability the number of Web-sites cited again. We found some interesting results that are reported throughout the text. Our findings point out an extreme dynamicity of the information on the Web and, for this reason, we advice a great caution when someone want to use search and meta-search engines as a tool for searching and retrieve reliable biomedical information. On the other hand, some search and meta-search engines could be very useful as a first step searching for defining better a search and, moreover, for finding institutional Web-sites too. This paper allows to know a more conscious approach to the internet biomedical information universe.
An Online Resource for Flight Test Safety Planning

NASA Technical Reports Server (NTRS)

Lewis, Greg

2007-01-01

A viewgraph presentation describing an online database for flight test safety techniques is shown. The topics include: 1) Goal; 2) Test Hazard Analyses; 3) Online Database Background; 4) Data Gathering; 5) NTPS Role; 6) Organizations; 7) Hazard Titles; 8) FAR Paragraphs; 9) Maneuver Name; 10) Identified Hazard; 11) Matured Hazard Titles; 12) Loss of Control Causes; 13) Mitigations; 14) Database Now Open to the Public; 15) FAR Reference Search; 16) Record Field Search; 17) Keyword Search; and 18) Results of FAR Reference Search.
Searching Databases without Query-Building Aids: Implications for Dyslexic Users

ERIC Educational Resources Information Center

Berget, Gerd; Sandnes, Frode Eika

2015-01-01

Introduction: Few studies document the information searching behaviour of users with cognitive impairments. This paper therefore addresses the effect of dyslexia on information searching in a database with no tolerance for spelling errors and no query-building aids. The purpose was to identify effective search interface design guidelines that…
Conducting a Web Search.

ERIC Educational Resources Information Center

Miller-Whitehead, Marie

Keyword and text string searches of online library catalogs often provide different results according to library and database used and depending upon how books and journals are indexed. For this reason, online databases such as ERIC often provide tutorials and recommendations for searching their site, such as how to use Boolean search strategies.…
The effects of tobacco smoking on the incidence and risk of intraoperative and postoperative complications in adults.

PubMed

Gourgiotis, Stavros; Aloizos, Stavros; Aravosita, Paraskevi; Mystakelli, Christina; Isaia, Eleni-Christina; Gakis, Christos; Salemis, Nikolaos S

2011-08-01

Despite the warnings of health hazards of cigarette smoking, still one third of the population in industrial countries smoke. This review was conducted with the aim of exploring the effects of preoperative tobacco smoking on the risk of intra- and postoperative complications and to identify the value of preoperative smoking cessation. The databases that were searched included The Cochrane Library Database, Medline, and EMBASE. Articles were also identified through a general internet search using the Google search engine. The incidence or risk of different types of intra- and postoperative complications were used as outcome measures. Tobacco smoking has a negative effect on surgical outcome, as has been found to be a risk factor for the development of complications during and after many types of surgery, even in the absence of chronic lung disease. Furthermore, the long-term health hazards of smoking reduce health-related quality of life and premature death. It is widely documented that stopping smoking before surgery has substantial health benefits in the longer term and should be recommended to every smoker in order for them to gain maximum benefit from their treatment. However, identification of the optimal period of preoperative smoking cessation on postoperative complications cannot be determined. Copyright © 2011 Royal College of Surgeons of Edinburgh (Scottish charity number SC005317) and Royal College of Surgeons in Ireland. Published by Elsevier Ltd. All rights reserved.
Citation searching: a systematic review case study of multiple risk behaviour interventions

PubMed Central

2014-01-01

Background The value of citation searches as part of the systematic review process is currently unknown. While the major guides to conducting systematic reviews state that citation searching should be carried out in addition to searching bibliographic databases there are still few studies in the literature that support this view. Rather than using a predefined search strategy to retrieve studies, citation searching uses known relevant papers to identify further papers. Methods We describe a case study about the effectiveness of using the citation sources Google Scholar, Scopus, Web of Science and OVIDSP MEDLINE to identify records for inclusion in a systematic review. We used the 40 included studies identified by traditional database searches from one systematic review of interventions for multiple risk behaviours. We searched for each of the included studies in the four citation sources to retrieve the details of all papers that have cited these studies. We carried out two analyses; the first was to examine the overlap between the four citation sources to identify which citation tool was the most useful; the second was to investigate whether the citation searches identified any relevant records in addition to those retrieved by the original database searches. Results The highest number of citations was retrieved from Google Scholar (1680), followed by Scopus (1173), then Web of Science (1095) and lastly OVIDSP (213). To retrieve all the records identified by the citation tracking searching all four resources was required. Google Scholar identified the highest number of unique citations. The citation tracking identified 9 studies that met the review’s inclusion criteria. Eight of these had already been identified by the traditional databases searches and identified in the screening process while the ninth was not available in any of the databases when the original searches were carried out. It would, however, have been identified by two of the database search strategies if searches had been carried out later. Conclusions Based on the results from this investigation, citation searching as a supplementary search method for systematic reviews may not be the best use of valuable time and resources. It would be useful to verify these findings in other reviews. PMID:24893958
Alternative Fuels Data Center: Vehicle Search

Science.gov Websites

ZeroTruck Search Engines and Hybrid Systems For medium- and heavy-duty vehicles: Engine & Power Sources Hydraulic hybrid Hybrid - CNG Hybrid - Diesel Electric Hybrid - LNG Hybrid Search x Pick Engine Fuel Natural Gas Propane Electric Plug-in Hybrid Electric Hydraulic hybrid Hybrid Search x Pick Engine Fuel
Getting to the top of Google: search engine optimization.

PubMed

Maley, Catherine; Baum, Neil

2010-01-01

Search engine optimization is the process of making your Web site appear at or near the top of popular search engines such as Google, Yahoo, and MSN. This is not done by luck or knowing someone working for the search engines but by understanding the process of how search engines select Web sites for placement on top or on the first page. This article will review the process and provide methods and techniques to use to have your site rated at the top or very near the top.
BEAUTY-X: enhanced BLAST searches for DNA queries.

PubMed

Worley, K C; Culpepper, P; Wiese, B A; Smith, R F

1998-01-01

BEAUTY (BLAST Enhanced Alignment Utility) is an enhanced version of the BLAST database search tool that facilitates identification of the functions of matched sequences. Three recent improvements to the BEAUTY program described here make the enhanced output (1) available for DNA queries, (2) available for searches of any protein database, and (3) more up-to-date, with periodic updates of the domain information. BEAUTY searches of the NCBI and EMBL non-redundant protein sequence databases are available from the BCM Search Launcher Web pages (http://gc.bcm.tmc. edu:8088/search-launcher/launcher.html). BEAUTY Post-Processing of submitted search results is available using the BCM Search Launcher Batch Client (version 2.6) (ftp://gc.bcm.tmc. edu/pub/software/search-launcher/). Example figures are available at http://dot.bcm.tmc. edu:9331/papers/beautypp.html (kworley,culpep)@bcm.tmc.edu
Integrating GIS, Archeology, and the Internet.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sera White; Brenda Ringe Pace; Randy Lee

2004-08-01

At the Idaho National Engineering and Environmental Laboratory's (INEEL) Cultural Resource Management Office, a newly developed Data Management Tool (DMT) is improving management and long-term stewardship of cultural resources. The fully integrated system links an archaeological database, a historical database, and a research database to spatial data through a customized user interface using ArcIMS and Active Server Pages. Components of the new DMT are tailored specifically to the INEEL and include automated data entry forms for historic and prehistoric archaeological sites, specialized queries and reports that address both yearly and project-specific documentation requirements, and unique field recording forms. The predictivemore » modeling component increases the DMT’s value for land use planning and long-term stewardship. The DMT enhances the efficiency of archive searches, improving customer service, oversight, and management of the large INEEL cultural resource inventory. In the future, the DMT will facilitate data sharing with regulatory agencies, tribal organizations, and the general public.« less
dEMBF: A Comprehensive Database of Enzymes of Microalgal Biofuel Feedstock.

PubMed

Misra, Namrata; Panda, Prasanna Kumar; Parida, Bikram Kumar; Mishra, Barada Kanta

2016-01-01

Microalgae have attracted wide attention as one of the most versatile renewable feedstocks for production of biofuel. To develop genetically engineered high lipid yielding algal strains, a thorough understanding of the lipid biosynthetic pathway and the underpinning enzymes is essential. In this work, we have systematically mined the genomes of fifteen diverse algal species belonging to Chlorophyta, Heterokontophyta, Rhodophyta, and Haptophyta, to identify and annotate the putative enzymes of lipid metabolic pathway. Consequently, we have also developed a database, dEMBF (Database of Enzymes of Microalgal Biofuel Feedstock), which catalogues the complete list of identified enzymes along with their computed annotation details including length, hydrophobicity, amino acid composition, subcellular location, gene ontology, KEGG pathway, orthologous group, Pfam domain, intron-exon organization, transmembrane topology, and secondary/tertiary structural data. Furthermore, to facilitate functional and evolutionary study of these enzymes, a collection of built-in applications for BLAST search, motif identification, sequence and phylogenetic analysis have been seamlessly integrated into the database. dEMBF is the first database that brings together all enzymes responsible for lipid synthesis from available algal genomes, and provides an integrative platform for enzyme inquiry and analysis. This database will be extremely useful for algal biofuel research. It can be accessed at http://bbprof.immt.res.in/embf.
dEMBF: A Comprehensive Database of Enzymes of Microalgal Biofuel Feedstock

PubMed Central

Misra, Namrata; Panda, Prasanna Kumar; Parida, Bikram Kumar; Mishra, Barada Kanta

2016-01-01

Microalgae have attracted wide attention as one of the most versatile renewable feedstocks for production of biofuel. To develop genetically engineered high lipid yielding algal strains, a thorough understanding of the lipid biosynthetic pathway and the underpinning enzymes is essential. In this work, we have systematically mined the genomes of fifteen diverse algal species belonging to Chlorophyta, Heterokontophyta, Rhodophyta, and Haptophyta, to identify and annotate the putative enzymes of lipid metabolic pathway. Consequently, we have also developed a database, dEMBF (Database of Enzymes of Microalgal Biofuel Feedstock), which catalogues the complete list of identified enzymes along with their computed annotation details including length, hydrophobicity, amino acid composition, subcellular location, gene ontology, KEGG pathway, orthologous group, Pfam domain, intron-exon organization, transmembrane topology, and secondary/tertiary structural data. Furthermore, to facilitate functional and evolutionary study of these enzymes, a collection of built-in applications for BLAST search, motif identification, sequence and phylogenetic analysis have been seamlessly integrated into the database. dEMBF is the first database that brings together all enzymes responsible for lipid synthesis from available algal genomes, and provides an integrative platform for enzyme inquiry and analysis. This database will be extremely useful for algal biofuel research. It can be accessed at http://bbprof.immt.res.in/embf. PMID:26727469
Combining results of multiple search engines in proteomics.

PubMed

Shteynberg, David; Nesvizhskii, Alexey I; Moritz, Robert L; Deutsch, Eric W

2013-09-01

A crucial component of the analysis of shotgun proteomics datasets is the search engine, an algorithm that attempts to identify the peptide sequence from the parent molecular ion that produced each fragment ion spectrum in the dataset. There are many different search engines, both commercial and open source, each employing a somewhat different technique for spectrum identification. The set of high-scoring peptide-spectrum matches for a defined set of input spectra differs markedly among the various search engine results; individual engines each provide unique correct identifications among a core set of correlative identifications. This has led to the approach of combining the results from multiple search engines to achieve improved analysis of each dataset. Here we review the techniques and available software for combining the results of multiple search engines and briefly compare the relative performance of these techniques.
Combining Results of Multiple Search Engines in Proteomics*

PubMed Central

Shteynberg, David; Nesvizhskii, Alexey I.; Moritz, Robert L.; Deutsch, Eric W.

2013-01-01

A crucial component of the analysis of shotgun proteomics datasets is the search engine, an algorithm that attempts to identify the peptide sequence from the parent molecular ion that produced each fragment ion spectrum in the dataset. There are many different search engines, both commercial and open source, each employing a somewhat different technique for spectrum identification. The set of high-scoring peptide-spectrum matches for a defined set of input spectra differs markedly among the various search engine results; individual engines each provide unique correct identifications among a core set of correlative identifications. This has led to the approach of combining the results from multiple search engines to achieve improved analysis of each dataset. Here we review the techniques and available software for combining the results of multiple search engines and briefly compare the relative performance of these techniques. PMID:23720762
A comparative study of six European databases of medically oriented Web resources.

PubMed

Abad García, Francisca; González Teruel, Aurora; Bayo Calduch, Patricia; de Ramón Frias, Rosa; Castillo Blasco, Lourdes

2005-10-01

The paper describes six European medically oriented databases of Web resources, pertaining to five quality-controlled subject gateways, and compares their performance. The characteristics, coverage, procedure for selecting Web resources, record structure, searching possibilities, and existence of user assistance were described for each database. Performance indicators for each database were obtained by means of searches carried out using the key words, "myocardial infarction." Most of the databases originated in the 1990s in an academic or library context and include all types of Web resources of an international nature. Five databases use Medical Subject Headings. The number of fields per record varies between three and nineteen. The language of the search interfaces is mostly English, and some of them allow searches in other languages. In some databases, the search can be extended to Pubmed. Organizing Medical Networked Information, Catalogue et Index des Sites Médicaux Francophones, and Diseases, Disorders and Related Topics produced the best results. The usefulness of these databases as quick reference resources is clear. In addition, their lack of content overlap means that, for the user, they complement each other. Their continued survival faces three challenges: the instability of the Internet, maintenance costs, and lack of use in spite of their potential usefulness.

Some links on this page may take you to non-federal websites. Their policies may differ from this site.