Science.gov

Sample records for probabilistic database search

  1. MSblender: a probabilistic approach for integrating peptide identifications from multiple database search engines

    PubMed Central

    Kwon, Taejoon; Choi, Hyungwon; Vogel, Christine; Nesvizhskii, Alexey I.; Marcotte, Edward M.

    2011-01-01

    Shotgun proteomics using mass spectrometry is a powerful method for protein identification but suffers from limited sensitivity in complex samples. Integrating peptide identifications from multiple database search engines is a promising strategy to increase the number of peptide identifications and reduce the volume of unassigned tandem mass spectra. Existing methods pool statistical significance scores such as p-values or posterior probabilities of peptide-spectrum matches (PSMs) from multiple search engines after high scoring peptides have been assigned to spectra, but these methods lack reliable control of identification error rates as data are integrated from different search engines. We developed a statistically coherent method for integrative analysis, termed MSblender. MSblender converts raw search scores from search engines into a probability score for all possible PSMs and properly accounts for the correlation between search scores. The method reliably estimates false discovery rates and identifies more PSMs than any single search engine at the same false discovery rate. Increased identifications increment spectral counts for all detected proteins and allow quantification of proteins that would not have been quantified by individual search engines. We also demonstrate that enhanced quantification contributes to improved sensitivity in differential expression analyses. PMID:21488652
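
    The target-decoy strategy is the standard way such false discovery rates are estimated in shotgun proteomics; a minimal sketch (MSblender itself fits a probabilistic mixture model over the engines' raw scores, which this does not reproduce):

```python
# Hypothetical sketch: target-decoy FDR estimation, a common way to
# calibrate PSM score thresholds. Decoy hits above the cutoff estimate
# the number of false target hits above the same cutoff.

def fdr_at_threshold(psms, threshold):
    """psms: list of (score, is_decoy) pairs; FDR ~ decoys / targets above cut."""
    targets = sum(1 for s, d in psms if s >= threshold and not d)
    decoys = sum(1 for s, d in psms if s >= threshold and d)
    return decoys / targets if targets else 0.0

psms = [(0.99, False), (0.95, False), (0.90, True), (0.85, False), (0.40, True)]
print(fdr_at_threshold(psms, 0.8))  # 1 decoy over 3 targets
```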

  2. MSblender: A probabilistic approach for integrating peptide identifications from multiple database search engines.

    PubMed

    Kwon, Taejoon; Choi, Hyungwon; Vogel, Christine; Nesvizhskii, Alexey I; Marcotte, Edward M

    2011-07-01

    Shotgun proteomics using mass spectrometry is a powerful method for protein identification but suffers from limited sensitivity in complex samples. Integrating peptide identifications from multiple database search engines is a promising strategy to increase the number of peptide identifications and reduce the volume of unassigned tandem mass spectra. Existing methods pool statistical significance scores such as p-values or posterior probabilities of peptide-spectrum matches (PSMs) from multiple search engines after high scoring peptides have been assigned to spectra, but these methods lack reliable control of identification error rates as data are integrated from different search engines. We developed a statistically coherent method for integrative analysis, termed MSblender. MSblender converts raw search scores from search engines into a probability score for every possible PSM and properly accounts for the correlation between search scores. The method reliably estimates false discovery rates and identifies more PSMs than any single search engine at the same false discovery rate. Increased identifications increment spectral counts for most proteins and allow quantification of proteins that would not have been quantified by individual search engines. We also demonstrate that enhanced quantification contributes to improved sensitivity in differential expression analyses.

  3. Optimal probabilistic search

    SciTech Connect

    Lokutsievskiy, Lev V

    2011-05-31

    This paper is concerned with the optimal search of an object at rest with unknown exact position in the n-dimensional space. A necessary condition for optimality of a trajectory is obtained. An explicit form of a differential equation for an optimal trajectory is found while searching over R-strongly convex sets. An existence theorem is also established. Bibliography: 8 titles.

  4. Online Database Searching Workbook.

    ERIC Educational Resources Information Center

    Littlejohn, Alice C.; Parker, Joan M.

    Designed primarily for use by first-time searchers, this workbook provides an overview of online searching. Following a brief introduction which defines online searching, databases, and database producers, five steps in carrying out a successful search are described: (1) identifying the main concepts of the search statement; (2) selecting a…

  6. Database Searching by Managers.

    ERIC Educational Resources Information Center

    Arnold, Stephen E.

    Managers and executives need the easy and quick access to business and management information that online databases can provide, but many have difficulty articulating their search needs to an intermediary. One possible solution would be to encourage managers and their immediate support staff members to search textual databases directly as they now…

  7. Paleomagnetic database search possible

    NASA Astrophysics Data System (ADS)

    Harbert, William

    I have recently finished an on-line search program which allows remote users to search the “Abase” ASCII version of the World Paleomagnetic Database developed by Lock and McElhinny [1991]. The program is very simple to use and will search the Soviet, non-Soviet, rock unit, and reference databases and create output files that can be downloaded back to a researcher's local system using the ftp command. To use Search, telnet to 130.49.3.1 (earth.eps.pitt.edu) and login as the user “Search.” There is no password, and the user is asked a series of questions, which define the geographic region and ages of interest. The program will also ask for an identifier with which to create the output file names. The program has three modes of operation: text-only, Tektronix graphics, or X11/R5 graphics; the proper choice depends on the computer hardware that is used by the searcher.

  8. Alternative Databases for Anthropology Searching.

    ERIC Educational Resources Information Center

    Brody, Fern; Lambert, Maureen

    1984-01-01

    Examines online search results of sample questions in several databases covering linguistics, cultural anthropology, and physical anthropology in order to determine if and where any overlap in results might occur, and which files have greatest number of relevant hits. Search results by database are given for each subject area. (EJS)

  9. Begin: Online Database Searching Now!

    ERIC Educational Resources Information Center

    Lodish, Erica K.

    1986-01-01

    Because of the increasing importance of online databases, school library media specialists are encouraged to introduce students to online searching. Four books that would help media specialists gain a basic background are reviewed and it is noted that although they are very technical, they can be adapted to individual needs. (EM)

  10. Searching NCBI Databases Using Entrez.

    PubMed

    Gibney, Gretchen; Baxevanis, Andreas D

    2011-10-01

    One of the most widely used interfaces for the retrieval of information from biological databases is the NCBI Entrez system. Entrez capitalizes on the fact that there are pre-existing, logical relationships between the individual entries found in numerous public databases. The existence of such natural connections, mostly biological in nature, argued for the development of a method through which all the information about a particular biological entity could be found without having to sequentially visit and query disparate databases. Two basic protocols describe simple, text-based searches, illustrating the types of information that can be retrieved through the Entrez system. An alternate protocol builds upon the first basic protocol, using additional, built-in features of the Entrez system, and providing alternative ways to issue the initial query. The support protocol reviews how to save frequently issued queries. Finally, Cn3D, a structure visualization tool, is also discussed. © 2011 by John Wiley & Sons, Inc.
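
    Beyond the web interface, Entrez can also be queried programmatically through NCBI's E-utilities; a minimal sketch that builds an esearch query URL (the endpoint and the db/term/retmax parameters are the documented E-utilities ones; the search term itself is just an example):

```python
# Build an NCBI E-utilities esearch URL. No network call is made here;
# fetching the URL would return matching record IDs for the given database.
from urllib.parse import urlencode

def esearch_url(db, term, retmax=20):
    base = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
    return base + "?" + urlencode({"db": db, "term": term, "retmax": retmax})

url = esearch_url("pubmed", "BRCA1[Gene] AND human[Organism]")
print(url)
```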

  11. Searching NCBI databases using Entrez.

    PubMed

    Baxevanis, Andreas D

    2008-12-01

    One of the most widely used interfaces for the retrieval of information from biological databases is the NCBI Entrez system. Entrez capitalizes on the fact that there are pre-existing, logical relationships between the individual entries found in numerous public databases. The existence of such natural connections, mostly biological in nature, argued for the development of a method through which all the information about a particular biological entity could be found without having to sequentially visit and query disparate databases. Two Basic Protocols describe simple, text-based searches, illustrating the types of information that can be retrieved through the Entrez system. An Alternate Protocol builds upon the first Basic Protocol, using additional, built-in features of the Entrez system, and providing alternative ways to issue the initial query. The Support Protocol reviews how to save frequently issued queries. Finally, Cn3D, a structure visualization tool, is also discussed. Copyright 2008 by John Wiley & Sons, Inc.

  12. Searching NCBI databases using Entrez.

    PubMed

    Gibney, Gretchen; Baxevanis, Andreas D

    2011-06-01

    One of the most widely used interfaces for the retrieval of information from biological databases is the NCBI Entrez system. Entrez capitalizes on the fact that there are pre-existing, logical relationships between the individual entries found in numerous public databases. The existence of such natural connections, mostly biological in nature, argued for the development of a method through which all the information about a particular biological entity could be found without having to sequentially visit and query disparate databases. Two basic protocols describe simple, text-based searches, illustrating the types of information that can be retrieved through the Entrez system. An alternate protocol builds upon the first basic protocol, using additional, built-in features of the Entrez system, and providing alternative ways to issue the initial query. The support protocol reviews how to save frequently issued queries. Finally, Cn3D, a structure visualization tool, is also discussed.

  13. Database Search Engines: Paradigms, Challenges and Solutions.

    PubMed

    Verheggen, Kenneth; Martens, Lennart; Berven, Frode S; Barsnes, Harald; Vaudel, Marc

    2016-01-01

    The first step in identifying proteins from mass spectrometry based shotgun proteomics data is to infer peptides from tandem mass spectra, a task generally achieved using database search engines. In this chapter, the basic principles of database search engines are introduced with a focus on open source software, and the use of database search engines is demonstrated using the freely available SearchGUI interface. This chapter also discusses how to tackle general issues related to sequence database searching and shows how to minimize their impact.

  14. Library Instruction and Online Database Searching.

    ERIC Educational Resources Information Center

    Mercado, Heidi

    1999-01-01

    Reviews changes in online database searching in academic libraries. Topics include librarians conducting all searches; the advent of end-user searching and the need for user instruction; compact disk technology; online public catalogs; the Internet; full text databases; electronic information literacy; user education and the remote library user;…

  16. Quantum search of a real unstructured database

    NASA Astrophysics Data System (ADS)

    Broda, Bogusław

    2016-02-01

    A simple circuit implementation of the oracle for Grover's quantum search of a real unstructured classical database is proposed. The oracle contains a kind of quantumly accessible classical memory, which stores the database.
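
    The algorithm the oracle serves can be sketched with a classical state-vector simulation: Grover's oracle flips the sign of the marked record's amplitude, and the diffusion operator inverts all amplitudes about their mean. A toy illustration (not the paper's circuit-level oracle construction):

```python
# Classical simulation of Grover's search over a small "database" of
# n_items records with one marked index. After ~ (pi/4) * sqrt(N)
# iterations the marked amplitude dominates.
import math

def grover(n_items, marked, iterations):
    amp = [1 / math.sqrt(n_items)] * n_items     # uniform superposition
    for _ in range(iterations):
        amp[marked] = -amp[marked]               # oracle: phase flip
        mean = sum(amp) / n_items
        amp = [2 * mean - a for a in amp]        # diffusion about the mean
    return amp[marked] ** 2                      # success probability

n = 16
p = grover(n, marked=5, iterations=round(math.pi / 4 * math.sqrt(n)))
```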

  17. Online Database Searching in Smaller Public Libraries.

    ERIC Educational Resources Information Center

    Roose, Tina

    1983-01-01

    Online database searching experiences of nine Illinois public libraries--Arlington Heights, Deerfield, Elk Grove Village, Evanston, Glenview, Northbrook, Schaumburg Township, Waukegan, Wilmette--are discussed, noting search costs, user charges, popular databases, library acquisition, interaction with users, and staff training. Three sources are…

  19. Probabilistic Computations for Attention, Eye Movements, and Search.

    PubMed

    Eckstein, Miguel P

    2017-09-15

    The term visual attention immediately evokes the idea of limited resources, serial processing, or a zoom metaphor. But evidence has slowly accumulated that computations that take into account probabilistic relationships among visual forms and the target contribute to optimizing decisions in biological and artificial organisms, even without considering these limited-capacity processes in covert attention or even foveation. The benefits from such computations can be formalized within the framework of an ideal Bayesian observer and can be related to the classic theory of sensory cue combination in vision science and context-driven approaches to object detection in computer vision. The framework can account for a large range of behavioral findings across distinct experimental paradigms, including visual search, cueing, and scene context. I argue that these forms of probabilistic computations might be fundamental to optimizing decisions in many species and review human experiments trying to identify scene properties that serve as cues to guide eye movements and facilitate search. I conclude by discussing contributions of attention beyond probabilistic computations but argue that the framework's merit is to unify many basic paradigms to study attention under a single theory.
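
    In its simplest form, the ideal Bayesian observer the article invokes amounts to combining a scene-context prior over candidate locations with local feature likelihoods; a toy sketch with illustrative numbers:

```python
# Bayesian cue combination for target localization: posterior over
# candidate locations is prior x likelihood, normalized; the observer
# selects the maximum a posteriori location. All values are made up.

def map_location(prior, likelihood):
    posterior = [p * l for p, l in zip(prior, likelihood)]
    z = sum(posterior)
    posterior = [p / z for p in posterior]
    best = max(range(len(posterior)), key=posterior.__getitem__)
    return best, posterior

prior      = [0.1, 0.6, 0.2, 0.1]   # scene context favors location 1
likelihood = [0.3, 0.2, 0.9, 0.1]   # local evidence favors location 2
loc, post = map_location(prior, likelihood)
```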

  20. Using volume holograms to search digital databases

    NASA Astrophysics Data System (ADS)

    Burr, Geoffrey W.; Maltezos, George; Grawert, Felix; Kobras, Sebastian; Hanssen, Holger; Coufal, Hans J.

    2002-01-01

    Holographic data storage offers the potential for simultaneous search of an entire database by performing multiple optical correlations between stored data pages and a search argument. This content-addressable retrieval produces one analog correlation score for each stored volume hologram. We have previously developed fuzzy encoding techniques for this fast parallel search, and holographically searched a small database with high fidelity. We recently showed that such systems can be configured to produce true inner-products, and proposed an architecture in which massively-parallel searches could be implemented. However, the speed advantage over conventional electronic search provided by parallelism brings with it the possibility of erroneous search results, since these analog correlation scores are subject to various noise sources. We show that the fidelity of such an optical search depends not only on the usual holographic storage signal-to-noise factors (such as readout power, diffraction efficiency, and readout speed), but also on the particular database query being made. In effect, the presence of non-matching database records with nearly the same correlation score as the targeted matching records reduces the speed advantage of the parallel search. Thus for any given fidelity target, the performance improvement offered by a content-addressable holographic storage can vary from query to query even within the same database.
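
    The inner-product search described here can be mimicked numerically: one correlation score per stored page, with fidelity governed by the margin between the matching score and the best non-matching score. A toy sketch (the vectors and values are illustrative, not the holographic encoding):

```python
# One inner-product "correlation score" per stored page; a small margin
# between the best and runner-up scores signals a query that is hard to
# resolve reliably in the analog, noise-limited optical search.

def scores(pages, query):
    return [sum(p_i * q_i for p_i, q_i in zip(page, query)) for page in pages]

pages = [(1, 0, 1, 0), (1, 1, 0, 0), (0, 1, 1, 1)]
query = (1, 0, 1, 0)
s = scores(pages, query)
margin = max(s) - sorted(s)[-2]   # gap between best match and runner-up
```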

  1. A Probabilistic Approach to Information Retrieval in Systems with Boolean Search Request Formulations.

    ERIC Educational Resources Information Center

    Radecki, Tadeusz

    1982-01-01

    Outlines an approach to information retrieval which integrates the existing theory of probabilistic retrieval into a practical methodology based on Boolean searches. Basic concepts, search methodology, and examples of Boolean searching are noted. Twenty-six sources are appended. (EJS)

  2. Interactive searching of facial image databases

    NASA Astrophysics Data System (ADS)

    Nicholls, Robert A.; Shepherd, John W.; Shepherd, Jean

    1995-09-01

    A set of psychological facial descriptors has been devised to enable computerized searching of criminal photograph albums. The descriptors have been used to encode image databases of up to twelve thousand images. Using a system called FACES, the databases are searched by translating a witness' verbal description into corresponding facial descriptors. Trials of FACES have shown that this coding scheme is more productive and efficient than searching traditional photograph albums. An alternative method of searching the encoded database using a genetic algorithm is currently being tested. The genetic search method does not require the witness to verbalize a description of the target but merely to indicate a degree of similarity between the target and a limited selection of images from the database. The major drawback of FACES is that it requires manual encoding of images. Research is being undertaken to automate the process; however, it will require an algorithm that can predict human descriptive values. Alternatives to human derived coding schemes exist using statistical classifications of images. Since databases encoded using statistical classifiers do not have an obvious direct mapping to human derived descriptors, a search method which does not require the entry of human descriptors is required. A genetic search algorithm is being tested for such a purpose.

  3. Fast Structural Search in Phylogenetic Databases

    PubMed Central

    Wang, Jason T. L.; Shan, Huiyuan; Shasha, Dennis; Piel, William H.

    2007-01-01

    As the size of phylogenetic databases grows, the need for efficiently searching these databases arises. Thanks to previous and ongoing research, searching by attribute value and by text has become commonplace in these databases. However, searching by topological or physical structure, especially for large databases and especially for approximate matches, is still an art. We propose structural search techniques that, given a query or pattern tree P and a database of phylogenies D, find trees in D that are sufficiently close to P. The “closeness” is a measure of the topological relationships in P that are found to be the same or similar in a tree D in D. We develop a filtering technique that accelerates searches and present algorithms for rooted and unrooted trees where the trees can be weighted or unweighted. Experimental results on comparing the similarity measure with existing tree metrics and on evaluating the efficiency of the search techniques demonstrate that the proposed approach is promising. PMID:19325851
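
    One simple instance of such a closeness measure (a sketch of the flavor, not the paper's exact metric or filtering technique) is the fraction of the pattern tree's clades that recur in a database tree:

```python
# Trees are nested tuples of leaf names; a clade is the set of leaves
# under an internal node. Closeness = shared clades / clades of pattern.

def clades(tree):
    out = set()
    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(c) for c in node))
        out.add(leaves)
        return leaves
    walk(tree)
    return out

def closeness(p, t):
    cp, ct = clades(p), clades(t)
    return len(cp & ct) / len(cp)

P = (("a", "b"), ("c", "d"))
T = (("a", "b"), ("c", ("d", "e")))
print(closeness(P, T))  # only the (a, b) clade is shared
```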

  4. Searching gene and protein sequence databases.

    PubMed

    Barsalou, T; Brutlag, D L

    1991-01-01

    A large-scale effort to map and sequence the human genome is now under way. Crucial to the success of this research is a group of computer programs that analyze and compare data on molecular sequences. This article describes the classic algorithms for similarity searching and sequence alignment. Because good performance of these algorithms is critical to searching very large and growing databases, we analyze the running times of the algorithms and discuss recent improvements in this area.
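
    The classic local-alignment algorithm alluded to here is Smith-Waterman; its simplest O(mn) dynamic-programming form with linear gap penalties looks like this:

```python
# Smith-Waterman local alignment score: each cell is the best score of an
# alignment ending at (i, j), floored at zero so poor prefixes are dropped.

def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    h = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diag = h[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch)
            h[i][j] = max(0, diag, h[i-1][j] + gap, h[i][j-1] + gap)
            best = max(best, h[i][j])
    return best
```

The running-time concern the article raises is visible directly in the nested loops, which is why heuristic database-search tools later traded exactness for speed.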

  5. A fuzzy approach for mining association rules in a probabilistic database

    NASA Astrophysics Data System (ADS)

    Pei, Bin; Chen, Dingjie; Zhao, Suyun; Chen, Hong

    2013-07-01

    Association rule mining is an essential knowledge discovery method that can find associations in databases. Previous studies on association rule mining focus on finding quantitative association rules from certain data, or finding Boolean association rules from uncertain data. Unfortunately, due to instrument errors, imprecision of sensor monitoring systems and so on, real-world data tend to be quantitative data with inherent uncertainty. In our paper, we study the discovery of association rules from probabilistic databases with quantitative attributes. Once we convert quantitative attributes into fuzzy sets, we get a probabilistic database with fuzzy sets. This is theoretically challenging, since we need to give appropriate interest measures to define the support and confidence degree of fuzzy events with probability. We propose a Shannon-like entropy to measure the information of such events. After that, an algorithm is proposed to find fuzzy association rules from a probabilistic database. Finally, an illustrative example is given to demonstrate the procedure of the algorithm.
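
    The interest measure the abstract hints at can be illustrated with a simpler quantity than the paper's Shannon-like entropy: the expected fuzzy support of an itemset, i.e. tuple existence probability times fuzzy membership degree, averaged over the database (a hypothetical sketch, not the paper's definition):

```python
# Each tuple carries an existence probability (probabilistic database)
# and a fuzzy membership degree for the itemset (quantitative attribute
# mapped to a fuzzy set). Expected fuzzy support averages their product.

def expected_fuzzy_support(tuples):
    """tuples: list of (probability, membership_degree) pairs."""
    return sum(p * mu for p, mu in tuples) / len(tuples)

db = [(0.9, 1.0), (0.8, 0.5), (1.0, 0.0), (0.6, 0.75)]
print(expected_fuzzy_support(db))
```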

  6. Efficient search and retrieval in biometric databases

    NASA Astrophysics Data System (ADS)

    Mhatre, Amit J.; Palla, Srinivas; Chikkerur, Sharat; Govindaraju, Venu

    2005-03-01

    Biometric identification has emerged as a reliable means of controlling access to both physical and virtual spaces. Fingerprints, face and voice biometrics are being increasingly used as alternatives to passwords, PINs and visual verification. In spite of the rapid proliferation of large-scale databases, the research has thus far been focused only on accuracy within small databases. In larger applications, response time and retrieval efficiency also become important in addition to accuracy. Unlike structured information such as text or numeric data that can be sorted, biometric data does not have any natural sorting order. Therefore indexing and binning of biometric databases represents a challenging problem. We present results using parallel combination of multiple biometrics to bin the database. Using hand geometry and signature features we show that the search space can be reduced to just 5% of the entire database.
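
    The binning idea can be sketched as follows: each record gets a (hand-geometry cluster, signature cluster) pair, and a probe is compared only against records in its own bin. The cluster assignments below are synthetic stand-ins, not the paper's features:

```python
# Parallel binning on two biometrics: with 4 hand-geometry clusters and
# 5 signature clusters, a uniform database splits into 20 bins, so a
# probe searches roughly 1/20 of the records.
from collections import defaultdict

def build_bins(records):
    bins = defaultdict(list)
    for rec_id, hand_cluster, sig_cluster in records:
        bins[(hand_cluster, sig_cluster)].append(rec_id)
    return bins

records = [(i, i % 4, i % 5) for i in range(1000)]  # synthetic database
bins = build_bins(records)
fraction_searched = len(bins[(1, 2)]) / len(records)
```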

  7. Multi-Database Searching in Forensic Psychology.

    ERIC Educational Resources Information Center

    Piotrowski, Chris; Perdue, Robert W.

    Traditional library skills have been augmented since the introduction of online computerized database services. Because of the complexity of the field, forensic psychology can benefit enormously from the application of comprehensive bibliographic search strategies. The study reported here demonstrated the bibliographic results obtained when a…

  8. Searching Online Database Services over the Internet.

    ERIC Educational Resources Information Center

    Keays, Thomas

    1993-01-01

    Describes how to use the Internet to access commercial online database services, such as DIALOG, and discusses the advantages in terms of costs, reference services, and accessibility. Outlines in detail how to save a search session or link another terminal to a Telnet session, and provides information and Internet addresses for eight vendor…

  10. Searching Across the International Space Station Databases

    NASA Technical Reports Server (NTRS)

    Maluf, David A.; McDermott, William J.; Smith, Ernest E.; Bell, David G.; Gurram, Mohana

    2007-01-01

    Data access in the enterprise generally requires us to combine data from different sources and different formats. It is thus advantageous to focus on the intersection of the knowledge across sources and domains; keeping irrelevant knowledge around only serves to make the integration more unwieldy and more complicated than necessary. A context search over multiple domains is proposed in this paper, using context-sensitive queries to support disciplined manipulation of domain knowledge resources. The objective of a context search is to provide the capability for interrogating many domain knowledge resources, which are largely semantically disjoint. The search formally supports the tasks of selecting, combining, extending, specializing, and modifying components from a diverse set of domains. This paper demonstrates a new paradigm in composition of information for enterprise applications. In particular, it discusses an approach to achieving data integration across multiple sources, in a manner that does not require heavy investment in database and middleware maintenance. This lean approach to integration leads to cost-effectiveness and scalability of data integration with an underlying schemaless object-relational database management system. This highly scalable, information-on-demand framework, called NX-Search, is an implementation of an information system built on NETMARK. NETMARK is a flexible, high-throughput open database integration framework for managing, storing, and searching unstructured or semi-structured arbitrary XML and HTML used widely at the National Aeronautics and Space Administration (NASA) and in industry.

  11. Searching the NCBI databases using Entrez.

    PubMed

    Baxevanis, Andreas D

    2006-11-01

    One of the most widely-used interfaces for the retrieval of information from biological databases is the NCBI Entrez system. Entrez capitalizes on the fact that there are pre-existing, logical relationships between the individual entries found in numerous public databases. The existence of such natural connections, mostly biological in nature, argued for the development of a method through which all the information about a particular biological entity could be found without having to sequentially visit and query disparate databases. Two Basic Protocols describe simple, text-based searches, illustrating the types of information that can be retrieved through the Entrez system. An Alternate Protocol builds upon the first Basic Protocol, using additional, built-in features of the Entrez system, and providing alternative ways to issue the initial query. The Support Protocol reviews how to save frequently-issued queries. Finally, Cn3D, a structure visualization tool, is also discussed.

  12. Searching the NCBI databases using Entrez.

    PubMed

    Baxevanis, Andreas D

    2006-03-01

    One of the most widely-used interfaces for the retrieval of information from biological databases is the NCBI Entrez system. Entrez capitalizes on the fact that there are pre-existing, logical relationships between the individual entries found in numerous public databases. The existence of such natural connections, mostly biological in nature, argued for the development of a method through which all the information about a particular biological entity could be found without having to sequentially visit and query disparate databases. Two Basic Protocols describe simple, text-based searches, illustrating the types of information that can be retrieved through the Entrez system. An Alternate Protocol builds upon the first Basic Protocol, using additional, built-in features of the Entrez system, and providing alternative ways to issue the initial query. The Support Protocol reviews how to save frequently-issued queries. Finally, Cn3D, a structure visualization tool, is also discussed.

  13. Audio stream classification for multimedia database search

    NASA Astrophysics Data System (ADS)

    Artese, M.; Bianco, S.; Gagliardi, I.; Gasparini, F.

    2013-03-01

    Search and retrieval of huge archives of multimedia data is a challenging task. A classification step is often used to reduce the number of entries on which to perform the subsequent search. In particular, when new entries of the database are continuously added, a fast classification based on simple threshold evaluation is desirable. In this work we present a CART-based (Classification And Regression Tree [1]) classification framework for audio streams belonging to multimedia databases. The database considered is the Archive of Ethnography and Social History (AESS) [2], which is mainly composed of popular songs and other audio records describing the popular traditions handed down generation by generation, such as traditional fairs and customs. The peculiarities of this database are that it is continuously updated; the audio recordings are acquired in an unconstrained environment; and it is difficult for a non-expert human user to create the ground-truth labels. In our experiments, half of all the available audio files have been randomly extracted and used as the training set. The remaining ones have been used as the test set. The classifier has been trained to distinguish among three different classes: speech, music, and song. All the audio files in the dataset had previously been manually labeled by domain experts into the three classes defined above.

  14. Identification of polymorphic motifs using probabilistic search algorithms

    PubMed Central

    Basu, Analabha; Chaudhuri, Probal; Majumder, Partha P.

    2005-01-01

    The problem of identifying motifs comprising nucleotides at a set of polymorphic DNA sites, not necessarily contiguous, arises in many human genetic problems. However, when the sites are not contiguous, no efficient algorithm exists for polymorphic motif identification. A search based on complete enumeration is computationally inefficient. We have developed probabilistic search algorithms to discover motifs of known or unknown lengths. We have developed statistical tests of significance for assessing a motif discovery, and a statistical criterion for simultaneously estimating motif length and discovering it. We have tested these algorithms on various synthetic data sets and have shown that they are very efficient, in the sense that the “true” motifs can be detected in the vast majority of replications and in a small number of iterations. Additionally, we have applied them to some real data sets and have shown that they are able to identify known motifs. In certain applications, it is pertinent to find motifs that contain contrasting nucleotides at the sites included in the motif (e.g., motifs identified in case-control association studies). For this, we have suggested appropriate modifications. Using simulations, we have discovered that the success rate of identification of the correct motif is high in case-control studies except when relative risks are small. Our analyses of evolutionary data sets resulted in the identification of some motifs that appear to have important implications on human evolutionary inference. These algorithms can easily be implemented to discover motifs from multilocus genotype data by simple numerical recoding of genotypes. PMID:15632091

  15. A Bayesian network approach to the database search problem in criminal proceedings

    PubMed Central

    2012-01-01

    Background The ‘database search problem’, that is, the strengthening of a case - in terms of probative value - against an individual who is found as a result of a database search, has been approached during the last two decades with substantial mathematical analyses, accompanied by lively debate and centrally opposing conclusions. This represents a challenging obstacle in teaching but also hinders a balanced and coherent discussion of the topic within the wider scientific and legal community. This paper revisits and tracks the associated mathematical analyses in terms of Bayesian networks. Their derivation and discussion for capturing probabilistic arguments that explain the database search problem are outlined in detail. The resulting Bayesian networks offer a distinct view on the main debated issues, along with further clarity. Methods As a general framework for representing and analyzing formal arguments in probabilistic reasoning about uncertain target propositions (that is, whether or not a given individual is the source of a crime stain), this paper relies on graphical probability models, in particular, Bayesian networks. This graphical probability modeling approach is used to capture, within a single model, a series of key variables, such as the number of individuals in a database, the size of the population of potential crime stain sources, and the rarity of the corresponding analytical characteristics in a relevant population. Results This paper demonstrates the feasibility of deriving Bayesian network structures for analyzing, representing, and tracking the database search problem. The output of the proposed models can be shown to agree with existing but exclusively formulaic approaches. Conclusions The proposed Bayesian networks allow one to capture and analyze the currently most well-supported but reputedly counter-intuitive and difficult solution to the database search problem in a way that goes beyond the traditional, purely formulaic expressions
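    The core quantitative point debated in this literature can be reproduced with a few lines of arithmetic. Assuming the standard formulation (a uniform prior over N potential sources, random-match probability gamma, and n - 1 database members excluded by non-matching profiles), the posterior for the matching individual follows by simple Bayesian updating; the numbers below are purely illustrative.

```python
def posterior_source(N, n, gamma):
    """P(the single database hit is the source), with n - 1 database
    members excluded and N - n untested potential sources remaining."""
    return 1.0 / (1.0 + (N - n) * gamma)

def posterior_probable_cause(N, gamma):
    """Posterior for a suspect identified without a database search."""
    return 1.0 / (1.0 + (N - 1) * gamma)

# Excluding database members makes the database-search posterior
# slightly higher, not lower, than the probable-cause posterior:
print(posterior_source(N=1_000_000, n=10_000, gamma=1e-6))
print(posterior_probable_cause(N=1_000_000, gamma=1e-6))
```

    This is the reputedly counter-intuitive result the Bayesian networks formalize: the database search excludes alternative sources and so slightly strengthens, rather than weakens, the case against the matching individual.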

  16. A probabilistic NF2 relational algebra for integrated information retrieval and database systems

    SciTech Connect

    Fuhr, N.; Roelleke, T.

    1996-12-31

    The integration of information retrieval (IR) and database systems requires a data model which allows for modelling documents as entities, representing uncertainty and vagueness and performing uncertain inference. For this purpose, we present a probabilistic data model based on relations in non-first-normal-form (NF2). Here, tuples are assigned probabilistic weights giving the probability that a tuple belongs to a relation. Thus, the set of weighted index terms of a document is represented as a probabilistic subrelation. In a similar way, imprecise attribute values are modelled as a set-valued attribute. We redefine the relational operators for this type of relation such that the result of each operator is again a probabilistic NF2 relation, where the weight of a tuple gives the probability that the tuple belongs to the result. By ordering the tuples according to decreasing probabilities, the model yields a ranking of answers as in most IR models. This effect can also be used for typical database queries involving imprecise attribute values as well as for combinations of database and IR queries.
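    A minimal sketch of such probabilistic relational operators: tuples carry membership probabilities, selection preserves them, join multiplies them under an independence assumption, and results are ranked by descending weight. The relation contents and attribute names are invented for illustration.

```python
def select(rel, pred):
    """Selection keeps each qualifying tuple with its weight unchanged."""
    return [(p, t) for (p, t) in rel if pred(t)]

def join(r, s, key_r, key_s):
    """Join two weighted relations; the joint membership probability is
    the product of the operand weights (independence assumption)."""
    return [(pr * ps, {**tr, **ts})
            for (pr, tr) in r for (ps, ts) in s
            if tr[key_r] == ts[key_s]]

def rank(rel):
    """Rank answers by decreasing membership probability, as in IR."""
    return sorted(rel, key=lambda pt: pt[0], reverse=True)

# Weighted index terms of documents as a probabilistic subrelation:
index = [(0.9, {"doc": "d1", "term": "retrieval"}),
         (0.5, {"doc": "d2", "term": "retrieval"})]
docs = [(1.0, {"doc": "d1", "year": 1996}),
        (1.0, {"doc": "d2", "year": 1996})]
ranked = rank(join(index, docs, "doc", "doc"))
print([t["doc"] for _, t in ranked])  # ['d1', 'd2']
```

    Because every operator again yields a weighted relation, IR-style ranking falls out of ordinary relational query composition.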

  17. Incremental learning of probabilistic rules from clinical databases based on rough set theory.

    PubMed Central

    Tsumoto, S.; Tanaka, H.

    1997-01-01

    Several rule induction methods have been introduced in order to discover meaningful knowledge from databases, including those in the medical domain. However, most of these approaches induce rules from all the data in a database and cannot do so incrementally when new samples arrive. In this paper, a new approach to knowledge acquisition, which induces probabilistic rules incrementally using rough set techniques, is introduced and evaluated on two clinical databases. The results show that this method induces the same rules as those induced by ordinary non-incremental learning methods, which extract rules from the entire dataset, but that the former method requires more computational resources than the latter. PMID:9357616

  18. Subject Index to Databases Available from Computer Search Service.

    ERIC Educational Resources Information Center

    Atkinson, Steven D., Comp.; Knee, Michael, Comp.

    The University Libraries Computer Search Service (State University of New York at Albany, SUNYA) provides access to databases from many vendors including BRS, Dialog, Wilsonline, CA Search, and Westlaw. Members of the Computer Search Service, Collection Development, and Reference Service staffs select vendor services and new databases for their…

  19. WAIS Searching of the Current Contents Database

    NASA Astrophysics Data System (ADS)

    Banholzer, P.; Grabenstein, M. E.

    The Homer E. Newell Memorial Library of NASA's Goddard Space Flight Center is developing capabilities to permit Goddard personnel to access electronic resources of the Library via the Internet. The Library's support services contractor, Maxima Corporation, and their subcontractor, SANAD Support Technologies, have recently developed a World Wide Web home page (http://www-library.gsfc.nasa.gov) to provide the primary means of access. The first searchable database to be made available through the home page to Goddard employees is Current Contents, from the Institute for Scientific Information (ISI). The initial implementation includes coverage of articles from the last few months of 1992 to the present. These records are augmented with abstracts and references, and often are more robust than equivalent records in bibliographic databases that currently serve the astronomical community. Maxima/SANAD selected Wais Incorporated's WAIS product with which to build the interface to Current Contents. This system allows access from Macintosh, IBM PC, and Unix hosts, which is an important feature for Goddard's multiplatform environment. The forms interface is structured to allow both fielded (author, article title, journal name, id number, keyword, subject term, and citation) and unfielded WAIS searches. The system allows a user to retrieve individual journal article records, retrieve the table of contents of specific issues of journals, connect to articles with similar subject terms or keywords, connect to other issues of the same journal in the same year, and browse journal issues from an alphabetical list of indexed journal names.

  20. Probabilistic Fracture Mechanics analysis based on three-dimensional J-integral database

    NASA Astrophysics Data System (ADS)

    Ye, G.-W.; Yagawa, G.; Yoshimura, S.

    1993-04-01

    The development is described of a novel Probabilistic Fracture Mechanics (PFM) code based on the three-dimensional J-integral database, giving so-called fully plastic solutions. An efficient technique for the evaluation of leak and break probabilities is also utilized, based on the stratified sampling Monte Carlo simulation. The outline of the present PFM code is described, and the J-integral database and the numerical technique are presented. Nonlinear effects of materials on failure probabilities are discussed through the analysis of a surface cracked structure subjected to cyclic tension.
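    The stratified-sampling Monte Carlo idea can be sketched as follows: the quantile range of one random variable is split into equal strata that are each sampled uniformly, and the per-stratum failure rates are averaged. The limit-state function below (failure when load exceeds capacity) and its distributions are a toy stand-in for the J-integral-based criterion of the paper.

```python
import random

def failure_prob_stratified(n_strata=10, per_stratum=1000, seed=1):
    """Estimate a failure probability by averaging per-stratum failure
    rates, stratifying the load's uniform quantile into equal slices."""
    rng = random.Random(seed)
    total = 0.0
    for k in range(n_strata):
        hits = 0
        for _ in range(per_stratum):
            u = (k + rng.random()) / n_strata   # quantile within stratum k
            load = 100.0 + 40.0 * u             # toy load: U(100, 140)
            capacity = rng.gauss(130.0, 5.0)    # toy capacity: N(130, 5)
            if load > capacity:
                hits += 1
        total += hits / per_stratum
    return total / n_strata

print(failure_prob_stratified())  # roughly 0.25 for these toy inputs
```

    Forcing an equal number of samples into every load stratum removes the sampling noise across strata, which is why stratification reduces the variance of the estimate relative to plain Monte Carlo.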

  1. Performance Evaluation of Adaptive Probabilistic Search in P2P Networks

    NASA Astrophysics Data System (ADS)

    Zhang, Haoxiang; Zhang, Lin; Shan, Xiuming; Li, Victor O. K.

    The overall performance of P2P-based file sharing applications is becoming increasingly important. Based on the Adaptive Resource-based Probabilistic Search algorithm (ARPS), which was previously proposed by the authors, a novel probabilistic search algorithm with QoS guarantees is proposed in this letter. The algorithm relies on generating functions to satisfy the user's constraints and to exploit the power-law distribution in the node degree. Simulation results demonstrate that it performs well under various P2P scenarios. The proposed algorithm provides guarantees on the search performance perceived by the user while minimizing the search cost. Furthermore, it allows different QoS levels, resulting in greater flexibility and scalability.
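    The degree-biased forwarding at the heart of such probabilistic search can be sketched simply: a query is forwarded to a bounded number of neighbors chosen with probability proportional to their degree, exploiting the power-law degree distribution so that high-degree nodes are visited preferentially. This is an illustrative rule, not the ARPS algorithm itself.

```python
import random

def forward_targets(neighbors, budget, rng):
    """Choose up to `budget` distinct neighbors, each draw weighted by
    the neighbor's degree (high-degree nodes are favored)."""
    population = [node for node, _ in neighbors]
    weights = [degree for _, degree in neighbors]
    chosen = set()
    while len(chosen) < min(budget, len(neighbors)):
        node, = rng.choices(population, weights=weights)
        chosen.add(node)
    return chosen

rng = random.Random(42)
neighbors = [("a", 1), ("b", 50), ("c", 2)]   # (node, degree) pairs
print(forward_targets(neighbors, budget=2, rng=rng))
```

    The budget caps the search cost per hop, while the degree weighting raises the probability of reaching well-connected nodes that index more content.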

  2. Lost in Search: (Mal-)Adaptation to Probabilistic Decision Environments in Children and Adults

    ERIC Educational Resources Information Center

    Betsch, Tilmann; Lehmann, Anne; Lindow, Stefanie; Lang, Anna; Schoemann, Martin

    2016-01-01

    Adaptive decision making in probabilistic environments requires individuals to use probabilities as weights in predecisional information searches and/or when making subsequent choices. Within a child-friendly computerized environment (Mousekids), we tracked 205 children's (105 children 5-6 years of age and 100 children 9-10 years of age) and 103…

  4. Multiple Database Searching: Techniques and Pitfalls

    ERIC Educational Resources Information Center

    Hawkins, Donald T.

    1978-01-01

    Problems involved in searching multiple data bases are discussed including indexing differences, overlap among data bases, variant spellings, and elimination of duplicate items from search output. Discussion focuses on CA Condensates, Inspec, and Metadex data bases. (JPF)

  5. The Database Dilemma: Online Search Strategies in Nursing.

    ERIC Educational Resources Information Center

    Fried, Ava K.; And Others

    1989-01-01

    Describes a study that compared the coverage of the nursing profession, subject heading specificity, and ease of retrieval of the MEDLINE and Nursing & Allied Health (CINAHL) online databases. The strengths and weaknesses of each database are discussed and hints for searching on both databases are provided. (four references) (CLB)

  7. Searching the ASRS Database Using QUORUM Keyword Search, Phrase Search, Phrase Generation, and Phrase Discovery

    NASA Technical Reports Server (NTRS)

    McGreevy, Michael W.; Connors, Mary M. (Technical Monitor)

    2001-01-01

    To support Search Requests and Quick Responses at the Aviation Safety Reporting System (ASRS), four new QUORUM methods have been developed: keyword search, phrase search, phrase generation, and phrase discovery. These methods build upon the core QUORUM methods of text analysis, modeling, and relevance-ranking. QUORUM keyword search retrieves ASRS incident narratives that contain one or more user-specified keywords in typical or selected contexts, and ranks the narratives on their relevance to the keywords in context. QUORUM phrase search retrieves narratives that contain one or more user-specified phrases, and ranks the narratives on their relevance to the phrases. QUORUM phrase generation produces a list of phrases from the ASRS database that contain a user-specified word or phrase. QUORUM phrase discovery finds phrases that are related to topics of interest. Phrase generation and phrase discovery are particularly useful for finding query phrases for input to QUORUM phrase search. The presentation of the new QUORUM methods includes: a brief review of the underlying core QUORUM methods; an overview of the new methods; numerous, concrete examples of ASRS database searches using the new methods; discussion of related methods; and, in the appendices, detailed descriptions of the new methods.

  8. Probabilistic Cuing in Large-Scale Environmental Search

    ERIC Educational Resources Information Center

    Smith, Alastair D.; Hood, Bruce M.; Gilchrist, Iain D.

    2010-01-01

    Finding an object in our environment is an important human ability that also represents a critical component of human foraging behavior. One type of information that aids efficient large-scale search is the likelihood of the object being in one location over another. In this study we investigated the conditions under which individuals respond to…

  10. Using SQL Databases for Sequence Similarity Searching and Analysis.

    PubMed

    Pearson, William R; Mackey, Aaron J

    2017-09-13

    Relational databases can integrate diverse types of information and manage large sets of similarity search results, greatly simplifying genome-scale analyses. By focusing on taxonomic subsets of sequences, relational databases can reduce the size and redundancy of sequence libraries and improve the statistical significance of homologs. In addition, by loading similarity search results into a relational database, it becomes possible to explore and summarize the relationships between all of the proteins in an organism and those in other biological kingdoms. This unit describes how to use relational databases to improve the efficiency of sequence similarity searching and demonstrates various large-scale genomic analyses of homology-related data. It also describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. The unit also introduces search_demo, a database that stores sequence similarity search results. The search_demo database is then used to explore the evolutionary relationships between E. coli proteins and proteins in other organisms in a large-scale comparative genomic analysis. © 2017 by John Wiley & Sons, Inc.
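    A minimal sketch of the workflow, using an in-memory SQLite database: similarity-search hits are loaded into one table, sequence metadata into another, and a join restricts significant homologs to a taxonomic subset. The schema and names are simplified stand-ins for seqdb_demo/search_demo, not their actual definitions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE protein (acc TEXT PRIMARY KEY, taxon TEXT);
CREATE TABLE hit (query TEXT, subject TEXT, evalue REAL,
                  FOREIGN KEY (subject) REFERENCES protein(acc));
""")
con.executemany("INSERT INTO protein VALUES (?, ?)",
                [("P1", "E. coli"), ("P2", "H. sapiens"), ("P3", "E. coli")])
con.executemany("INSERT INTO hit VALUES (?, ?, ?)",
                [("Q1", "P1", 1e-30), ("Q1", "P2", 1e-5), ("Q1", "P3", 0.2)])

# Significant homologs of Q1 restricted to a taxonomic subset:
rows = con.execute("""
    SELECT h.subject FROM hit h JOIN protein p ON p.acc = h.subject
    WHERE h.query = 'Q1' AND h.evalue < 1e-3 AND p.taxon = 'E. coli'
""").fetchall()
print(rows)  # [('P1',)]
```

    Once the hits are in a table, summaries across whole organisms (counts per taxon, best hit per query, and so on) become single SQL statements instead of ad hoc scripts.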

  11. Searching the PASCAL database - A user's perspective

    NASA Technical Reports Server (NTRS)

    Jack, Robert F.

    1989-01-01

    The operation of PASCAL, a bibliographic data base covering broad subject areas in science and technology, is discussed. The data base includes information from about 1973 to the present, including topics in engineering, chemistry, physics, earth science, environmental science, biology, psychology, and medicine. Data from 1986 to the present may be searched using DIALOG. The procedures and classification codes for searching PASCAL are presented. Examples of citations retrieved from the data base are given and suggestions are made concerning when to use PASCAL.

  13. An efficient quantum search engine on unsorted database

    NASA Astrophysics Data System (ADS)

    Lu, Songfeng; Zhang, Yingyu; Liu, Fang

    2013-10-01

    We consider the problem of finding one or more desired items out of an unsorted database. Patel has shown that if the database permits quantum queries, then mere digitization is sufficient for efficient search for one desired item. The algorithm, called factorized quantum search algorithm, presented by him can locate the desired item in an unsorted database using O() queries to factorized oracles. But the algorithm requires that all the attribute values must be distinct from each other. In this paper, we discuss how to make a database satisfy the requirements, and present a quantum search engine based on the algorithm. Our goal is achieved by introducing auxiliary files for the attribute values that are not distinct, and converting every complex query request into a sequence of calls to factorized quantum search algorithm. The query complexity of our algorithm is O() for most cases.

  14. [Online tutorial for searching a dental database].

    PubMed

    Liem, S L

    2009-05-01

    With millions of resources available on the Internet, it is still difficult to search for appropriate and relevant information, even with the use of advanced search engines. With no systematic quality control of online resources, it is difficult to determine how reliable information is. The consortium Intute, which administers a databank of high quality information available via the Internet, which is intended to support scientific teaching and research, ensures that all information provided has been evaluated and investigated by its own team of specialists in various disciplines. A part of the website of Intute which is accessible free of charge is the Virtual Training Suite, by means of which one can improve one's competence in Internet searching and where a number of reliable and qualitatively superior sources for daily practice are available.

  15. Is Library Database Searching a Language Learning Activity?

    ERIC Educational Resources Information Center

    Bordonaro, Karen

    2010-01-01

    This study explores how non-native speakers of English think of words to enter into library databases when they begin the process of searching for information in English. At issue is whether or not language learning takes place when these students use library databases. Language learning in this study refers to the use of strategies employed by…

  16. Chemical Substructure Searching: Comparing Three Commercially Available Databases.

    ERIC Educational Resources Information Center

    Wagner, A. Ben

    1986-01-01

    Compares the differences in coverage and utility of three substructure databases--Chemical Abstracts, Index Chemicus, and Chemical Information System's Nomenclature Search System. The differences between Chemical Abstracts with two different vendors--STN International and Questel--are described and a summary guide for choosing between databases is…

  19. A comprehensive and scalable database search system for metaproteomics.

    PubMed

    Chatterjee, Sandip; Stupp, Gregory S; Park, Sung Kyu Robin; Ducom, Jean-Christophe; Yates, John R; Su, Andrew I; Wolan, Dennis W

    2016-08-16

    Mass spectrometry-based shotgun proteomics experiments rely on accurate matching of experimental spectra against a database of protein sequences. Existing computational analysis methods are limited in the size of their sequence databases, which severely restricts the proteomic sequencing depth and functional analysis of highly complex samples. The growing amount of public high-throughput sequencing data will only exacerbate this problem. We designed a broadly applicable metaproteomic analysis method (ComPIL) that addresses protein database size limitations. Our approach to overcome this significant limitation in metaproteomics was to design a scalable set of sequence databases assembled for optimal library querying speeds. ComPIL was integrated with a modified version of the search engine ProLuCID (termed "Blazmass") to permit rapid matching of experimental spectra. Proof-of-principle analysis of human HEK293 lysate with a ComPIL database derived from high-quality genomic libraries was able to detect nearly all of the same peptides as a search with a human database (~500x fewer peptides in the database), with a small reduction in sensitivity. We were also able to detect proteins from the adenovirus used to immortalize these cells. We applied our method to a set of healthy human gut microbiome proteomic samples and showed a substantial increase in the number of identified peptides and proteins compared to previous metaproteomic analyses, while retaining a high degree of protein identification accuracy and allowing for a more in-depth characterization of the functional landscape of the samples. The combination of ComPIL with Blazmass allows proteomic searches to be performed with database sizes much larger than previously possible. These large database searches can be applied to complex meta-samples with unknown composition or proteomic samples where unexpected proteins may be identified. 
The protein database, proteomic search engine, and the proteomic data files for

  20. Automatic sub-volume registration by probabilistic random search

    NASA Astrophysics Data System (ADS)

    Han, Jingfeng; Qiao, Min; Hornegger, Joachim; Kuwert, Torsten; Bautz, Werner; Römer, Wolfgang

    2006-03-01

    Registration of an individual's image data set to an anatomical atlas provides valuable information to the physician. In many cases, the individual image data sets are partial data, which may be mapped to one part or one organ of the entire atlas data. Most of the existing intensity-based image registration approaches are designed to align images of the entire view. When they are applied to registration with partial data, a manual pre-registration is usually required. This paper proposes a fully automatic approach to the registration of incomplete image data to an anatomical atlas. The spatial transformations are modelled as arbitrary parametric functions. The proposed method is built upon a random search mechanism, which allows the optimal transformation to be found randomly and globally even when the initialization is not ideal. It works more reliably than the existing methods for partial data registration because it successfully overcomes the local optimum problem. With appropriate similarity measures, this framework is applicable to both mono-modal and multi-modal registration problems with partial data. The contribution of this work is the description of the mathematical framework of the proposed algorithm and the implementation of the related software. The medical evaluation on the MRI data and the comparison of the proposed method with different existing registration methods show the feasibility and superiority of the proposed method.
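    The random search mechanism can be sketched in one dimension: candidate transformation parameters are drawn globally at random, and the best-scoring candidate under a similarity measure is kept, so a poor initialization cannot trap the search in a local optimum. A 1-D translation with a sum-of-squared-differences measure stands in for the paper's parametric 3-D transformations and multi-modal measures.

```python
import random

def ssd(fixed, moving, shift):
    """Mean squared difference over the overlap after shifting `moving`."""
    pairs = [(fixed[i], moving[i - shift])
             for i in range(len(fixed)) if 0 <= i - shift < len(moving)]
    return sum((a - b) ** 2 for a, b in pairs) / max(len(pairs), 1)

def register(fixed, moving, max_shift=5, iters=200, seed=0):
    """Keep the best of many globally drawn random shift proposals."""
    rng = random.Random(seed)
    best_shift, best_cost = 0, ssd(fixed, moving, 0)
    for _ in range(iters):
        s = rng.randint(-max_shift, max_shift)   # global random proposal
        c = ssd(fixed, moving, s)
        if c < best_cost:
            best_shift, best_cost = s, c
    return best_shift

fixed = [0, 0, 1, 4, 1, 0, 0, 0]
moving = [1, 4, 1, 0, 0, 0, 0, 0]   # same profile, shifted by 2 in fixed
print(register(fixed, moving))  # 2 (the true shift)
```

    Because proposals are drawn from the whole parameter range rather than a neighborhood of the current estimate, the search can recover the global optimum even for partial overlaps where gradient-style methods stall.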

  1. Exhaustive Database Searching for Amino Acid Mutations in Proteomes

    SciTech Connect

    Hyatt, Philip Douglas; Pan, Chongle

    2012-01-01

    Amino acid mutations in proteins can be found by searching tandem mass spectra acquired in shotgun proteomics experiments against protein sequences predicted from genomes. Traditionally, unconstrained searches for amino acid mutations have been accomplished by using a sequence tagging approach that combines de novo sequencing with database searching. However, this approach is limited by the performance of de novo sequencing. The Sipros algorithm v2.0 was developed to perform unconstrained database searching using high-resolution tandem mass spectra by exhaustively enumerating all single non-isobaric mutations for every residue in a protein database. The performance of Sipros for amino acid mutation identification exceeded that of an established sequence tagging algorithm, Inspect, based on benchmarking results from a Rhodopseudomonas palustris proteomics dataset. To demonstrate the viability of the algorithm for meta-proteomics, Sipros was used to identify amino acid mutations in a natural microbial community in acid mine drainage.
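    The exhaustive enumeration idea can be sketched directly: for every residue of a database peptide, generate each non-isobaric substitution together with its mass shift, so a precursor-mass difference can be matched against candidates without de novo sequencing. The residue masses below are standard monoisotopic values (a subset, for brevity); the isobaric tolerance is an assumption.

```python
# Monoisotopic residue masses (Da), rounded to 4 decimals (subset).
MASS = {"G": 57.0215, "A": 71.0371, "S": 87.0320, "P": 97.0528,
        "V": 99.0684, "T": 101.0477, "L": 113.0841, "N": 114.0429,
        "D": 115.0269, "Q": 128.0586, "K": 128.0949, "E": 129.0426}

def single_mutations(peptide):
    """Yield (mutated_peptide, mass_shift) for every non-isobaric
    single-residue substitution of `peptide`."""
    for i, orig in enumerate(peptide):
        for sub, m in MASS.items():
            shift = m - MASS[orig]
            if sub != orig and abs(shift) > 0.01:   # skip isobaric swaps
                yield peptide[:i] + sub + peptide[i + 1:], round(shift, 4)

muts = dict(single_mutations("GAS"))
print(muts["GVS"])   # A -> V shift: 99.0684 - 71.0371 = 28.0313
```

    Enumerating all single substitutions for every residue is what makes the search exhaustive; high-resolution precursor masses keep the candidate list small enough to score in practice.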

  2. BioCarian: search engine for exploratory searches in heterogeneous biological databases.

    PubMed

    Zaki, Nazar; Tennakoon, Chandana

    2017-10-02

    There are a large number of biological databases publicly available to scientists on the web, as well as many private databases generated in the course of research projects, in a wide variety of formats. Web standards have evolved in recent times, and semantic web technologies are now available to interconnect diverse and heterogeneous sources of data. Integration and querying of biological databases can therefore be facilitated by techniques used in the semantic web. Heterogeneous databases can be converted into the Resource Description Framework (RDF) and queried using the SPARQL language. Searching for exact queries in these databases is trivial, but exploratory searches need customized solutions, especially when multiple databases are involved. This process is cumbersome and time-consuming for those without a sufficient background in computer science. In this context, a search engine facilitating exploratory searches of databases would be of great help to the scientific community. We present BioCarian, an efficient and user-friendly search engine for performing exploratory searches on biological databases. The search engine is an interface for SPARQL queries over RDF databases. We note that many of the databases can be converted to tabular form. We first convert the tabular databases to RDF. The search engine provides a graphical interface based on facets to explore the converted databases. The facet interface is more advanced than conventional facets: it allows complex queries to be constructed and has additional features, such as ranking of facet values based on several criteria, visually indicating the relevance of a facet value, and presenting the most important facet values when a large number of choices are available. Advanced users can run SPARQL queries directly on the databases; using this feature, they can incorporate federated searches of SPARQL endpoints. 
We used the search engine to do an exploratory search

  3. Tiered Human Integrated Sequence Search Databases for Shotgun Proteomics

    PubMed Central

    Deutsch, Eric W.; Sun, Zhi; Campbell, David S.; Binz, Pierre-Alain; Farrah, Terry; Shteynberg, David; Mendoza, Luis; Omenn, Gilbert S.; Moritz, Robert L.

    2016-01-01

    The results of analysis of shotgun proteomics mass spectrometry data can be greatly affected by the selection of the reference protein sequence database against which the spectra are matched. For many species there are multiple sources from which somewhat different sequence sets can be obtained. This can lead to confusion about which database is best in which circumstances – a problem especially acute in human sample analysis. All sequence databases are genome-based, with sequences for the predicted gene and their protein translation products compiled. Our goal is to create a set of primary sequence databases that comprise the union of sequences from many of the different available sources and make the result easily available to the community. We have compiled a set of four sequence databases of varying sizes, from a small database consisting of only the ~20,000 primary isoforms plus contaminants to a very large database that includes almost all non-redundant protein sequences from several sources. This set of tiered, increasingly complete human protein sequence databases suitable for mass spectrometry proteomics sequence database searching is called the Tiered Human Integrated Search Proteome set. In order to evaluate the utility of these databases, we have analyzed two different data sets, one from the HeLa cell line and the other from normal human liver tissue, with each of the four tiers of database complexity. The result is that approximately 0.8%, 1.1%, and 1.5% additional peptides can be identified for Tiers 2, 3, and 4, respectively, as compared with the Tier 1 database, at substantially increasing computational cost. This increase in computational cost may be worth bearing if the identification of sequence variants or the discovery of sequences that are not present in the reviewed knowledge base entries is an important goal of the study. We find that it is useful to search a data set against a simpler database, and then check the uniqueness of the

  4. Tiered Human Integrated Sequence Search Databases for Shotgun Proteomics.

    PubMed

    Deutsch, Eric W; Sun, Zhi; Campbell, David S; Binz, Pierre-Alain; Farrah, Terry; Shteynberg, David; Mendoza, Luis; Omenn, Gilbert S; Moritz, Robert L

    2016-11-04

    The results of analysis of shotgun proteomics mass spectrometry data can be greatly affected by the selection of the reference protein sequence database against which the spectra are matched. For many species there are multiple sources from which somewhat different sequence sets can be obtained. This can lead to confusion about which database is best in which circumstances-a problem especially acute in human sample analysis. All sequence databases are genome-based, with sequences for the predicted gene and their protein translation products compiled. Our goal is to create a set of primary sequence databases that comprise the union of sequences from many of the different available sources and make the result easily available to the community. We have compiled a set of four sequence databases of varying sizes, from a small database consisting of only the ∼20,000 primary isoforms plus contaminants to a very large database that includes almost all nonredundant protein sequences from several sources. This set of tiered, increasingly complete human protein sequence databases suitable for mass spectrometry proteomics sequence database searching is called the Tiered Human Integrated Search Proteome set. In order to evaluate the utility of these databases, we have analyzed two different data sets, one from the HeLa cell line and the other from normal human liver tissue, with each of the four tiers of database complexity. The result is that approximately 0.8%, 1.1%, and 1.5% additional peptides can be identified for Tiers 2, 3, and 4, respectively, as compared with the Tier 1 database, at substantially increasing computational cost. This increase in computational cost may be worth bearing if the identification of sequence variants or the discovery of sequences that are not present in the reviewed knowledge base entries is an important goal of the study. We find that it is useful to search a data set against a simpler database, and then check the uniqueness of the

  5. Searching Harvard Business Review Online. . . Lessons in Searching a Full Text Database.

    ERIC Educational Resources Information Center

    Tenopir, Carol

    1985-01-01

    This article examines the Harvard Business Review Online (HBRO) database (bibliographic description fields, abstracts, extracted information, full text, subject descriptors) and reports on 31 sample HBRO searches conducted in Bibliographic Retrieval Services to test differences between searching full text and searching bibliographic records. Sample…


  7. The Effects of Search Tool Type and Cognitive Style on Performance during Hypermedia Database Searches.

    ERIC Educational Resources Information Center

    Leader, Lars F.; Klein, James D.

    1996-01-01

    Describes a study that investigated the effects of search tools and learner cognitive styles on performance in searches for information within a hypermedia database. Students in a university English-as-a-Second-Language program were assigned to one of four treatment groups, and results show a significant interaction between search tool and…

  8. Should we search Chinese biomedical databases when performing systematic reviews?

    PubMed

    Cohen, Jérémie F; Korevaar, Daniël A; Wang, Junfeng; Spijker, René; Bossuyt, Patrick M

    2015-03-06

    Chinese biomedical databases contain a large number of publications available to systematic reviewers, but it is unclear whether they are used for synthesizing the available evidence. We report a case of two systematic reviews on the accuracy of anti-cyclic citrullinated peptide for diagnosing rheumatoid arthritis. In one of these, the authors did not search Chinese databases; in the other, they did. We additionally assessed the extent to which Cochrane reviewers have searched Chinese databases in a systematic overview of the Cochrane Library (inception to 2014). The two diagnostic reviews included a total of 269 unique studies, but only 4 studies were included in both reviews. The first review included five studies published in the Chinese language (out of 151) while the second included 114 (out of 118). The summary accuracy estimates from the two reviews were comparable. Only 243 of the published 8,680 Cochrane reviews (less than 3%) searched one or more of the five major Chinese databases. These Chinese databases index about 2,500 journals, of which less than 6% are also indexed in MEDLINE. All 243 Cochrane reviews evaluated an intervention, 179 (74%) had at least one author with a Chinese affiliation; 118 (49%) addressed a topic in complementary or alternative medicine. Although searching Chinese databases may lead to the identification of a large amount of additional clinical evidence, Cochrane reviewers have rarely included them in their search strategy. We encourage future initiatives to evaluate more systematically the relevance of searching Chinese databases, as well as collaborative efforts to allow better incorporation of Chinese resources in systematic reviews.

  9. Assigning statistical significance to proteotypic peptides via database searches.

    PubMed

    Alves, Gelio; Ogurtsov, Aleksey Y; Yu, Yi-Kuo

    2011-02-01

    Querying MS/MS spectra against a database containing only proteotypic peptides reduces data analysis time because of the smaller database size. Despite the speed advantage, this search strategy is challenged by issues of statistical significance and coverage. The former requires systematically separating significant identifications from less confident identifications, while the latter arises when the underlying peptide is not present in the proteotypic peptide libraries searched, owing to single amino acid polymorphisms (SAPs) or post-translational modifications (PTMs). To address both issues simultaneously, we have extended RAId's knowledge database to include proteotypic information, utilized RAId's statistical strategy to assign statistical significance to proteotypic peptides, and modified RAId's programs to allow for consideration of proteotypic information during database searches. The extended database alleviates the coverage problem, since all annotated modifications, even those that occur within proteotypic peptides, may be considered. Taking into account the likelihoods of observation, the statistical strategy of RAId provides accurate E-value assignments regardless of whether a candidate peptide is proteotypic or not. The advantage of including proteotypic information is evidenced by its superior retrieval performance when compared to regular database searches. Published by Elsevier B.V.

  10. Privacy-preserving search for chemical compound databases

    PubMed Central

    2015-01-01

    Background Searching for similar compounds in a database is the most important process for in-silico drug screening. Since a query compound is an important starting point for the new drug, a query holder, who is afraid of the query being monitored by the database server, usually downloads all the records in the database and uses them in a closed network. However, a serious dilemma arises when the database holder also wants to output no information except for the search results, and such a dilemma prevents the use of many important data resources. Results In order to overcome this dilemma, we developed a novel cryptographic protocol that enables database searching while keeping both the query holder's privacy and database holder's privacy. Generally, the application of cryptographic techniques to practical problems is difficult because versatile techniques are computationally expensive while computationally inexpensive techniques can perform only trivial computation tasks. In this study, our protocol is successfully built only from an additive-homomorphic cryptosystem, which allows only addition performed on encrypted values but is computationally efficient compared with versatile techniques such as general purpose multi-party computation. In an experiment searching ChEMBL, which consists of more than 1,200,000 compounds, the proposed method was 36,900 times faster in CPU time and 12,000 times as efficient in communication size compared with general purpose multi-party computation. Conclusion We proposed a novel privacy-preserving protocol for searching chemical compound databases. The proposed method, easily scaling for large-scale databases, may help to accelerate drug discovery research by making full use of unused but valuable data that includes sensitive information. PMID:26678650
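The additive-homomorphic property the protocol relies on can be illustrated with a toy Paillier cryptosystem, in which multiplying two ciphertexts yields an encryption of the sum of the plaintexts. This is a minimal sketch with small demonstration primes, not the paper's actual protocol or parameters:

```python
import math
import random

def keygen(p, q):
    """Paillier key generation for primes p, q, using g = n + 1."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    n2 = n * n
    # mu = (L(g^lam mod n^2))^-1 mod n, where L(x) = (x - 1) // n
    mu = pow((pow(n + 1, lam, n2) - 1) // n, -1, n)
    return (n, n + 1), (lam, mu)

def encrypt(pub, m):
    n, g = pub
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(pub, priv, c):
    n, _ = pub
    lam, mu = priv
    n2 = n * n
    return ((pow(c, lam, n2) - 1) // n) * mu % n

pub, priv = keygen(999983, 1000003)  # demonstration primes only
a, b = encrypt(pub, 12), encrypt(pub, 30)
# Additive homomorphism: multiplying ciphertexts adds the plaintexts
assert decrypt(pub, priv, (a * b) % (pub[0] ** 2)) == 42
```

A server can thus aggregate encrypted per-bit contributions of a similarity score without ever seeing the query, which is the essence of how such protocols stay efficient compared with general-purpose multi-party computation.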

  11. A Methodology for the Development of a Reliability Database for an Advanced Reactor Probabilistic Risk Assessment

    SciTech Connect

    Grabaskas, Dave; Brunett, Acacia J.; Bucknor, Matthew

    2016-06-26

    GE Hitachi Nuclear Energy (GEH) and Argonne National Laboratory are currently engaged in a joint effort to modernize and develop probabilistic risk assessment (PRA) techniques for advanced non-light water reactors. At a high level, the primary outcome of this project will be the development of next-generation PRA methodologies that will enable risk-informed prioritization of safety- and reliability-focused research and development, while also identifying gaps that may be resolved through additional research. A subset of this effort is the development of a reliability database (RDB) methodology to determine applicable reliability data for inclusion in the quantification of the PRA. The RDB method developed during this project seeks to satisfy the requirements of the Data Analysis element of the ASME/ANS Non-LWR PRA standard. The RDB methodology uses a relevancy test to examine reliability data and determine whether they are appropriate to include in the reliability database for the PRA. The relevancy test compares three component properties to establish the level of similarity to components examined as part of the PRA: the component function, the component failure modes, and the environment/boundary conditions of the component. The relevancy test is used to gauge the quality of data found in a variety of sources, such as advanced reactor-specific databases, non-advanced reactor nuclear databases, and non-nuclear databases. The RDB also establishes the integration of expert judgment or separate reliability analysis with past reliability data. This paper provides details on the RDB methodology, and includes an example application for determining the reliability of the intermediate heat exchanger of a sodium fast reactor. The example explores a variety of reliability data sources and assesses their applicability for the PRA of interest through the use of the relevancy test.
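The three-property relevancy test described above could be sketched as a simple scoring function. The property encodings, weights, and threshold below are illustrative assumptions for demonstration, not the actual RDB criteria:

```python
def relevancy(candidate, reference, weights=(0.4, 0.4, 0.2), threshold=0.7):
    """Score a candidate data source's component against the PRA component
    on the three relevancy-test properties: function, failure modes, and
    environment/boundary conditions (weights and threshold are hypothetical)."""
    f = 1.0 if candidate["function"] == reference["function"] else 0.0
    m = (len(candidate["failure_modes"] & reference["failure_modes"])
         / max(len(reference["failure_modes"]), 1))
    e = 1.0 if candidate["environment"] == reference["environment"] else 0.0
    score = weights[0] * f + weights[1] * m + weights[2] * e
    return score, score >= threshold

# Hypothetical intermediate heat exchanger (IHX) comparison
ihx_pra = {"function": "heat transfer",
           "failure_modes": {"tube leak", "plugging"},
           "environment": "sodium"}
source = {"function": "heat transfer",
          "failure_modes": {"tube leak"},
          "environment": "water"}
score, relevant = relevancy(source, ihx_pra)
```

Here the water-cooled analogue matches on function and partially on failure modes but not on environment, so it falls below the (assumed) relevance threshold.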

  12. Forensic utilization of familial searches in DNA databases.

    PubMed

    Gershaw, Cassandra J; Schweighardt, Andrew J; Rourke, Linda C; Wallace, Margaret M

    2011-01-01

    DNA evidence is widely recognized as an invaluable tool in the process of investigation and identification, as well as one of the most sought after types of evidence for presentation to a jury. In the United States, the development of state and federal DNA databases has greatly impacted the forensic community by creating an efficient, searchable system that can be used to eliminate or include suspects in an investigation based on matching DNA profiles - the profile already in the database to the profile of the unknown sample in evidence. Recent changes in legislation have begun to allow for the possibility to expand the parameters of DNA database searches, taking into account the possibility of familial searches. This article discusses prospective positive outcomes of utilizing familial DNA searches and acknowledges potential negative outcomes, thereby presenting both sides of this very complicated, rapidly evolving situation.

  13. Complementary use of the SciSearch database for improved biomedical information searching.

    PubMed Central

    Brown, C M

    1998-01-01

    The use of at least two complementary online biomedical databases is generally considered critical for biomedical scientists seeking to keep fully abreast of recent research developments as well as to retrieve the highest number of relevant citations possible. Although the National Library of Medicine's MEDLINE is usually the database of choice, this paper illustrates the benefits of using another database, the Institute for Scientific Information's SciSearch, when conducting a biomedical information search. When a simple query about red wine consumption and coronary artery disease was posed simultaneously in both MEDLINE and SciSearch, a greater number of relevant citations were retrieved through SciSearch. This paper also provides suggestions for carrying out a comprehensive biomedical literature search in a rapid and efficient manner by using SciSearch in conjunction with MEDLINE. PMID:9549014

  14. The LAILAPS search engine: relevance ranking in life science databases.

    PubMed

    Lange, Matthias; Spies, Karl; Bargsten, Joachim; Haberhauer, Gregor; Klapperstück, Matthias; Leps, Michael; Weinel, Christian; Wünschiers, Röbbe; Weissbach, Mandy; Stein, Jens; Scholz, Uwe

    2010-01-15

    Search engines and retrieval systems are popular tools on the life science desktop. Manually inspecting hundreds of database entries that reflect a life science concept or fact is time-intensive daily work, and what matters is not the number of query results but their relevance. In this paper, we present the LAILAPS search engine for life science databases. The concept is to combine a novel feature model for relevance ranking, a machine learning approach to model user relevance profiles, ranking improvement by user feedback tracking, and an intuitive, slim web user interface that estimates relevance rank by tracking user interactions. Queries are formulated as simple keyword lists and are expanded with synonyms. Supporting a flexible text index and a simple data import format, LAILAPS can easily be used both as a search engine for comprehensive integrated life science databases and for small in-house project databases. From a set of features extracted from each database hit, in combination with user relevance preferences, a neural network predicts user-specific relevance scores. Using expert knowledge as training data for a predefined neural network, or using users' own relevance training sets, a reliable relevance ranking of database hits has been implemented. In this paper, we present the LAILAPS system, its concepts, benchmarks and use cases. LAILAPS is publicly available for SWISSPROT data at http://lailaps.ipk-gatersleben.de.

  15. Molecule database framework: a framework for creating database applications with chemical structure search capability.

    PubMed

    Kiener, Joos

    2013-12-11

    Research in organic chemistry generates samples of novel chemicals together with their properties and other related data. The scientists involved must be able to store this data and search it by chemical structure. There are commercial solutions for common needs like chemical registration systems or electronic lab notebooks, but no such solutions exist for the specific requirements of in-house databases and processes. Another issue is that commercial solutions carry the risk of vendor lock-in and may require an expensive license for a proprietary relational database management system. To speed up and simplify the development of applications that require chemical structure search capabilities, I have developed Molecule Database Framework. The framework abstracts the storing and searching of chemical structures into method calls, so software developers do not require extensive knowledge about chemistry and the underlying database cartridge. This decreases application development time. Molecule Database Framework is written in Java and I created it by integrating existing free and open-source tools and frameworks. The core functionality includes support for multi-component compounds (mixtures), import and export of SD-files, and optional security (authorization). For chemical structure searching, Molecule Database Framework leverages the capabilities of the Bingo Cartridge for PostgreSQL and provides type-safe searching, caching, transactions and optional method-level security. Furthermore, the design of the entity classes and the reasoning behind it are explained. By means of a simple web application I describe how the framework could be used. I then benchmarked this example application to create some basic performance expectations for chemical structure searches and import and export of SD-files. By using a simple web application it was shown that Molecule Database Framework

  16. Molecule database framework: a framework for creating database applications with chemical structure search capability

    PubMed Central

    2013-01-01

    Background Research in organic chemistry generates samples of novel chemicals together with their properties and other related data. The involved scientists must be able to store this data and search it by chemical structure. There are commercial solutions for common needs like chemical registration systems or electronic lab notebooks. However, for specific requirements of in-house databases and processes no such solutions exist. Another issue is that commercial solutions have the risk of vendor lock-in and may require an expensive license of a proprietary relational database management system. To speed up and simplify the development for applications that require chemical structure search capabilities, I have developed Molecule Database Framework. The framework abstracts the storing and searching of chemical structures into method calls. Therefore software developers do not require extensive knowledge about chemistry and the underlying database cartridge. This decreases application development time. Results Molecule Database Framework is written in Java and I created it by integrating existing free and open-source tools and frameworks. The core functionality includes support for multi-component compounds (mixtures), import and export of SD-files, and optional security (authorization). For chemical structure searching, Molecule Database Framework leverages the capabilities of the Bingo Cartridge for PostgreSQL and provides type-safe searching, caching, transactions and optional method-level security. Furthermore the design of entity classes and the reasoning behind it are explained. By means of a simple web application I describe how the framework could be used. I then benchmarked this example application to create some basic performance expectations for chemical structure searches and import and export of SD-files. Conclusions By using a simple web application it was

  17. A practical approach for inexpensive searches of radiology report databases.

    PubMed

    Desjardins, Benoit; Hamilton, R Curtis

    2007-06-01

    We present a method to perform full-text searches of radiology reports for the large number of departments that do not have this ability as part of their radiology or hospital information system. A tool written in Microsoft Access (front end) has been designed to search a server (back end) containing the indexed weekly backup copy of the full relational database extracted from a radiology information system (RIS). This front-end/back-end approach has been implemented in a large academic radiology department, and is used for teaching, research and administrative purposes. The weekly second backup of the 80 GB, 4-million-record RIS database takes 2 hours. Further indexing of the exported radiology reports takes 6 hours. Individual searches typically take less than 1 minute on the indexed database and 30-60 minutes on the nonindexed database. Guidelines to properly address privacy and institutional review board issues are closely followed by all users. This method has the potential to improve teaching, research, and administrative programs within radiology departments that cannot afford more expensive technology.
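The speed difference between indexed and non-indexed searching can be sketched with a pure-Python inverted index that maps words to report IDs. The report texts and IDs below are hypothetical; a production system would use the full-text indexing of the database engine itself:

```python
import re
from collections import defaultdict

def build_index(reports):
    """Inverted index: map each word to the set of report IDs containing it."""
    index = defaultdict(set)
    for rid, text in reports.items():
        for word in set(re.findall(r"[a-z]+", text.lower())):
            index[word].add(rid)
    return index

def search(index, query):
    """AND-search: return IDs of reports containing every query word."""
    words = query.lower().split()
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for w in words[1:]:
        hits &= index.get(w, set())
    return hits

# Hypothetical radiology reports
reports = {
    101: "No acute intracranial hemorrhage. Mild chronic microvascular changes.",
    102: "Acute pulmonary embolism in the right lower lobe.",
    103: "Chronic pulmonary fibrosis, no acute findings.",
}
index = build_index(reports)
hits = search(index, "acute pulmonary")
```

Each query then touches only the postings lists for its words instead of scanning millions of report rows, which is why the indexed copy answers in under a minute.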

  18. SSAHA: A Fast Search Method for Large DNA Databases

    PubMed Central

    Ning, Zemin; Cox, Anthony J.; Mullikin, James C.

    2001-01-01

    We describe an algorithm, SSAHA (Sequence Search and Alignment by Hashing Algorithm), for performing fast searches on databases containing multiple gigabases of DNA. Sequences in the database are preprocessed by breaking them into consecutive k-tuples of k contiguous bases and then using a hash table to store the position of each occurrence of each k-tuple. Searching for a query sequence in the database is done by obtaining from the hash table the “hits” for each k-tuple in the query sequence and then performing a sort on the results. We discuss the effect of the tuple length k on the search speed, memory usage, and sensitivity of the algorithm and present the results of computational experiments which show that SSAHA can be three to four orders of magnitude faster than BLAST or FASTA, while requiring less memory than suffix tree methods. The SSAHA algorithm is used for high-throughput single nucleotide polymorphism (SNP) detection and very large scale sequence assembly. Also, it provides Web-based sequence search facilities for Ensembl projects. PMID:11591649
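The k-tuple hashing scheme can be sketched as follows: database sequences are indexed by non-overlapping k-tuples, query hits are gathered for every overlapping k-tuple of the query, and hits are sorted by (sequence, diagonal) so hits from the same alignment cluster together. Sequence names and data below are illustrative only:

```python
from collections import defaultdict

def index_database(seqs, k):
    """SSAHA-style hash table: each non-overlapping k-tuple of every
    database sequence maps to its (sequence name, offset) positions."""
    table = defaultdict(list)
    for name, seq in seqs.items():
        for i in range(0, len(seq) - k + 1, k):
            table[seq[i:i + k]].append((name, i))
    return table

def search(table, query, k):
    """Collect hits for every overlapping k-tuple of the query, recording
    (sequence, diagonal, database offset); sorting clusters co-linear hits
    from the same putative alignment."""
    hits = []
    for j in range(len(query) - k + 1):
        for name, i in table.get(query[j:j + k], []):
            hits.append((name, i - j, i))
    return sorted(hits)

# Hypothetical toy database and query
db = {"chr1": "ACGTACGTGGTCAGT", "chr2": "TTTTGGTCAGTACGT"}
table = index_database(db, k=4)
hits = search(table, "GGTCAGT", k=4)
```

Runs of hits sharing the same (sequence, diagonal) pair correspond to a gap-free match, which is what makes the subsequent sort step so effective.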

  19. Ontology searching and browsing at the Rat Genome Database

    PubMed Central

    Laulederkind, Stanley J. F.; Tutaj, Marek; Shimoyama, Mary; Hayman, G. Thomas; Lowry, Timothy F.; Nigam, Rajni; Petri, Victoria; Smith, Jennifer R.; Wang, Shur-Jen; de Pons, Jeff; Dwinell, Melinda R.; Jacob, Howard J.

    2012-01-01

    The Rat Genome Database (RGD) is the premier repository of rat genomic and genetic data and currently houses over 40 000 rat gene records, as well as human and mouse orthologs, 1857 rat and 1912 human quantitative trait loci (QTLs) and 2347 rat strains. Biological information curated for these data objects includes disease associations, phenotypes, pathways, molecular functions, biological processes and cellular components. RGD uses more than a dozen different ontologies to standardize annotation information for genes, QTLs and strains. That means a lot of time can be spent searching and browsing ontologies for the appropriate terms needed both for curating and mining the data. RGD has upgraded its ontology term search to make it more versatile and more robust. A term search result is connected to a term browser so the user can fine-tune the search by viewing parent and children terms. Most publicly available term browsers display a hierarchical organization of terms in an expandable tree format. RGD has replaced its old tree browser format with a ‘driller’ type of browser that allows quicker drilling up and down through the term branches, which has been confirmed by testing. The RGD ontology report pages have also been upgraded. Expanded functionality allows more choice in how annotations are displayed and what subsets of annotations are displayed. The new ontology search, browser and report features have been designed to enhance both manual data curation and manual data extraction. Database URL: http://rgd.mcw.edu/rgdweb/ontology/search.html PMID:22434847

  20. Protein Database Searches Using Compositionally Adjusted Substitution Matrices

    PubMed Central

    Altschul, Stephen F.; Wootton, John C.; Gertz, E. Michael; Agarwala, Richa; Morgulis, Aleksandr; Schäffer, Alejandro A.; Yu, Yi-Kuo

    2005-01-01

    Almost all protein database search methods use amino acid substitution matrices for scoring, optimizing, and assessing the statistical significance of sequence alignments. Much care and effort has therefore gone into constructing substitution matrices, and the quality of search results can depend strongly upon the choice of the proper matrix. A long-standing problem has been the comparison of sequences with biased amino acid compositions, for which standard substitution matrices are not optimal. To address this problem, we have recently developed a general procedure for transforming a standard matrix into one appropriate for the comparison of two sequences with arbitrary, and possibly differing compositions. Such adjusted matrices yield, on average, improved alignments and alignment scores when applied to the comparison of proteins with markedly biased compositions. Here we review the application of compositionally adjusted matrices and consider whether they may also be applied fruitfully to general purpose protein sequence database searches, in which related sequence pairs do not necessarily have strong compositional biases. Although it is not advisable to apply compositional adjustment indiscriminately, we describe several simple criteria under which invoking such adjustment is on average beneficial. In a typical database search, at least one of these criteria is satisfied by over half the related sequence pairs. Compositional substitution matrix adjustment is now available in NCBI's protein-protein version of BLAST. PMID:16218944

  1. Fast and accurate database searches with MS-GF+Percolator

    SciTech Connect

    Granholm, Viktor; Kim, Sangtae; Navarro, José C.; Sjölund, Erik; Smith, Richard D.; Käll, Lukas

    2014-02-28

    To identify peptides and proteins from the large number of fragmentation spectra in mass spectrometry-based proteomics, researchers commonly employ so-called database search engines. Additionally, post-processors like Percolator have been used on the results from such search engines to assess confidence, infer peptides and generally increase the number of identifications. A recent search engine, MS-GF+, has previously been shown to outperform these classical search engines in terms of the number of identified spectra. However, MS-GF+ generates only limited statistical estimates of the results, hampering the biological interpretation. Here, we enabled Percolator processing of MS-GF+ output and observed an increased number of identified peptides for a wide variety of datasets. In addition, Percolator directly reports false discovery rate estimates, such as q values and posterior error probabilities, as well as p values, for peptide-spectrum matches, peptides and proteins, features useful for the whole proteomics community.
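The q values reported by such post-processors can be illustrated with a minimal target-decoy FDR computation. This is a simplified sketch of the general idea, not Percolator's actual semi-supervised algorithm:

```python
def qvalues(psms):
    """psms: list of (score, is_decoy) with higher score = better match.
    FDR at a score threshold is estimated as (#decoys above) / (#targets
    above); the q value is the minimum FDR at which a PSM is accepted."""
    ordered = sorted(psms, key=lambda p: -p[0])
    targets = decoys = 0
    fdrs = []
    for score, is_decoy in ordered:
        decoys += is_decoy
        targets += not is_decoy
        fdrs.append(decoys / max(targets, 1))
    # enforce monotonicity: q value is the running minimum from the bottom
    q, out = 1.0, []
    for f in reversed(fdrs):
        q = min(q, f)
        out.append(q)
    return list(reversed(out))

# Hypothetical PSMs: (search score, came from decoy database?)
qs = qvalues([(10, False), (9, False), (8, True), (7, False), (6, True)])
```

Thresholding the list at, say, q < 0.05 then yields a peptide set with a controlled false discovery rate.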

  2. A Taxonomic Search Engine: Federating taxonomic databases using web services

    PubMed Central

    Page, Roderic DM

    2005-01-01

    Background The taxonomic name of an organism is a key link between different databases that store information on that organism. However, in the absence of a single, comprehensive database of organism names, individual databases lack an easy means of checking the correctness of a name. Furthermore, the same organism may have more than one name, and the same name may apply to more than one organism. Results The Taxonomic Search Engine (TSE) is a web application written in PHP that queries multiple taxonomic databases (ITIS, Index Fungorum, IPNI, NCBI, and uBIO) and summarises the results in a consistent format. It supports "drill-down" queries to retrieve a specific record. The TSE can optionally suggest alternative spellings the user can try. It also acts as a Life Science Identifier (LSID) authority for the source taxonomic databases, providing globally unique identifiers (and associated metadata) for each name. Conclusion The Taxonomic Search Engine is available at and provides a simple demonstration of the potential of the federated approach to providing access to taxonomic names. PMID:15757517

  3. Content-Based Search on a Database of Geometric Models: Identifying Objects of Similar Shape

    SciTech Connect

    XAVIER, PATRICK G.; HENRY, TYSON R.; LAFARGE, ROBERT A.; MEIRANS, LILITA; RAY, LAWRENCE P.

    2001-11-01

    The Geometric Search Engine is a software system for storing and searching a database of geometric models. The database may be searched for modeled objects similar in shape to a target model supplied by the user. The database models are generally derived from CAD models, while the target model may be either a CAD model or a model generated from range data collected from a physical object. This document describes key generation, database layout, and search of the database.

  4. Multi-Database Searching in the Behavioral Sciences--Part I: Basic Techniques and Core Databases.

    ERIC Educational Resources Information Center

    Angier, Jennifer J.; Epstein, Barbara A.

    1980-01-01

    Outlines practical searching techniques in seven core behavioral science databases accessing psychological literature: Psychological Abstracts, Social Science Citation Index, Biosis, Medline, Excerpta Medica, Sociological Abstracts, ERIC. Use of individual files is discussed and their relative strengths/weaknesses are compared. Appended is a list…

  5. Are Bibliographic Management Software Search Interfaces Reliable?: A Comparison between Search Results Obtained Using Database Interfaces and the EndNote Online Search Function

    ERIC Educational Resources Information Center

    Fitzgibbons, Megan; Meert, Deborah

    2010-01-01

    The use of bibliographic management software and its internal search interfaces is now pervasive among researchers. This study compares the results between searches conducted in academic databases' search interfaces versus the EndNote search interface. The results show mixed search reliability, depending on the database and type of search…


  7. Probabilistic acute dietary exposure assessments to captan and tolylfluanid using several European food consumption and pesticide concentration databases.

    PubMed

    Boon, Polly E; Svensson, Kettil; Moussavian, Shahnaz; van der Voet, Hilko; Petersen, Annette; Ruprich, Jiri; Debegnach, Francesca; de Boer, Waldo J; van Donkersgoed, Gerda; Brera, Carlo; van Klaveren, Jacob D; Busk, Leif

    2009-12-01

    Probabilistic acute dietary exposure assessments of captan and tolylfluanid were performed for the populations of the Czech Republic, Denmark, Italy, the Netherlands and Sweden. The basis for these assessments was national databases for food consumption and pesticide concentration data harmonised at the level of raw agricultural commodity. Data were obtained from national food consumption surveys and national monitoring programmes and organised in an electronic platform of databases connected to probabilistic software. The exposure assessments were conducted by linking national food consumption data either (1) to national pesticide concentration data or (2) to a pooled database containing all national pesticide concentration data. We show that with this tool national exposure assessments can be performed in a harmonised way and that pesticide concentrations of other countries can be linked to national food consumption surveys. In this way it is possible to exchange or merge concentration data between countries in situations of data scarcity. This electronic platform in connection with probabilistic software can be seen as a prototype of a data warehouse, including a harmonised approach for dietary exposure modelling.
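The core of a probabilistic acute assessment is a Monte Carlo loop that pairs random consumption events with random residue concentrations and reports a high percentile of the resulting intake distribution. All numbers below are hypothetical; the real assessments draw from the harmonised national databases and dedicated probabilistic software described above:

```python
import random
import statistics

def acute_exposure_mc(consumptions, concentrations, n=10000, seed=1):
    """Monte Carlo sketch of probabilistic acute exposure: each iteration
    pairs a random consumption event (grams of food, kg body weight) with
    a random residue concentration (mg/kg food) and computes the intake
    in mg per kg body weight per day."""
    rng = random.Random(seed)
    exposures = []
    for _ in range(n):
        grams, bw = rng.choice(consumptions)
        conc = rng.choice(concentrations)
        exposures.append(grams / 1000 * conc / bw)
    return exposures

# Hypothetical survey records: (g of commodity eaten, kg body weight)
consumptions = [(150, 60), (300, 70), (80, 25)]
# Hypothetical monitoring residues in mg/kg (zeros = non-detects)
concentrations = [0.0, 0.0, 0.05, 0.2, 1.3]
exp = acute_exposure_mc(consumptions, concentrations)
p999 = statistics.quantiles(exp, n=1000)[-1]  # ~99.9th percentile intake
```

The resulting high percentile would then be compared against the compound's acute reference dose.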

  8. Feature selection in validating mass spectrometry database search results.

    PubMed

    Fang, Jianwen; Dong, Yinghua; Williams, Todd D; Lushington, Gerald H

    2008-02-01

    Tandem mass spectrometry (MS/MS) combined with protein database searching has been widely used in protein identification. A validation procedure is generally required to reduce the number of false positives. Advanced tools using statistical and machine learning approaches may provide faster and more accurate validation than manual inspection and empirical filtering criteria. In this study, we use two feature selection algorithms based on random forest and support vector machine to identify peptide properties that can be used to improve validation models. We demonstrate that an improved model based on an optimized set of features reduces the number of false positives by 58% relative to the model which used only search engine scores, at the same sensitivity score of 0.8. In addition, we develop classification models based on the physicochemical properties and protein sequence environment of these peptides without using search engine scores. The performance of the best model based on the support vector machine algorithm is at 0.8 AUC, 0.78 accuracy, and 0.7 specificity, suggesting a reasonably accurate classification. The identified properties important to fragmentation and ionization can be either used in independent validation tools or incorporated into peptide sequencing and database search algorithms to improve existing software programs.

  9. A Multilevel Probabilistic Beam Search Algorithm for the Shortest Common Supersequence Problem

    PubMed Central

    Gallardo, José E.

    2012-01-01

    The shortest common supersequence problem is a classical problem with many applications in different fields such as planning, Artificial Intelligence and especially Bioinformatics. Due to its NP-hardness, we cannot expect to solve this problem efficiently using conventional exact techniques. This paper presents a heuristic that tackles the problem based on the use, at different levels, of a probabilistic variant of the classical Beam Search heuristic. The proposed algorithm is empirically analysed and compared to current approaches in the literature. Experiments show that it provides better-quality solutions in a reasonable time for medium and large instances of the problem. For very large instances, our heuristic also provides better solutions, but the required execution times may increase considerably. PMID:23300667
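A probabilistic Beam Search keeps a beam of partial solutions but samples the survivors in proportion to a heuristic score rather than always retaining the top k. The sketch below applies this idea to the shortest common supersequence problem; the scoring weight and beam width are arbitrary illustrative choices, not the paper's algorithm:

```python
import random

def probabilistic_beam_scs(strings, beam_width=8, seed=1):
    """Probabilistic Beam Search sketch for the shortest common
    supersequence: survivors are sampled in proportion to progress
    rather than chosen deterministically."""
    rng = random.Random(seed)
    beam = [("", tuple(0 for _ in strings))]  # (partial supersequence, positions)
    while True:
        finished = [seq for seq, pos in beam
                    if all(p == len(w) for p, w in zip(pos, strings))]
        if finished:
            return min(finished, key=len)
        candidates = []
        for seq, pos in beam:
            # extend by any character still needed next in some string
            for c in {w[p] for p, w in zip(pos, strings) if p < len(w)}:
                new_pos = tuple(p + 1 if p < len(w) and w[p] == c else p
                                for p, w in zip(pos, strings))
                candidates.append((seq + c, new_pos))
        if len(candidates) <= beam_width:
            beam = candidates
        else:
            weights = [1.0 + sum(pos) ** 2 for _, pos in candidates]
            beam = list(dict.fromkeys(
                rng.choices(candidates, weights=weights, k=beam_width)))

def is_supersequence(sup, s):
    it = iter(sup)
    return all(ch in it for ch in s)

strings = ["cagt", "gact", "ctga"]
scs = probabilistic_beam_scs(strings)
```

Every extension advances at least one string pointer, so the search terminates; the result is a valid supersequence by construction, though not necessarily a shortest one.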

  10. Enriching Great Britain's National Landslide Database by searching newspaper archives

    NASA Astrophysics Data System (ADS)

    Taylor, Faith E.; Malamud, Bruce D.; Freeborough, Katy; Demeritt, David

    2015-11-01

    Our understanding of where landslide hazard and impact will be greatest is largely based on our knowledge of past events. Here, we present a method to supplement existing records of landslides in Great Britain by searching an electronic archive of regional newspapers. In Great Britain, the British Geological Survey (BGS) is responsible for updating and maintaining records of landslide events and their impacts in the National Landslide Database (NLD). The NLD contains records of more than 16,500 landslide events in Great Britain. Data sources for the NLD include field surveys, academic articles, grey literature, news, public reports and, since 2012, social media. We aim to supplement the richness of the NLD by (i) identifying additional landslide events, (ii) acting as an additional source of confirmation of events existing in the NLD and (iii) adding more detail to existing database entries. This is done by systematically searching the Nexis UK digital archive of 568 regional newspapers published in the UK. In this paper, we construct a robust Boolean search criterion by experimenting with landslide terminology for four training periods. We then apply this search to all articles published in 2006 and 2012. This resulted in the addition of 111 records of landslide events to the NLD over the 2 years investigated (2006 and 2012). We also find that we were able to obtain information about landslide impact for 60-90% of landslide events identified from newspaper articles. Spatial and temporal patterns of additional landslides identified from newspaper articles are broadly in line with those existing in the NLD, confirming that the NLD is a representative sample of landsliding in Great Britain. This method could now be applied to more time periods and/or other hazards to add richness to databases and thus improve our ability to forecast future events based on records of past events.
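A Boolean search criterion of the kind described can be approximated with inclusion and exclusion patterns; the term lists below are a hypothetical reconstruction, not the paper's criterion (the metaphorical electoral sense of "landslide" is a classic false positive in such searches):

```python
import re

# Inclusion terms for mass-movement events; exclusion for the electoral metaphor.
INCLUDE = re.compile(r"\b(landslide|landslip|mudslide|rockfall)\b", re.IGNORECASE)
EXCLUDE = re.compile(r"\blandslide (victory|win|defeat|majority)\b", re.IGNORECASE)

def matches_search(article_text):
    return bool(INCLUDE.search(article_text)) and not EXCLUDE.search(article_text)

articles = [
    "A landslip closed the A83 near Glen Croe after heavy rain.",
    "The MP won by a landslide majority of 12,000 votes.",
    "Rockfall damaged two cottages on the coast road.",
]
hits = [a for a in articles if matches_search(a)]
```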

  11. Physical Database Design for Efficient Time-Series Similarity Search

    NASA Astrophysics Data System (ADS)

    Kim, Sang-Wook; Kim, Jinho; Park, Sanghyun

    Similarity search in time-series databases finds data sequences whose changing patterns are similar to that of a query sequence. For efficient processing, it normally employs a multi-dimensional index. In order to alleviate the well-known curse of dimensionality, previous methods for similarity search apply the Discrete Fourier Transform (DFT) to data sequences and take only the first two or three DFT coefficients as organizing attributes. Beyond this ad hoc approach, there have been no research efforts on devising a systematic guideline for choosing the best organizing attributes. This paper first points out the problems with the previous methods and proposes a novel solution for constructing optimal multi-dimensional indexes. The proposed method analyzes the characteristics of a target time-series database and identifies the organizing attributes with the best discrimination power. It also determines the optimal number of organizing attributes for efficient similarity search by using a cost model. Through a series of experiments, we show that the proposed method significantly outperforms the previous ones.
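The contrast between the ad hoc choice (always the first few DFT coefficients) and a data-driven choice can be sketched as follows: rank coefficient positions by their variance across the database, a crude stand-in for the discrimination-power analysis and cost model of the paper:

```python
import cmath

def dft_magnitudes(seq, k):
    """Magnitudes of the first k Discrete Fourier Transform coefficients."""
    n = len(seq)
    return [abs(sum(x * cmath.exp(-2j * cmath.pi * f * i / n)
                    for i, x in enumerate(seq))) for f in range(k)]

def pick_organizing_attributes(database, n_coeffs, n_attrs):
    """Rank DFT coefficient positions by variance across the database
    (a crude proxy for discrimination power) and keep the best few."""
    table = [dft_magnitudes(seq, n_coeffs) for seq in database]
    def variance(col):
        vals = [row[col] for row in table]
        mean = sum(vals) / len(vals)
        return sum((v - mean) ** 2 for v in vals) / len(vals)
    return sorted(range(n_coeffs), key=variance, reverse=True)[:n_attrs]

# hypothetical time-series database
db = [[1, 2, 3, 4, 3, 2, 1, 2], [5, 5, 5, 5, 5, 5, 5, 5],
      [1, -1, 1, -1, 1, -1, 1, -1], [0, 3, 0, 3, 0, 3, 0, 3]]
attrs = pick_organizing_attributes(db, n_coeffs=4, n_attrs=2)
```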

  12. MassMatrix: A Database Search Program for Rapid Characterization of Proteins and Peptides from Tandem Mass Spectrometry Data

    PubMed Central

    Xu, Hua; Freitas, Michael A.

    2009-01-01

    MassMatrix is a program that matches tandem mass spectra with theoretical peptide sequences derived from a protein database. The program uses a mass-accuracy-sensitive probabilistic score model to rank peptide matches. The software was evaluated on a high mass accuracy data set and its results compared with those from Mascot, SEQUEST, X!Tandem, and OMSSA. For these data, MassMatrix provided better sensitivity than Mascot, SEQUEST, X!Tandem, and OMSSA at a given specificity, with a false positive rate of 2%. More importantly, all manually validated true positives corresponded to a unique peptide/spectrum match. The presence of decoy sequences and additional variable post-translational modifications did not significantly affect the results of the high mass accuracy search. MassMatrix compares well with Mascot, SEQUEST, X!Tandem, and OMSSA with regard to search time. MassMatrix was also run on distributed-memory clusters and achieved search speeds of ~100,000 spectra per hour when searching against a complete human database with 8 variable modifications. The algorithm is available for public searches at http://www.massmatrix.net. PMID:19235167

  13. Compact variant-rich customized sequence database and a fast and sensitive database search for efficient proteogenomic analyses.

    PubMed

    Park, Heejin; Bae, Junwoo; Kim, Hyunwoo; Kim, Sangok; Kim, Hokeun; Mun, Dong-Gi; Joh, Yoonsung; Lee, Wonyeop; Chae, Sehyun; Lee, Sanghyuk; Kim, Hark Kyun; Hwang, Daehee; Lee, Sang-Won; Paek, Eunok

    2014-12-01

    In proteogenomic analysis, construction of a compact, customized database from mRNA-seq data and a sensitive search of both reference and customized databases are essential to accurately determine protein abundances and structural variations at the protein level. However, these tasks have not been systematically explored, but rather performed in an ad hoc fashion. Here, we present an effective method for constructing a compact database containing comprehensive sequences of sample-specific variants (single nucleotide variants, insertions/deletions, and stop-codon mutations) derived from Exome-seq and RNA-seq data. The database occupies less space because it stores variant peptides rather than whole variant proteins. We also present an efficient search method for both customized and reference databases. Separate searches of the two databases increase the search time, while a unified search is less sensitive in identifying variant peptides because the customized database is much smaller than the reference database in the target-decoy setting. Our method searches the unified database once but performs target-decoy validation separately. Experimental results show that our approach is as fast as the unified search and as sensitive as the separate searches. Our customized database includes mutation information in the headers of variant peptides, thereby facilitating the inspection of peptide-spectrum matches.

  14. Fast and accurate database searches with MS-GF+Percolator.

    PubMed

    Granholm, Viktor; Kim, Sangtae; Navarro, José C F; Sjölund, Erik; Smith, Richard D; Käll, Lukas

    2014-02-07

    One can interpret fragmentation spectra stemming from peptides in mass-spectrometry-based proteomics experiments using so-called database search engines. Frequently, one also runs post-processors such as Percolator to assess the confidence, infer unique peptides, and increase the number of identifications. A recent search engine, MS-GF+, has shown promising results, due to a new and efficient scoring algorithm. However, MS-GF+ provides few statistical estimates about the peptide-spectrum matches, hence limiting the biological interpretation. Here, we enabled Percolator processing for MS-GF+ output and observed an increased number of identified peptides for a wide variety of data sets. In addition, Percolator directly reports p values and false discovery rate estimates, such as q values and posterior error probabilities, for peptide-spectrum matches, peptides, and proteins, functions that are useful for the whole proteomics community.
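The q-values reported by such post-processors are derived from target-decoy competition: the FDR at a score threshold is estimated as the number of decoy matches above it divided by the number of target matches, and the q-value of a PSM is the minimum such FDR over all thresholds at least as permissive as its own. A minimal sketch with made-up scores (this is the generic target-decoy recipe, not Percolator's full semi-supervised procedure):

```python
def qvalues(psms):
    """psms: list of (score, is_decoy).  Returns a q-value per PSM, in
    input order.  FDR at a threshold = decoys / targets at or above it."""
    order = sorted(range(len(psms)), key=lambda i: psms[i][0], reverse=True)
    fdr, decoys, targets = {}, 0, 0
    for i in order:
        if psms[i][1]:
            decoys += 1
        else:
            targets += 1
        fdr[i] = decoys / max(targets, 1)
    qvals, running_min = [0.0] * len(psms), 1.0
    for i in reversed(order):           # from most to least permissive
        running_min = min(running_min, fdr[i])
        qvals[i] = running_min
    return qvals

# made-up scores: True marks a decoy match
psms = [(95, False), (90, False), (85, True), (80, False), (70, True), (60, True)]
q = qvalues(psms)
```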

  15. Supporting Ontology-based Keyword Search over Medical Databases

    PubMed Central

    Kementsietsidis, Anastasios; Lim, Lipyeow; Wang, Min

    2008-01-01

    The proliferation of medical terms poses a number of challenges in the sharing of medical information among different stakeholders. Ontologies are commonly used to establish relationships between different terms, yet their role in querying has not been investigated in detail. In this paper, we study the problem of supporting ontology-based keyword search queries on a database of electronic medical records. We present several approaches to support this type of query, study the advantages and limitations of each approach, and summarize the lessons learned as best practices. PMID:18998839

  16. The Saccharomyces Genome Database: Advanced Searching Methods and Data Mining.

    PubMed

    Cherry, J Michael

    2015-12-02

    At the core of the Saccharomyces Genome Database (SGD) are chromosomal features that encode a product. These include protein-coding genes and major noncoding RNA genes, such as tRNA and rRNA genes. The basic entry point into SGD is a gene or open-reading frame name that leads directly to the locus summary information page. A keyword describing function, phenotype, selective condition, or text from abstracts also provides an entry point into SGD. A DNA or protein sequence can be used to identify a gene or a chromosomal region using BLAST. Protein and DNA sequence identifiers, PubMed and NCBI IDs, author names, and function terms are also valid entry points. The information in SGD has been gathered and is maintained by a group of scientific biocurators and software developers who are devoted to providing researchers with up-to-date information from the published literature, connections to all the major research resources, and tools that allow the data to be explored. Not all of the collected information can be represented or summarized for every possible question; therefore, it is necessary to be able to search the structured data in the database. This protocol describes YeastMine, which provides advanced search capabilities via an interactive interface. The SGD also archives results from microarray expression experiments, and a strategy designed to explore these data using the SPELL (Serial Pattern of Expression Levels Locator) tool is provided.

  17. Accelerating chemical database searching using graphics processing units.

    PubMed

    Liu, Pu; Agrafiotis, Dimitris K; Rassokhin, Dmitrii N; Yang, Eric

    2011-08-22

    The utility of chemoinformatics systems depends on the accurate computer representation and efficient manipulation of chemical compounds. In such systems, a small molecule is often digitized as a large fingerprint vector, where each element indicates the presence/absence or the number of occurrences of a particular structural feature. Since in theory the number of unique features can be exceedingly large, these fingerprint vectors are usually folded into much shorter ones using hashing and modulo operations, allowing fast "in-memory" manipulation and comparison of molecules. There is increasing evidence that lossless fingerprints can substantially improve retrieval performance in chemical database searching (substructure or similarity), which has led to the development of several lossless fingerprint compression algorithms. However, any gains in storage and retrieval afforded by compression need to be weighed against the extra computational burden required for decompression before these fingerprints can be compared. Here we demonstrate that graphics processing units (GPU) can greatly alleviate this problem, enabling the practical application of lossless fingerprints on large databases. More specifically, we show that, with the help of an ordinary ~$500 video card, the entire PubChem database of ~32 million compounds can be searched in ~0.2-2 s on average, which is 2 orders of magnitude faster than a conventional CPU. If multiple query patterns are processed in batch, the speedup is even more dramatic (less than 0.02-0.2 s/query for 1000 queries). In the present study, we use the Elias gamma compression algorithm, which results in a compression ratio as high as 0.097.
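Elias gamma coding, the compression scheme used in the study, writes a positive integer n as (number of binary digits of n minus one) zeros followed by n in binary, so small numbers, such as gaps between set bits in a sparse fingerprint, take few bits. A compact sketch of the codec (the GPU decompression kernel itself is beyond this example):

```python
def elias_gamma_encode(numbers):
    """Encode positive integers: (bit length - 1) zeros, then the binary digits."""
    bits = ""
    for n in numbers:
        b = bin(n)[2:]
        bits += "0" * (len(b) - 1) + b
    return bits

def elias_gamma_decode(bits):
    out, i = [], 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":   # count the zero prefix
            zeros += 1
            i += 1
        out.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return out

# e.g. gaps between set bits of a sparse fingerprint compress well
gaps = [1, 9, 4, 130, 2, 77]
encoded = elias_gamma_encode(gaps)
```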

  18. Grover's search algorithm with an entangled database state

    NASA Astrophysics Data System (ADS)

    Alsing, Paul M.; McDonald, Nathan

    2011-05-01

    Grover's oracle-based unstructured search algorithm is often stated as "given a phone number in a directory, find the associated name." More formally, the problem can be stated as: given as input a unitary black box Uf for computing an unknown function f: {0,1}^n → {0,1}, find x = x0 ∈ {0,1}^n such that f(x0) = 1 (and f(x) = 0 otherwise). The crucial role of the externally supplied oracle Uf (whose inner workings are unknown to the user) is to change the sign of the solution |x0⟩ while leaving all other states unaltered. Thus, Uf depends on the desired solution x0. This paper examines an amplitude amplification algorithm in which the user encodes the directory (e.g. names and telephone numbers) into an entangled database state, which at a later time can be queried on one supplied component entry (e.g. a given phone number t0) to find the associated unknown component (e.g. a name x0). For N = 2^n names x with N associated phone numbers t, performing amplitude amplification on a subspace of size N of the total space of size N^2 produces the desired state |x0⟩|t0⟩ in √N steps. We discuss how and why sequential (though not concurrent parallel) searches can be performed on multiple database states. Finally, we show how this procedure can be generalized to databases with more than two correlated lists (e.g. |x⟩|t⟩|s⟩|r⟩...).
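The amplitude amplification loop itself (oracle sign flip, then inversion about the mean) is easy to simulate classically for small N. The sketch below runs plain Grover search, not the paper's entangled-database variant, and shows the marked index dominating the measurement distribution after about (π/4)√N iterations:

```python
import math

def grover_probabilities(n_qubits, marked, n_iters=None):
    """State-vector simulation of Grover search: the oracle flips the sign
    of the marked amplitude, then all amplitudes are reflected about
    their mean; ~(pi/4)*sqrt(N) iterations maximise the success probability."""
    N = 2 ** n_qubits
    if n_iters is None:
        n_iters = round(math.pi / 4 * math.sqrt(N))
    amp = [1 / math.sqrt(N)] * N           # uniform superposition
    for _ in range(n_iters):
        amp[marked] = -amp[marked]         # oracle U_f: phase flip on the solution
        mean = sum(amp) / N
        amp = [2 * mean - a for a in amp]  # inversion about the mean
    return [a * a for a in amp]            # measurement probabilities

probs = grover_probabilities(n_qubits=6, marked=37)
best = max(range(len(probs)), key=probs.__getitem__)
```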

  19. Numerical database system based on a weighted search tree

    NASA Astrophysics Data System (ADS)

    Park, S. C.; Bahri, C.; Draayer, J. P.; Zheng, S.-Q.

    1994-09-01

    An on-line numerical database system, based on the concept of a weighted search tree and functioning like a file directory, is introduced. The system, designed to reduce time-consuming redundant calculations in numerically intensive computations, can be used to fetch, insert and delete items from a dynamically generated list in optimal time, O(log n), where n is the number of items in the list. Items in the list are ordered according to a priority queue, with the initial priority for each element set either automatically or by a user-supplied algorithm. The priority queue is updated on the fly to reflect element hit frequency. Items can be added to a database so long as there is space to accommodate them; when there is not, the lowest-priority element(s) is removed to make room for an incoming element(s) with higher priority. The system acts passively and can therefore be applied to any number of databases, with the same or different structures, within a single application.
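The directory behaviour described (fetch with on-the-fly priority bumps, insert, and eviction of the lowest-priority entry when full) can be sketched as below. For brevity the eviction scan is linear, not the O(log n) weighted search tree the paper implements, and the class name is an illustrative invention:

```python
class PriorityMemo:
    """Memo store for expensive results: hits bump an entry's priority
    on the fly; when the store is full, the lowest-priority entry is
    evicted to make room for the newcomer."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = {}                       # key -> [priority, value]

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        entry[0] += 1                         # update priority on the fly
        return entry[1]

    def put(self, key, value, priority=1):
        if key not in self.store and len(self.store) >= self.capacity:
            victim = min(self.store, key=lambda k: self.store[k][0])
            del self.store[victim]            # evict the coldest entry
        self.store[key] = [priority, value]

memo = PriorityMemo(capacity=2)
memo.put("a", 1)
memo.put("b", 2)
memo.get("a"); memo.get("a")                  # "a" is now high priority
memo.put("c", 3)                              # evicts "b", the coldest entry
```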

  20. Searching protein structure databases with DaliLite v.3.

    PubMed

    Holm, L; Kääriäinen, S; Rosenström, P; Schenkel, A

    2008-12-01

    The Red Queen said, 'It takes all the running you can do, to keep in the same place.' (Lewis Carroll). Newly solved protein structures are routinely scanned against structures already in the Protein Data Bank (PDB) using Internet servers. In favourable cases, comparing 3D structures may reveal biologically interesting similarities that are not detectable by comparing sequences. The number of known structures continues to grow exponentially. Sensitive (thorough but slow) search algorithms are challenged to deliver results in a reasonable time, as there are now more structures in the PDB than seconds in a day. The brute-force solution would be to distribute the individual comparisons on a massively parallel computer. A frugal solution, as implemented in the Dali server, is to reduce the total computational cost by pruning the search space using prior knowledge about the distribution of structures in fold space. This note reports paradigm revisions that enable maintaining such a knowledge base up to date on a PC. The Dali server for protein structure database searching at http://ekhidna.biocenter.helsinki.fi/dali_server is running DaliLite v.3. The software can be downloaded for academic use from http://ekhidna.biocenter.helsinki.fi/dali_lite/downloads/v3.

  1. Searching protein structure databases with DaliLite v.3

    PubMed Central

    Holm, L.; Kääriäinen, S.; Rosenström, P.; Schenkel, A.

    2008-01-01

    The Red Queen said, 'It takes all the running you can do, to keep in the same place.' (Lewis Carroll). Motivation: Newly solved protein structures are routinely scanned against structures already in the Protein Data Bank (PDB) using Internet servers. In favourable cases, comparing 3D structures may reveal biologically interesting similarities that are not detectable by comparing sequences. The number of known structures continues to grow exponentially. Sensitive (thorough but slow) search algorithms are challenged to deliver results in a reasonable time, as there are now more structures in the PDB than seconds in a day. The brute-force solution would be to distribute the individual comparisons on a massively parallel computer. A frugal solution, as implemented in the Dali server, is to reduce the total computational cost by pruning the search space using prior knowledge about the distribution of structures in fold space. This note reports paradigm revisions that enable maintaining such a knowledge base up to date on a PC. Availability: The Dali server for protein structure database searching at http://ekhidna.biocenter.helsinki.fi/dali_server is running DaliLite v.3. The software can be downloaded for academic use from http://ekhidna.biocenter.helsinki.fi/dali_lite/downloads/v3. Contact: liisa.holm@helsinki.fi PMID:18818215

  2. Global Identification of Protein Post-translational Modifications in a Single-Pass Database Search.

    PubMed

    Shortreed, Michael R; Wenger, Craig D; Frey, Brian L; Sheynkman, Gloria M; Scalf, Mark; Keller, Mark P; Attie, Alan D; Smith, Lloyd M

    2015-11-06

    Bottom-up proteomics database search algorithms used for peptide identification cannot comprehensively identify post-translational modifications (PTMs) in a single pass because of high false discovery rates (FDRs). A new approach to database searching enables global PTM (G-PTM) identification by searching exclusively for curated PTMs, thereby avoiding the FDR penalty incurred by conventional variable-modification searches. We identified over 2200 unique, high-confidence modified peptides comprising 26 different PTM types in a single-pass database search.

  3. Inbred Strain Variant Database (ISVdb): A Repository for Probabilistically Informed Sequence Differences Among the Collaborative Cross Strains and Their Founders

    PubMed Central

    Oreper, Daniel; Cai, Yanwei; Tarantino, Lisa M.; de Villena, Fernando Pardo-Manuel; Valdar, William

    2017-01-01

    The Collaborative Cross (CC) is a panel of recently established multiparental recombinant inbred mouse strains. For the CC, as for any multiparental population (MPP), effective experimental design and analysis benefit from detailed knowledge of the genetic differences between strains. Such differences can be directly determined by sequencing, but until now whole-genome sequencing was not publicly available for individual CC strains. An alternative and complementary approach is to infer genetic differences by combining two pieces of information: probabilistic estimates of the CC haplotype mosaic from a custom genotyping array, and probabilistic variant calls from sequencing of the CC founders. The computation for this inference, especially when performed genome-wide, can be intricate and time-consuming, requiring the researcher to generate nontrivial and potentially error-prone scripts. To provide standardized, easy-to-access CC sequence information, we have developed the Inbred Strain Variant Database (ISVdb). The ISVdb provides, for all the exonic variants from the Sanger Institute mouse sequencing dataset, direct sequence information for CC founders and, critically, the imputed sequence information for CC strains. Notably, the ISVdb also: (1) provides predicted variant consequence metadata; (2) allows rapid simulation of F1 populations; and (3) preserves imputation uncertainty, which will allow imputed data to be refined in the future as additional sequencing and genotyping data are collected. The ISVdb information is housed in an SQL database and is easily accessible through a custom online interface (http://isvdb.unc.edu), reducing the analytic burden on any researcher using the CC. PMID:28592645

  4. Global vs. Localized Search: A Comparison of Database Selection Methods in a Hierarchical Environment.

    ERIC Educational Resources Information Center

    Conrad, Jack G.; Claussen, Joanne Smestad; Yang, Changwen

    2002-01-01

    Compares standard global information retrieval searching with more localized techniques to address the database selection problem that users often have when searching for the most relevant database, based on experiences with the Westlaw Directory. Findings indicate that a browse plus search approach in a hierarchical environment produces the most…

  5. Comparison Study of Overlap among 21 Scientific Databases in Searching Pesticide Information.

    ERIC Educational Resources Information Center

    Meyer, Daniel E.; And Others

    1983-01-01

    Evaluates overlapping coverage of 21 scientific databases used in 10 online pesticide searches in an attempt to identify minimum number of databases needed to generate 90 percent of unique, relevant citations for given search. Comparison of searches combined under given pesticide usage (herbicide, fungicide, insecticide) is discussed. Nine…

  7. Search Databases and Statistics: Pitfalls and Best Practices in Phosphoproteomics.

    PubMed

    Refsgaard, Jan C; Munk, Stephanie; Jensen, Lars J

    2016-01-01

    Advances in mass spectrometric instrumentation in the past 15 years have resulted in an explosion in the raw data yield from typical phosphoproteomics workflows. This poses the challenge of confidently identifying peptide sequences, localizing phosphosites to proteins and quantifying these from the vast amounts of raw data. The task is tackled by computational tools implementing algorithms that match the experimental data to databases, providing the user with lists for downstream analysis. Several platforms for such automated interpretation of mass spectrometric data have been developed, each having strengths and weaknesses that must be weighed against individual needs. These are reviewed in this chapter. Equally critical for generating highly confident output datasets is the application of sound statistical criteria to limit the inclusion of incorrect peptide identifications from database searches. Additionally, careful filtering and use of appropriate statistical tests on the output datasets affect the quality of all downstream analyses and interpretation of the data. Our considerations and general practices on these aspects of phosphoproteomics data processing are presented here.

  8. Unambiguous identification of coherent states: Searching a quantum database

    SciTech Connect

    Sedlak, Michal; Ziman, Mario; Pribyla, Ondrej; Buzek, Vladimir; Hillery, Mark

    2007-08-15

    We consider the unambiguous identification of an unknown coherent state with one of two unknown coherent reference states. Specifically, we consider two modes of an electromagnetic field prepared in unknown coherent states |α1⟩ and |α2⟩, respectively. A third mode is prepared either in the state |α1⟩ or in the state |α2⟩. The task is to identify (unambiguously) which of the two reference modes is in the same state as the third. We present a scheme consisting of three beam splitters capable of performing this task. Although we do not prove optimality, we show that the performance of the proposed setup is better than the generalization of the optimal measurement known for the finite-dimensional case. We show that a single beam splitter can perform an unambiguous quantum state comparison for coherent states optimally. Finally, we propose an experimental setup consisting of 2N-1 beam splitters for unambiguous identification among N unknown coherent states. This setup can be considered a search in a quantum database. The elements of the database are unknown coherent states encoded in different modes of an electromagnetic field. The task is to specify the two modes that are excited in the same, though unknown, coherent state.

  9. Cycloquest: Identification of cyclopeptides via database search of their mass spectra against genome databases

    PubMed Central

    Mohimani, Hosein; Liu, Wei-Ting; Mylne, Joshua S.; Poth, Aaron G.; Colgrave, Michelle L.; Tran, Dat; Selsted, Michael E.; Dorrestein, Pieter C.; Pevzner, Pavel A.

    2011-01-01

    Hundreds of ribosomally synthesized cyclopeptides have been isolated from all domains of life, the vast majority reported in the last 15 years. Studies of cyclic peptides have highlighted their exceptional potential both as stable drug scaffolds and as biomedicines in their own right. Despite this, computational techniques for cyclopeptide identification are still in their infancy, and many such peptides remain uncharacterized. Tandem mass spectrometry has occupied a niche role in cyclopeptide identification, taking over from traditional techniques such as nuclear magnetic resonance (NMR) spectroscopy. MS/MS studies require only picogram quantities of peptide (compared to milligrams for NMR studies) and are applicable to complex samples, abolishing the requirement for time-consuming chromatographic purification. While database search tools such as Sequest and Mascot have become standard tools for the MS/MS identification of linear peptides, they are not applicable to cyclopeptides, due to the parent mass shift resulting from cyclization and the different fragmentation patterns of cyclic peptides. In this paper, we describe the development of a novel database search methodology to aid in the identification of cyclopeptides by mass spectrometry, and evaluate its utility in identifying two peptide rings from Helianthus annuus, a bacterial cannibalism factor from Bacillus subtilis, and a θ-defensin from Rhesus macaque. PMID:21851130
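The parent-mass and fragmentation differences stem from the ring topology: every rotation of a cyclic peptide contributes contiguous fragments, so the theoretical spectrum differs from that of any single linearization. A sketch that enumerates the sub-fragment masses of a small hypothetical cyclopeptide, using standard monoisotopic residue masses (this illustrates the spectrum-generation idea, not Cycloquest's search algorithm):

```python
# Standard monoisotopic residue masses (Da) for a few amino acids.
MASS = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276}

def cyclic_fragment_masses(peptide):
    """All contiguous sub-fragment masses of a cyclic peptide: walking the
    doubled string enumerates fragments from every rotation of the ring."""
    n = len(peptide)
    doubled = peptide + peptide
    masses = {0.0, round(sum(MASS[a] for a in peptide), 5)}
    for start in range(n):
        for length in range(1, n):
            frag = doubled[start:start + length]
            masses.add(round(sum(MASS[a] for a in frag), 5))
    return sorted(masses)

spectrum = cyclic_fragment_masses("GASP")   # hypothetical cyclic tetrapeptide
```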

  10. An approach in building a chemical compound search engine in oracle database.

    PubMed

    Wang, H; Volarath, P; Harrison, R

    2005-01-01

    Searching for and identifying chemical compounds is an important process in drug design and in chemistry research. An efficient search engine involves a close coupling of the search algorithm and the database implementation. The database must process chemical structures, which demands approaches to represent, store, and retrieve structures in a database system. In this paper, a general database framework for a chemical compound search engine in an Oracle database is described. The framework is devoted to eliminating data-type constraints for potential search algorithms, which is a crucial step toward building a domain-specific query language on top of SQL. A search engine implementation based on the database framework is also demonstrated. The convenience of the implementation underscores the efficiency and simplicity of the framework.

  11. The Use of AJAX in Searching a Bibliographic Database: A Case Study of the Italian Biblioteche Oggi Database

    ERIC Educational Resources Information Center

    Cavaleri, Piero

    2008-01-01

    Purpose: The purpose of this paper is to describe the use of AJAX for searching the Biblioteche Oggi database of bibliographic records. Design/methodology/approach: The paper is a demonstration of how bibliographic database single page interfaces allow the implementation of more user-friendly features for social and collaborative tasks. Findings:…

  13. Towards computational improvement of DNA database indexing and short DNA query searching

    PubMed Central

    Stojanov, Done; Koceski, Sašo; Mileva, Aleksandra; Koceska, Nataša; Bande, Cveta Martinovska

    2014-01-01

    In order to facilitate and speed up the search of massive DNA databases, the database is indexed at the beginning, employing a mapping function. By searching through the indexed data structure, exact query hits can be identified. If the database is searched against an annotated DNA query, such as a known promoter consensus sequence, then the starting locations and the number of potential genes can be determined. This is particularly relevant if unannotated DNA sequences have to be functionally annotated. However, indexing a massive DNA database and searching an indexed data structure with millions of entries is a time-demanding process. In this paper, we propose a fast DNA database indexing and searching approach, identifying all query hits in the database, without having to examine all entries in the indexed data structure, limiting the maximum length of a query that can be searched against the database. By applying the proposed indexing equation, the whole human genome could be indexed in 10 hours on a personal computer, under the assumption that there is enough RAM to store the indexed data structure. Analysing the methodology proposed by Reneker, we observed that hits at starting positions [Formula: see text] are not reported if the database is searched against a query shorter than [Formula: see text] nucleotides, such that [Formula: see text] is the length of the DNA database words being mapped and [Formula: see text] is the length of the query. A solution of this drawback is also presented. PMID:26019584
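The basic mapping-function idea (index every word of k nucleotides by its starting positions, then look up only the relevant bucket) can be sketched as below. This toy version also illustrates the short-query boundary case the authors discuss, by falling back to a scan when the query is shorter than the indexed word length:

```python
def build_index(genome, k):
    """Map every overlapping k-nucleotide word to its starting positions."""
    index = {}
    for i in range(len(genome) - k + 1):
        index.setdefault(genome[i:i + k], []).append(i)
    return index

def search(genome, index, k, query):
    """Exact search via the index; queries shorter than the indexed word
    length fall back to a plain scan of the genome."""
    if len(query) < k:
        return [i for i in range(len(genome) - len(query) + 1)
                if genome[i:i + len(query)] == query]
    return [pos for pos in index.get(query[:k], [])
            if genome[pos:pos + len(query)] == query]

genome = "ACGTACGTGACGA"                     # toy stand-in for a genome database
idx = build_index(genome, k=4)
```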

  14. Towards computational improvement of DNA database indexing and short DNA query searching.

    PubMed

    Stojanov, Done; Koceski, Sašo; Mileva, Aleksandra; Koceska, Nataša; Bande, Cveta Martinovska

    2014-09-03

    In order to facilitate and speed up the search of massive DNA databases, the database is indexed at the beginning, employing a mapping function. By searching through the indexed data structure, exact query hits can be identified. If the database is searched against an annotated DNA query, such as a known promoter consensus sequence, then the starting locations and the number of potential genes can be determined. This is particularly relevant if unannotated DNA sequences have to be functionally annotated. However, indexing a massive DNA database and searching an indexed data structure with millions of entries is a time-demanding process. In this paper, we propose a fast DNA database indexing and searching approach, identifying all query hits in the database, without having to examine all entries in the indexed data structure, limiting the maximum length of a query that can be searched against the database. By applying the proposed indexing equation, the whole human genome could be indexed in 10 hours on a personal computer, under the assumption that there is enough RAM to store the indexed data structure. Analysing the methodology proposed by Reneker, we observed that hits at starting positions [Formula: see text] are not reported, if the database is searched against a query shorter than [Formula: see text] nucleotides, such that [Formula: see text] is the length of the DNA database words being mapped and [Formula: see text] is the length of the query. A solution to this drawback is also presented.

  15. EasyKSORD: A Platform of Keyword Search Over Relational Databases

    NASA Astrophysics Data System (ADS)

    Peng, Zhaohui; Li, Jing; Wang, Shan

    Keyword Search Over Relational Databases (KSORD) enables casual users to use keyword queries (a set of keywords) to search relational databases just like searching the Web, without any knowledge of the database schema or any need of writing SQL queries. Based on our previous work, we design and implement a novel KSORD platform named EasyKSORD for users and system administrators to use and manage different KSORD systems in a novel and simple manner. EasyKSORD supports advanced queries, efficient data-graph-based search engines, multiform result presentations, and system logging and analysis. Through EasyKSORD, users can search relational databases easily and read search results conveniently, and system administrators can easily monitor and analyze the operations of KSORD and manage KSORD systems much better.
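    The core idea, keyword queries answered without SQL, can be sketched with a small inverted index over tuples. This is an illustration under invented table data, not the EasyKSORD implementation, which searches a data graph spanning multiple joined tables.

```python
# Sketch of keyword search over relational tuples: index every term to
# the (table, row) locations containing it, then intersect posting
# sets so a result row matches all query keywords.
from collections import defaultdict

def build_inverted_index(tables):
    index = defaultdict(set)
    for table, rows in tables.items():
        for rid, row in enumerate(rows):
            for value in row.values():
                for term in str(value).lower().split():
                    index[term].add((table, rid))
    return index

def keyword_search(index, keywords):
    postings = [index.get(k.lower(), set()) for k in keywords]
    return set.intersection(*postings) if postings else set()

tables = {"papers": [{"title": "keyword search over relational databases"},
                     {"title": "probabilistic database search"}]}
idx = build_inverted_index(tables)
print(keyword_search(idx, ["keyword", "search"]))  # → {('papers', 0)}
```

    Real KSORD systems go further by joining tuples across foreign keys so that a single answer can span several tables.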

  16. Adaptive search in mobile peer-to-peer databases

    NASA Technical Reports Server (NTRS)

    Wolfson, Ouri (Inventor); Xu, Bo (Inventor)

    2010-01-01

    Information is stored in a plurality of mobile peers. The peers communicate in a peer-to-peer fashion, using a short-range wireless network. Occasionally, a peer initiates a search for information in the peer-to-peer network by issuing a query. Queries and pieces of information, called reports, are transmitted among peers that are within transmission range. For each search, additional peers are utilized, which search and relay information on behalf of the originator of the search.
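    The query/report exchange can be sketched as a bounded flood. The positions, ranges, and report strings below are invented, and the patented method's adaptive behaviour is not modelled.

```python
# Sketch of a mobile peer-to-peer search: a query floods from peer to
# peer while each hop stays within transmission range, and matching
# reports are collected on behalf of the originator.
def in_range(a, b, radius=1.5):
    return abs(a["pos"] - b["pos"]) <= radius

def flood_search(peers, origin_id, keyword):
    visited, frontier, results = {origin_id}, [origin_id], []
    while frontier:
        current = peers[frontier.pop()]
        results += [r for r in current["reports"] if keyword in r]
        for pid, peer in peers.items():
            if pid not in visited and in_range(current, peer):
                visited.add(pid)
                frontier.append(pid)  # this neighbour relays the query
    return results

peers = {
    "a": {"pos": 0.0, "reports": ["parking spot at 5th St"]},
    "b": {"pos": 1.0, "reports": ["parking garage full"]},
    "c": {"pos": 2.0, "reports": ["parking meter free"]},
    "d": {"pos": 9.0, "reports": ["parking lot open"]},  # never in range
}
print(flood_search(peers, "a", "parking"))
```

    Peer "d" is unreachable from "a" even by relaying, so its report is never returned.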

  17. Searching Databases without Query-Building Aids: Implications for Dyslexic Users

    ERIC Educational Resources Information Center

    Berget, Gerd; Sandnes, Frode Eika

    2015-01-01

    Introduction: Few studies document the information searching behaviour of users with cognitive impairments. This paper therefore addresses the effect of dyslexia on information searching in a database with no tolerance for spelling errors and no query-building aids. The purpose was to identify effective search interface design guidelines that…

  18. Medical Students' Personal Knowledge, Searching Proficiency, and Database Use in Problem Solving.

    ERIC Educational Resources Information Center

    Wildemuth, Barbara M.; And Others

    1995-01-01

    Discusses the relationship between personal knowledge in a domain and online searching proficiency in that domain, and the relationship between searching proficiency and database-assisted problem-solving performance based on a study of medical students. Search results, selection of terms, and efficiency were found to be related to problem-solving…

  19. (Sub)structure Searches in Databases Containing Generic Chemical Structure Representations.

    ERIC Educational Resources Information Center

    Schoch-Grubler, Ursula

    1990-01-01

    Reviews three database systems available for searching generic chemical structure representations: (1) Derwent's Chemical Code System; (2) IDC's Gremas System; and (3) Derwent's Markush DARC System. Various types of searches are described, features desirable to users are discussed, and comparison searches are described that measured recall and…

  20. Evaluating the effect of database inflation in proteogenomic search on sensitive and reliable peptide identification.

    PubMed

    Li, Honglan; Joh, Yoon Sung; Kim, Hyunwoo; Paek, Eunok; Lee, Sang-Won; Hwang, Kyu-Baek

    2016-12-22

    Proteogenomics is a promising approach for various tasks ranging from gene annotation to cancer research. Databases for proteogenomic searches are often constructed by adding peptide sequences inferred from genomic or transcriptomic evidence to reference protein sequences. Such inflation of databases has the potential to identify novel peptides. However, it also raises concerns about sensitive and reliable peptide identification. Spurious peptides included in target databases may result in an underestimated false discovery rate (FDR). On the other hand, inflation of decoy databases could decrease the sensitivity of peptide identification due to the increased number of high-scoring random hits. Although several studies have addressed these issues, widely applicable guidelines for sensitive and reliable proteogenomic search have hardly been available. To systematically evaluate the effect of database inflation in proteogenomic searches, we constructed a variety of real and simulated proteogenomic databases for yeast and human tandem mass spectrometry (MS/MS) data, respectively. Against these databases, we tested two popular database search tools with various approaches to search result validation: the target-decoy search strategy (with and without a refined scoring metric) and a mixture model-based method. The effect of separate filtering of known and novel peptides was also examined. The results from real and simulated proteogenomic searches confirmed that separate filtering increases the sensitivity and reliability of proteogenomic search. However, no one method consistently identified the largest (or the smallest) number of novel peptides from real proteogenomic searches. We propose to use a set of search result validation methods with separate filtering for sensitive and reliable identification of peptides in proteogenomic search.
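    The "separate filtering" idea can be sketched as follows. The scores, the crude decoy-based FDR estimate, and the 1% cutoff are invented for illustration; the validation methods evaluated in the paper are more elaborate.

```python
# Sketch of separate FDR filtering in a target-decoy search: known and
# novel peptides are each filtered against decoys of their own class,
# instead of applying one pooled score cutoff to both.
def filter_at_fdr(psms, max_fdr=0.01):
    ranked = sorted(psms, key=lambda p: p["score"], reverse=True)
    kept, decoys = [], 0
    for p in ranked:
        decoys += p["decoy"]
        if decoys / (len(kept) + 1) > max_fdr:  # crude running FDR estimate
            break
        kept.append(p)
    return [p for p in kept if not p["decoy"]]

def separate_filter(psms, max_fdr=0.01):
    known = [p for p in psms if not p["novel"]]
    novel = [p for p in psms if p["novel"]]
    return filter_at_fdr(known, max_fdr) + filter_at_fdr(novel, max_fdr)

psms = [{"score": 9, "decoy": False, "novel": False},
        {"score": 8, "decoy": False, "novel": False},
        {"score": 7, "decoy": True,  "novel": False},
        {"score": 6, "decoy": False, "novel": True},
        {"score": 5, "decoy": False, "novel": True}]
print(len(separate_filter(psms)))  # → 4
```

    Pooled filtering would let the high-scoring known peptides mask the decoy rate among novel candidates; filtering each class separately avoids that.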

  1. Optimal design of groundwater remediation systems using a probabilistic multi-objective fast harmony search algorithm under uncertainty

    NASA Astrophysics Data System (ADS)

    Luo, Q.; Wu, J.; Qian, J.

    2013-12-01

    This study develops a new probabilistic multi-objective fast harmony search algorithm (PMOFHS) for optimal design of groundwater remediation systems under uncertainty associated with the hydraulic conductivity of aquifers. The PMOFHS integrates the previously developed deterministic multi-objective optimization method, namely the multi-objective fast harmony search algorithm (MOFHS), with probabilistic Pareto domination ranking and a probabilistic niche technique to search for Pareto-optimal solutions to multi-objective optimization problems in a noisy hydrogeological environment arising from insufficient hydraulic conductivity data. The PMOFHS is then coupled with the commonly used flow and transport codes, MODFLOW and MT3DMS, to identify the optimal groundwater remediation system for a two-dimensional hypothetical test problem involving two objectives: (i) minimization of the total remediation cost through the engineering planning horizon, and (ii) minimization of the percentage of mass remaining in the aquifer at the end of the operational period, using Pump-and-Treat (PAT) technology to clean up contaminated groundwater. Monte Carlo (MC) analysis is used to demonstrate the effectiveness of the proposed methodology: it is applied to each Pareto-optimal solution for every hydraulic conductivity (K) realization, and the statistical mean and the upper and lower bounds of the 95% confidence intervals are calculated. The MC analysis results show that all of the Pareto-optimal solutions are located between the upper and lower bounds of the MC analysis. Moreover, the root mean square errors (RMSEs) between the Pareto-optimal solutions by the PMOFHS and the average values of optimal solutions by the MC analysis are 0.0204 for the first objective and 0.0318 for the second objective, considerably smaller than the RMSEs between the results by the existing probabilistic multi-objective genetic algorithm (PMOGA) and the MC analysis, 0.0384 and 0.0397, respectively…
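    The Pareto domination ranking at the heart of such multi-objective searches can be sketched deterministically. The objective values below are invented, and the PMOFHS actually ranks probabilistically over noisy objectives, which this sketch does not model.

```python
# Pareto front for two minimization objectives, e.g. (remediation
# cost, fraction of mass remaining): a solution is kept unless some
# other solution is at least as good in both objectives and strictly
# better in at least one.
def dominates(a, b):
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(solutions):
    return [s for s in solutions
            if not any(dominates(o, s) for o in solutions if o is not s)]

solutions = [(1.0, 0.30), (1.2, 0.20), (1.5, 0.25), (0.9, 0.40)]
print(pareto_front(solutions))  # (1.5, 0.25) is dominated by (1.2, 0.20)
```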

  2. Seismic hazard assessment for Myanmar: Earthquake model database, ground-motion scenarios, and probabilistic assessments

    NASA Astrophysics Data System (ADS)

    Chan, C. H.; Wang, Y.; Thant, M.; Maung Maung, P.; Sieh, K.

    2015-12-01

    We have constructed an earthquake and fault database, conducted a series of ground-shaking scenarios, and proposed seismic hazard maps for all of Myanmar and hazard curves for selected cities. Our earthquake database integrates the ISC, ISC-GEM and global ANSS Comprehensive Catalogues, and includes harmonized magnitude scales without duplicate events. Our active fault database includes active fault data from previous studies. Using the parameters from these updated databases (i.e., the Gutenberg-Richter relationship, slip rate, maximum magnitude and the elapsed time since the last events), we have determined the earthquake recurrence models of seismogenic sources. To evaluate ground-shaking behaviour in different tectonic regimes, we conducted a series of tests matching modelled ground motions to the felt intensities of past earthquakes. The case of the 1975 Bagan earthquake showed that the ground motion prediction equations (GMPEs) of Atkinson and Boore (2003) fit the behaviour of subduction events best, while the 2011 Tarlay and 2012 Thabeikkyin events suggested that the GMPEs of Akkar and Cagnan (2010) fit crustal earthquakes best. We thus incorporated the best-fitting GMPEs and site conditions based on Vs30 (the average shear-wave velocity down to 30 m depth), derived from analysis of topographic slope and microtremor array measurements, to assess seismic hazard. The hazard is highest in regions close to the Sagaing Fault and along the western coast of Myanmar, since seismic sources there produce earthquakes at short intervals and/or their last events occurred long ago. The hazard curves for the cities of Bago, Mandalay, Sagaing, Taungoo and Yangon show higher hazards for sites close to an active fault or with a low Vs30, e.g., downtown Sagaing and the Shwemawdaw Pagoda in Bago.

  3. Searching for the evidence: a practical guide to some online databases in chiropractic and osteopathy.

    PubMed

    Parkhill, Anne

    2004-11-01

    Chiropractic and Osteopathy are categorised within the family of Complementary and Alternative Medicine (CAM) by most indexers and database managers. CAM therapies can be difficult to search because relevant resources are spread over a number of databases. This paper aims to introduce basic searching skills for six databases which offer CAM literature. Six readily available databases which can be used by a busy clinician to remain informed about best practice were chosen. The databases were searched and compared using two clinical scenarios as sample searches. Evidence-based practice demands that practitioners maintain their information gathering skills, but no one source provides all the answers. We are lured by the thought that everything is available on the web easily and speedily, but may sacrifice quality for ease and speed of retrieval.

  4. Uninformed and probabilistic distributed agent combinatorial searches for the unary NP-complete disassembly line balancing problem

    NASA Astrophysics Data System (ADS)

    McGovern, Seamus M.; Gupta, Surendra M.

    2005-11-01

    Disassembly takes place in remanufacturing, recycling, and disposal, with a line being the best choice for automation. The disassembly line balancing problem seeks a sequence which: is feasible, minimizes workstations, and ensures similar idle times, as well as other end-of-life specific concerns. Finding the optimal balance is computationally intensive due to exponential growth. Combinatorial optimization methods hold promise for providing solutions to the disassembly line balancing problem, which is proven here to belong to the class of unary NP-complete problems. Probabilistic (ant colony optimization) and uninformed (H-K) search methods are presented and compared. Numerical results are obtained using a recent case study to illustrate the search implementations and compare their performance. Conclusions drawn include the consistent generation of near-optimal solutions, the ability to preserve precedence, the speed of the techniques, and their practicality due to ease of implementation.
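    A precedence-feasible task assignment, the kind of solution both searches explore, can be sketched greedily. The task times, precedence relations, and cycle time below are invented; this is neither the ant colony optimization nor the H-K search from the paper.

```python
# Greedy sketch of disassembly line balancing: repeatedly pick the
# shortest precedence-feasible task and pack tasks into workstations
# up to a fixed cycle time.
def balance(tasks, precedence, cycle_time):
    done, stations, current, load = set(), [], [], 0.0
    while len(done) < len(tasks):
        ready = [t for t in tasks
                 if t not in done and precedence.get(t, set()) <= done]
        task = min(ready, key=lambda t: tasks[t])   # shortest ready task
        if load + tasks[task] > cycle_time:         # open a new workstation
            stations.append(current)
            current, load = [], 0.0
        current.append(task)
        load += tasks[task]
        done.add(task)
    stations.append(current)
    return stations

tasks = {"cover": 3, "board": 4, "screen": 2, "battery": 5}
precedence = {"board": {"cover"}, "screen": {"cover"}, "battery": {"board"}}
print(balance(tasks, precedence, cycle_time=7))
```

    A greedy pass like this yields one feasible balance; the metaheuristics in the paper search the space of such sequences to minimize workstations and even out idle times.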

  5. Searching for religion and mental health studies required health, social science, and grey literature databases.

    PubMed

    Wright, Judy M; Cottrell, David J; Mir, Ghazala

    2014-07-01

    To determine the optimal databases to search for studies of faith-sensitive interventions for treating depression. We examined 23 health, social science, religious, and grey literature databases searched for an evidence synthesis. Databases were prioritized by yield of (1) search results, (2) potentially relevant references identified during screening, (3) included references contained in the synthesis, and (4) included references that were available in the database. We assessed the impact of databases beyond MEDLINE, EMBASE, and PsycINFO by their ability to supply studies identifying new themes and issues. We identified pragmatic workload factors that influence database selection. PsycINFO was the best performing database within all priority lists. ArabPsyNet, CINAHL, Dissertations and Theses, EMBASE, Global Health, Health Management Information Consortium, MEDLINE, PsycINFO, and Sociological Abstracts were essential for our searches to retrieve the included references. Citation tracking activities and the personal library of one of the research teams made significant contributions of unique, relevant references. Religion studies databases (Am Theo Lib Assoc, FRANCIS) did not provide unique, relevant references. Literature searches for reviews and evidence syntheses of religion and health studies should include social science, grey literature, non-Western databases, personal libraries, and citation tracking activities. Copyright © 2014 Elsevier Inc. All rights reserved.

  6. Searching the expressed sequence tag (EST) databases: panning for genes.

    PubMed

    Jongeneel, C V

    2000-02-01

    The genomes of living organisms contain many elements, including genes coding for proteins. The portions of the genes expressed as mature mRNA, collectively known as the transcriptome, represent only a small part of the genome. The expressed sequence tag (EST) databases contain an increasingly large part of the transcriptome of many species. For this reason, these databases are probably the most abundant source of new coding sequences available today. However, the raw data deposited in the EST databases are to a large extent unorganised, unannotated, redundant and of relatively low quality. This paper reviews some of the characteristics of the EST data, and the methods that can be used to find novel protein sequences within them. It also documents a collection of databases, software and web sites that can be useful to biologists interested in mining the EST databases over the Internet, or in establishing a local environment for such analyses.

  7. Probabilistic person identification in TV news programs using image web database

    NASA Astrophysics Data System (ADS)

    Battisti, F.; Carli, M.; Leo, M.; Neri, A.

    2014-02-01

    The automatic labeling of faces in TV broadcasting is still a challenging problem. The high variability in viewpoints, facial expressions, general appearance, and lighting conditions, as well as occlusions, rapid shot changes, and camera motions, produce significant variations in image appearance. The application of automatic tools for face recognition is not yet fully established and human intervention is still needed. In this paper, we deal with automatic face recognition in TV broadcasting programs. The target of the proposed method is to identify the presence of a specific person in a video by means of a set of images downloaded from the Web using a specific search key.

  8. Content Evaluation of Textual CD-ROM and Web Databases. Database Searching Series.

    ERIC Educational Resources Information Center

    Jacso, Peter

    This book provides guidelines for evaluating a variety of database types, including abstracting and indexing, directory, full-text, and page-image databases available in online and/or CD-ROM formats. The book discusses the purpose and techniques of comparing and evaluating the most important characteristics of textual databases, such as their…

  10. Techniques for searching the CINAHL database using the EBSCO interface.

    PubMed

    Lawrence, Janna C

    2007-04-01

    The Cumulative Index to Nursing and Allied Health Literature (CINAHL) is a useful research tool for accessing articles of interest to nurses and health care professionals. More than 2,800 journals are indexed by CINAHL and can be searched easily using assigned subject headings. Detailed instructions about conducting, combining, and saving searches in CINAHL are provided in this article. Establishing an account at EBSCO further allows a nurse to save references and searches and to receive e-mail alerts when new articles on a topic of interest are published.

  11. Algorithms for database-dependent search of MS/MS data.

    PubMed

    Matthiesen, Rune

    2013-01-01

    The frequently used bottom-up strategy for identification of proteins and their associated modifications nowadays typically generates thousands of MS/MS spectra that are normally matched automatically against a protein sequence database. Search engines that take as input MS/MS spectra and a protein sequence database are referred to as database-dependent search engines. Many programs, both commercial and freely available, exist for database-dependent search of MS/MS spectra, and most of them have excellent user documentation. The aim here is therefore to outline the algorithmic strategy behind different search engines rather than to provide software user manuals. The process of database-dependent search can be divided into search strategy, peptide scoring, protein scoring, and finally protein inference. Most efforts in the literature have gone into comparing results from different software rather than discussing the underlying algorithms. Such practical comparisons can be cluttered by suboptimal implementations, and the observed differences are frequently caused by software parameter settings that have not been set properly to allow a fair comparison. In other words, an algorithmic idea can still be worth considering even if the software implementation has been demonstrated to be suboptimal. The aim in this chapter is therefore to split the algorithms for database-dependent searching of MS/MS data into the above steps so that the different algorithmic ideas become more transparent and comparable. Most search engines provide good implementations of the first three data analysis steps mentioned above, whereas the final step, protein inference, is much less developed for most search engines and is in many cases performed by external software. The final part of this chapter illustrates how protein inference is built into the VEMS search engine and discusses SIR, a stand-alone program for protein inference that can import a Mascot search result.

  12. Federated or cached searches: Providing expected performance from multiple invasive species databases

    NASA Astrophysics Data System (ADS)

    Graham, Jim; Jarnevich, Catherine S.; Simpson, Annie; Newman, Gregory J.; Stohlgren, Thomas J.

    2011-06-01

    Invasive species are a universal global problem, but the information to identify them, manage them, and prevent invasions is stored around the globe in a variety of formats. The Global Invasive Species Information Network is a consortium of organizations working toward providing seamless access to these disparate databases via the Internet. A distributed network of databases can be created using the Internet and a standard web service protocol. There are two options to provide this integration. First, federated searches are being proposed to allow users to search "deep" web documents such as databases for invasive species. A second method is to create a cache of data from the databases for searching. We compare these two methods, and show that federated searches will not provide the performance and flexibility required by users, and that a central cache of the data is required to improve performance.
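    The trade-off can be sketched with a toy model (the sources, records, and per-source delay below are invented): a federated search pays one simulated round-trip per remote database, while a cache answers from a single merged local index.

```python
# Toy federated-vs-cached comparison: both return the same hits, but
# the federated version incurs a network delay for every source it
# queries, while the cache answers locally at the cost of staleness.
import time

SOURCES = {
    "db_a": {"kudzu": "invasive vine"},
    "db_b": {"zebra mussel": "invasive mollusc"},
    "db_c": {"cheatgrass": "invasive grass"},
}

def federated_search(term, delay=0.01):
    hits = []
    for records in SOURCES.values():
        time.sleep(delay)        # one round-trip per remote database
        if term in records:
            hits.append(records[term])
    return hits

# One-time cache build; keeping it fresh is the price of the speed-up.
CACHE = {k: v for records in SOURCES.values() for k, v in records.items()}

def cached_search(term):
    return [CACHE[term]] if term in CACHE else []

assert federated_search("kudzu") == cached_search("kudzu") == ["invasive vine"]
```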

  13. Federated or cached searches: providing expected performance from multiple invasive species databases

    USGS Publications Warehouse

    Graham, Jim; Jarnevich, Catherine S.; Simpson, Annie; Newman, Gregory J.; Stohlgren, Thomas J.

    2011-01-01

    Invasive species are a universal global problem, but the information to identify them, manage them, and prevent invasions is stored around the globe in a variety of formats. The Global Invasive Species Information Network is a consortium of organizations working toward providing seamless access to these disparate databases via the Internet. A distributed network of databases can be created using the Internet and a standard web service protocol. There are two options to provide this integration. First, federated searches are being proposed to allow users to search “deep” web documents such as databases for invasive species. A second method is to create a cache of data from the databases for searching. We compare these two methods, and show that federated searches will not provide the performance and flexibility required by users, and that a central cache of the data is required to improve performance.

  14. Use of Composite Protein Database including Search Result Sequences for Mass Spectrometric Analysis of Cell Secretome

    PubMed Central

    Shin, Jihye; Kim, Gamin; Kabir, Mohammad Humayun; Park, Seong Jun; Lee, Seoung Taek; Lee, Cheolju

    2015-01-01

    Mass spectrometric (MS) data of human cell secretomes are usually run through the conventional human database for identification. However, the search may result in false identifications due to contamination of the secretome with fetal bovine serum (FBS) proteins. To overcome this challenge, here we provide a composite protein database including human as well as 199 FBS protein sequences for MS data search of human cell secretomes. Searching against the human-FBS database returned more reliable results with fewer false-positive and false-negative identifications compared to using either a human only database or a human-bovine database. Furthermore, the improved results validated our strategy without complex experiments like SILAC. We expect our strategy to improve the accuracy of human secreted protein identification and to also add value for general use. PMID:25822838
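    Assembling such a composite database can be sketched as concatenating FASTA records. The headers and sequences below are hypothetical, and the actual human-FBS database combines a full human proteome with 199 FBS sequences rather than these toy entries.

```python
# Sketch of building a composite search database: concatenate FASTA
# entries from several sources, keeping a shared sequence only once so
# the search space is not inflated with exact duplicates.
def merge_fasta(*databases):
    merged, seen = [], set()
    for db in databases:
        for header, seq in db:
            if seq not in seen:
                seen.add(seq)
                merged.append((header, seq))
    return merged

human = [(">sp|HUMAN_A", "MKWVTFISLL"), (">sp|HUMAN_B", "GSHMLEDP")]
fbs = [(">tr|BOVIN_ALB", "MKWVTFISLL"),   # identical to the human entry
       (">tr|BOVIN_X", "AQKLRA")]
composite = merge_fasta(human, fbs)
print(len(composite))  # → 3
```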

  15. Interspecies extrapolation based on the RepDose database--a probabilistic approach.

    PubMed

    Escher, Sylvia E; Batke, Monika; Hoffmann-Doerr, Simone; Messinger, Horst; Mangelsdorf, Inge

    2013-04-12

    Repeated dose toxicity studies from the RepDose database (DB) were used to determine interspecies differences for rats and mice. NOEL (no observed effect level) ratios based on systemic effects were investigated for three different types of exposure: inhalation, oral food/drinking water and oral gavage. Furthermore, NOEL ratios for local effects in inhalation studies were evaluated. On the basis of the NOEL ratio distributions, interspecies assessment factors (AF) are evaluated. All data sets were best described by a lognormal distribution. No difference was seen between inhalation and oral exposure for systemic effects. Rats and mice were on average equally sensitive at equipotent doses with geometric mean (GM) values of 1 and geometric standard deviation (GSD) values ranging from 2.30 to 3.08. The local AF based on inhalation exposure resulted in a similar distribution with GM values of 1 and GSD values between 2.53 and 2.70. Our analysis confirms former analyses on interspecies differences, including also dog and human data. Furthermore it supports the principle of allometric scaling according to caloric demand in the case that body doses are applied. In conclusion, an interspecies distribution animal/human with a GM equal to allometric scaling and a GSD of 2.5 was derived.
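    The GM and GSD summarizing a lognormal NOEL ratio distribution can be computed as follows; the ratio values are invented for illustration, not RepDose data.

```python
# Geometric mean (GM) and geometric standard deviation (GSD): take
# logs, compute the arithmetic mean and sample standard deviation of
# the logs, then exponentiate both.
import math

def gm_gsd(ratios):
    logs = [math.log(r) for r in ratios]
    mean = sum(logs) / len(logs)
    var = sum((x - mean) ** 2 for x in logs) / (len(logs) - 1)
    return math.exp(mean), math.exp(math.sqrt(var))

ratios = [0.4, 0.8, 1.0, 1.3, 2.5]   # hypothetical rat/mouse NOEL ratios
gm, gsd = gm_gsd(ratios)
print(round(gm, 2), round(gsd, 2))   # → 1.01 1.95
```

    A GM near 1 with a GSD around 2.5, as reported above, means the two species are equally sensitive on average while individual ratios spread over roughly an order of magnitude.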

  16. Using the Turning Research Into Practice (TRIP) database: how do clinicians really search?*

    PubMed Central

    Meats, Emma; Brassey, Jon; Heneghan, Carl; Glasziou, Paul

    2007-01-01

    Objectives: Clinicians and patients are increasingly accessing information through Internet searches. This study aimed to examine clinicians' current search behavior when using the Turning Research Into Practice (TRIP) database to examine search engine use and the ways it might be improved. Methods: A Web log analysis was undertaken of the TRIP database—a meta-search engine covering 150 health resources including MEDLINE, The Cochrane Library, and a variety of guidelines. The connectors for terms used in searches were studied, and observations were made of 9 users' search behavior when working with the TRIP database. Results: Of 620,735 searches, most used a single term, and 12% (n = 75,947) used a Boolean operator: 11% (n = 69,006) used “AND” and 0.8% (n = 4,941) used “OR.” Of the elements of a well-structured clinical question (population, intervention, comparator, and outcome), the population was most commonly used, while fewer searches included the intervention. Comparator and outcome were rarely used. Participants in the observational study were interested in learning how to formulate better searches. Conclusions: Web log analysis showed most searches used a single term and no Boolean operators. Observational study revealed users were interested in conducting efficient searches but did not always know how. Therefore, either better training or better search interfaces are required to assist users and enable more effective searching. PMID:17443248
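    The web-log tallying described above amounts to counting operator use per query; a minimal sketch with invented log lines:

```python
# Count Boolean operator usage and single-term queries in a search log.
def operator_stats(queries):
    stats = {"total": 0, "AND": 0, "OR": 0, "single_term": 0}
    for q in queries:
        stats["total"] += 1
        tokens = q.upper().split()
        stats["AND"] += "AND" in tokens
        stats["OR"] += "OR" in tokens
        stats["single_term"] += len(tokens) == 1
    return stats

log = ["asthma", "diabetes AND metformin", "statin", "flu OR influenza"]
print(operator_stats(log))
# → {'total': 4, 'AND': 1, 'OR': 1, 'single_term': 2}
```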

  17. STEPS: A Grid Search Methodology for Optimized Peptide Identification Filtering of MS/MS Database Search Results

    SciTech Connect

    Piehowski, Paul D.; Petyuk, Vladislav A.; Sandoval, John D.; Burnum, Kristin E.; Kiebel, Gary R.; Monroe, Matthew E.; Anderson, Gordon A.; Camp, David G.; Smith, Richard D.

    2013-03-01

    For bottom-up proteomics there are a wide variety of database searching algorithms in use for matching peptide sequences to tandem MS spectra. Likewise, there are numerous strategies being employed to produce a confident list of peptide identifications from the different search algorithm outputs. Here we introduce a grid search approach for determining optimal database filtering criteria in shotgun proteomics data analyses that is easily adaptable to any search algorithm. Systematic Trial and Error Parameter Selection - referred to as STEPS - utilizes user-defined parameter ranges to test a wide array of parameter combinations to arrive at an optimal "parameter set" for data filtering, thus maximizing confident identifications. The benefits of this approach in terms of numbers of true positive identifications are demonstrated using datasets derived from immunoaffinity-depleted blood serum and a bacterial cell lysate, two common proteomics sample types.
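    The grid-search idea can be sketched with two filter parameters. The PSM records, thresholds, and 1% FDR target below are invented, and STEPS searches far larger parameter spaces than this.

```python
# Toy grid search over filtering criteria: try every combination of a
# score cutoff and a mass-error tolerance, and keep the pair that
# retains the most PSMs at an acceptable decoy-estimated FDR.
from itertools import product

def apply_filter(psms, score_min, ppm_max):
    kept = [p for p in psms
            if p["score"] >= score_min and abs(p["ppm"]) <= ppm_max]
    decoys = sum(p["decoy"] for p in kept)
    return kept, (decoys / len(kept) if kept else 1.0)

def grid_search(psms, score_grid, ppm_grid, max_fdr=0.01):
    best, best_n = None, -1
    for s, m in product(score_grid, ppm_grid):
        kept, rate = apply_filter(psms, s, m)
        if rate <= max_fdr and len(kept) > best_n:
            best, best_n = (s, m), len(kept)
    return best, best_n

psms = [{"score": 4.0, "ppm": 1.0, "decoy": False},
        {"score": 3.0, "ppm": 2.0, "decoy": False},
        {"score": 2.5, "ppm": 9.0, "decoy": False},
        {"score": 2.0, "ppm": 1.5, "decoy": True},
        {"score": 1.0, "ppm": 4.0, "decoy": False}]
print(grid_search(psms, [1.0, 2.5, 3.0], [2.0, 10.0]))  # → ((2.5, 10.0), 3)
```

    Note that the loosest thresholds do not win: they admit the decoy hit and blow the FDR budget, which is exactly why the filter parameters are worth optimizing jointly.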

  18. Increasing number of databases searched in systematic reviews and meta-analyses between 1994 and 2014.

    PubMed

    Lam, Michael T; McDiarmid, Mary

    2016-10-01

    The purpose of this study was to determine whether the number of bibliographic databases used to search the health sciences literature in individual systematic reviews (SRs) and meta-analyses (MAs) changed over a twenty-year period related to the official 1995 launch of the Cochrane Database of Systematic Reviews (CDSR). Ovid MEDLINE was searched using a modified version of a strategy developed by the Scottish Intercollegiate Guidelines Network to identify SRs and MAs. Records from 3 milestone years were searched: the year immediately preceding (1994) and 1 (2004) and 2 (2014) decades following the CDSR launch. Records were sorted with randomization software. Abstracts or full texts of the records were examined to identify database usage until 100 relevant records were identified from each of the 3 years. The mean and median number of bibliographic databases searched in 1994, 2004, and 2014 were 1.62 and 1, 3.34 and 3, and 3.73 and 4, respectively. Studies that searched only 1 database decreased over the 3 milestone years (60% in 1994, 28% in 2004, and 10% in 2014). The number of bibliographic databases searched in individual SRs and MAs increased from 1994 to 2014.

  19. Increasing number of databases searched in systematic reviews and meta-analyses between 1994 and 2014

    PubMed Central

    Lam, Michael T.; McDiarmid, Mary

    2016-01-01

    Objectives The purpose of this study was to determine whether the number of bibliographic databases used to search the health sciences literature in individual systematic reviews (SRs) and meta-analyses (MAs) changed over a twenty-year period related to the official 1995 launch of the Cochrane Database of Systematic Reviews (CDSR). Methods Ovid MEDLINE was searched using a modified version of a strategy developed by the Scottish Intercollegiate Guidelines Network to identify SRs and MAs. Records from 3 milestone years were searched: the year immediately preceding (1994) and 1 (2004) and 2 (2014) decades following the CDSR launch. Records were sorted with randomization software. Abstracts or full texts of the records were examined to identify database usage until 100 relevant records were identified from each of the 3 years. Results The mean and median number of bibliographic databases searched in 1994, 2004, and 2014 were 1.62 and 1, 3.34 and 3, and 3.73 and 4, respectively. Studies that searched only 1 database decreased over the 3 milestone years (60% in 1994, 28% in 2004, and 10% in 2014). Conclusions The number of bibliographic databases searched in individual SRs and MAs increased from 1994 to 2014. PMID:27822149

  20. Social Work Literature Searching: Current Issues with Databases and Online Search Engines

    ERIC Educational Resources Information Center

    McGinn, Tony; Taylor, Brian; McColgan, Mary; McQuilkan, Janice

    2016-01-01

    Objectives: To compare the performance of a range of search facilities; and to illustrate the execution of a comprehensive literature search for qualitative evidence in social work. Context: Developments in literature search methods and comparisons of search facilities help facilitate access to the best available evidence for social workers.…

  2. SledgeHMMER: a web server for batch searching the Pfam database.

    PubMed

    Chukkapalli, Giridhar; Guda, Chittibabu; Subramaniam, Shankar

    2004-07-01

    The SledgeHMMER web server is intended for genome-scale searching of the Pfam database without having to install this database and the HMMER software locally. The server implements a parallelized version of hmmpfam, the program used for searching the Pfam HMM database. Pfam search results have been calculated for the entire Swiss-Prot and TrEMBL database sequences (approximately 1.2 million) on 256 processors of IA64-based TeraGrid machines. The Pfam database can be searched in local, glocal or merged mode, using either gathering or E-value thresholds. Query sequences are first matched against the pre-calculated entries to retrieve results, and those without matches are processed through a new search process. Results are emailed in a space-delimited tabular format upon completion of the search. While most other Pfam-searching web servers set a limit of one sequence per query, this server processes batch sequences with no limit on the number of input sequences. The web server and downloadable data are accessible from http://SledgeHmmer.sdsc.edu.
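    The precalculated-results strategy described above is essentially a cache-or-compute lookup. A minimal sketch of that flow, with made-up sequences and result strings standing in for the server's precomputed Swiss-Prot/TrEMBL table:

```python
# Hypothetical precomputed table: sequence -> stored Pfam search result.
precomputed = {"MKVLITGA": "PF00001 hit", "GGAMKVQQ": "PF00002 hit"}

def search(seq, run_hmmpfam):
    """Return the cached result when available; otherwise run a fresh hmmpfam search."""
    if seq in precomputed:
        return precomputed[seq]
    return run_hmmpfam(seq)

print(search("MKVLITGA", lambda s: "fresh result"))  # served from precomputed entries
print(search("AAAAAAAA", lambda s: "fresh result"))  # goes through a new search process
```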

  3. Dialog's Knowledge Index and BRS/After Dark: Database Searching on Personal Computers.

    ERIC Educational Resources Information Center

    Tenopir, Carol

    1983-01-01

    Describes two new bibliographic information services being marketed to microcomputer owners by DIALOG, Inc. and Bibliographic Retrieval Services to allow access to databases at low rates during evening hours. Subject focus, selection of a database, and search strategies employed on each system are discussed, and the two services are compared. (EJS)

  4. A Practical Introduction to Non-Bibliographic Database Searching.

    ERIC Educational Resources Information Center

    Rocke, Hans J.; And Others

    This guide comprises four reports on the Laboratory Animal Data Bank (LADB), the National Institutes of Health/Environmental Protection Agency (NIH/EPA) Chemical Information System (CIS), nonbibliographic databases for the social sciences, and the Toxicology Data Bank (TDB) and Registry of Toxic Effects of Chemical Substances (RTECS). The first…

  5. Searching for Controlled Trials of Complementary and Alternative Medicine: A Comparison of 15 Databases

    PubMed Central

    Cogo, Elise; Sampson, Margaret; Ajiferuke, Isola; Manheimer, Eric; Campbell, Kaitryn; Daniel, Raymond; Moher, David

    2011-01-01

    This project aims to assess the utility of bibliographic databases beyond the three major ones (MEDLINE, EMBASE and Cochrane CENTRAL) for finding controlled trials of complementary and alternative medicine (CAM). Fifteen databases were searched to identify controlled clinical trials (CCTs) of CAM not also indexed in MEDLINE. Searches were conducted in May 2006 using the revised Cochrane highly sensitive search strategy (HSSS) and the PubMed CAM Subset. Yield of CAM trials per 100 records was determined, and databases were compared over a standardized period (2005). The Acudoc2 RCT, Acubriefs, Index to Chiropractic Literature (ICL) and Hom-Inform databases had the highest concentrations of non-MEDLINE records, with more than 100 non-MEDLINE records per 500. Other productive databases had ratios between 500 and 1500 records to 100 non-MEDLINE records—these were AMED, MANTIS, PsycINFO, CINAHL, Global Health and Alt HealthWatch. Five databases were found to be unproductive: AGRICOLA, CAIRSS, Datadiwan, Herb Research Foundation and IBIDS. Acudoc2 RCT yielded 100 CAM trials in the most recent 100 records screened. Acubriefs, AMED, Hom-Inform, MANTIS, PsycINFO and CINAHL had more than 25 CAM trials per 100 records screened. Global Health, ICL and Alt HealthWatch were below 25 in yield. There were 255 non-MEDLINE trials from eight databases in 2005, with only 10% indexed in more than one database. Yield varied greatly between databases; the most productive databases from both sampling methods were Acubriefs, Acudoc2 RCT, AMED and CINAHL. Low overlap between databases indicates comprehensive CAM literature searches will require multiple databases. PMID:19468052

  6. Extending the Role of the Corporate Library: Corporate Database Applications Using BRS/Search Software.

    ERIC Educational Resources Information Center

    Lammert, Diana

    1993-01-01

    Describes the McKenna Information Center's application of BRS/SEARCH, information retrieval software, as part of its services to Kennmetal Inc., its parent company. Features and uses of the software, including commands, custom searching, menu-driven interfaces, preparing reports, and designing databases are covered. Nine examples of software…

  7. A curriculum database with boolean natural-language searching in HyperCard.

    PubMed Central

    Mann, D.; Goodrum, K.; DeWine, J. M.; McVicker, J.

    1992-01-01

    A curriculum database including both natural-language and keyword searching was developed to assist faculty in curriculum research and reform. HyperCard (with extensions) on the Apple Macintosh provides a flexible single-user or networked environment for entering, indexing, searching and retrieving content in detailed faculty notes for the instructional activities in a four-year predoctoral curriculum. PMID:1482977

  8. InfoTrac's SearchBank Databases: Business Information and More.

    ERIC Educational Resources Information Center

    Mehta, Usha; Goodman, Beth

    1997-01-01

    Describes the InfoTrac SearchBank based on experiences at the University of Nevada, Reno, libraries where the service is available through the online catalog. Highlights include remote access through the Internet; indexing and abstracting; full-text access to 460 journal titles; a powerful search engine; and business-oriented databases.…

  9. When is a search not a search? A comparison of searching the AMED complementary health database via EBSCOhost, OVID and DIALOG.

    PubMed

    Younger, Paula; Boddy, Kate

    2009-06-01

    The researchers involved in this study work at Exeter Health Library and at the Complementary Medicine Unit, Peninsula College of Medicine and Dentistry (PCMD). Within this collaborative environment it is possible to access the electronic resources of three institutions, including access to AMED and other databases using different interfaces. The aim of this study was to investigate whether searching different interfaces to the AMED allied health and complementary medicine database produced the same results when using identical search terms. The following Internet-based AMED interfaces were searched: DIALOG DataStar, EBSCOhost and OVID SP_UI01.00.02. Search results from all three databases were saved in an EndNote database to facilitate analysis. A checklist was also compiled comparing interface features. In our initial search, DIALOG returned 29 hits, OVID 14 and EBSCOhost 8. If we assume that DIALOG returned 100% of potential hits, OVID initially returned only 48% of hits and EBSCOhost only 28%. In our search, a researcher using the EBSCOhost interface to carry out a simple search on AMED would miss over 70% of possible search hits. Subsequent EBSCOhost searches on different subjects failed to find between 21% and 86% of the hits retrieved using the same keywords via DIALOG DataStar. In two cases, the simple EBSCOhost search failed to find any of the results found via DIALOG DataStar. Depending on the interface, the number of hits retrieved from the same database with the same simple search can vary dramatically. Some simple searches fail to retrieve a substantial percentage of citations. This may result in an uninformed literature review, research funding application or treatment intervention. In addition to ensuring that keywords, spelling and medical subject headings (MeSH) accurately reflect the nature of the search, database users should include wildcards and truncation and adapt their search strategy substantially to retrieve the maximum number of appropriate…
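    The percentages quoted above are relative recall figures, computed against DIALOG's result set as the reference. A quick sketch using the hit counts reported for the initial search (DIALOG 29, OVID 14, EBSCOhost 8):

```python
def relative_recall(hits, reference_hits):
    """Percentage of the reference interface's hits that another interface retrieved."""
    return 100.0 * hits / reference_hits

reference = 29  # DIALOG, assumed to have found 100% of potential hits
print(round(relative_recall(14, reference)))  # OVID -> 48
print(round(relative_recall(8, reference)))   # EBSCOhost -> 28
```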

  10. New showers from parent body search across several video meteor databases

    NASA Astrophysics Data System (ADS)

    Šegon, Damir; Gural, Peter; Andreić, Željko; Skokić, Ivica; Korlević, Korado; Vida, Denis; Novoselnik, Filip

    2014-04-01

    This work was initiated by utilizing the latest complete set of both comets and NEOs downloaded from the JPL small-body database search engine. Rather than search for clustering within a single given database of meteor orbits, the method employed herein is to use all the known parent bodies with their individual orbital elements as the starting point, and find statistically significant associations across a variety of meteor databases. Fifteen new showers possibly related to a comet or a NEO were found.
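    The abstract does not name its association measure; a standard choice for linking meteor orbits to parent-body orbits is the Southworth-Hawkins D-criterion, sketched below. Orbits are given as perihelion distance q (AU), eccentricity e, and inclination, ascending node and argument of perihelion in degrees; the Geminid-like elements are illustrative values only.

```python
from math import sin, cos, asin, sqrt, radians

def d_sh(o1, o2):
    """Southworth-Hawkins D-criterion between two orbits
    given as (q [AU], e, inclination, node, argument of perihelion [deg])."""
    q1, e1, i1, n1, w1 = o1
    q2, e2, i2, n2, w2 = o2
    i1, n1, w1 = radians(i1), radians(n1), radians(w1)
    i2, n2, w2 = radians(i2), radians(n2), radians(w2)
    # Squared chord of the mutual inclination angle, (2 sin(I/2))^2
    i21 = (2 * sin((i2 - i1) / 2)) ** 2 + sin(i1) * sin(i2) * (2 * sin((n2 - n1) / 2)) ** 2
    # Difference of the longitudes of perihelion, measured from the common node
    pi21 = (w2 - w1) + 2 * asin(cos((i2 + i1) / 2) * sin((n2 - n1) / 2) / sqrt(1 - i21 / 4))
    return sqrt((q2 - q1) ** 2 + (e2 - e1) ** 2 + i21
                + (((e1 + e2) / 2) ** 2) * (2 * sin(pi21 / 2)) ** 2)

orbit = (0.14, 0.89, 23.3, 265.0, 324.0)  # roughly Geminid-like elements
print(d_sh(orbit, orbit))  # identical orbits -> 0.0
```

Candidate meteors are those whose orbits fall below a small D threshold relative to a parent body.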

  11. Conducting literature searches on Ayurveda in PubMed, Indian, and other databases.

    PubMed

    Narahari, Saravu R; Aggithaya, Madhur Guruprasad; Suraj, Kumbla R

    2010-11-01

    Literature searches for articles on Ayurveda provide special challenges, since many of the Indian journals in which such articles appear are not indexed by current medical databases such as PubMed and Cochrane Central Register of Controlled Trials. The aim of this study was to develop a comprehensive search strategy on Ayurveda topics and to map the existing databases containing Ayurveda journal publications. We have developed a literature search procedure that can recover the great majority of articles on any given topic associated with Ayurveda. Our system is formulated in an easily reproducible fashion that all researchers can use. Using the keywords related to Ayurveda and vitiligo, we searched 41 databases that may contain complementary and alternative medicine publications. Only 11 databases yielded results; PubMed contained 9 articles. Each of 14 other databases named in our search procedure averaged 23 articles. International Bibliographic Information of Dietary Supplements, for example, gave 22, of which 1 satisfied our eligibility criteria. "Annotated Bibliography of Indian Medicine" gave 47, of which 7 satisfied eligibility criteria. This article proposes guidelines enabling comprehensive searches to locate all types of Ayurvedic articles, not necessarily only randomized controlled trials.

  12. muBLASTP: database-indexed protein sequence search on multicore CPUs.

    PubMed

    Zhang, Jing; Misra, Sanchit; Wang, Hao; Feng, Wu-Chun

    2016-11-04

    The Basic Local Alignment Search Tool (BLAST) is a fundamental program in the life sciences that searches databases for sequences that are most similar to a query sequence. Currently, the BLAST algorithm utilizes a query-indexed approach. Although many approaches suggest that sequence search with a database index can achieve much higher throughput (e.g., BLAT, SSAHA, and CAFE), they cannot deliver the same level of sensitivity as the query-indexed BLAST, i.e., NCBI BLAST, or they can only support nucleotide sequence search, e.g., MegaBLAST. Due to the different challenges and characteristics of query indexing and database indexing, existing techniques for query-indexed search cannot be applied directly to database-indexed search. muBLASTP, a novel database-indexed BLAST for protein sequence search, delivers hits identical to those returned by NCBI BLAST. On Intel Haswell multicore CPUs, for a single query, the single-threaded muBLASTP achieves up to a 4.41-fold speedup for the alignment stages, and up to a 1.75-fold end-to-end speedup over single-threaded NCBI BLAST. For a batch of queries, the multithreaded muBLASTP achieves up to a 5.7-fold speedup for the alignment stages, and up to a 4.56-fold end-to-end speedup over multithreaded NCBI BLAST. With a newly designed index structure for protein databases and associated optimizations to the BLASTP algorithm, we re-factored BLASTP for modern multicore processors, achieving much higher throughput with an acceptable memory footprint for the database index.
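    The core idea of database indexing is to precompute, once per database release, where every short word occurs, so that each query word becomes a single lookup rather than a scan over every subject sequence. A toy sketch of such an inverted seed index (muBLASTP's actual index structure and optimizations are far more elaborate):

```python
from collections import defaultdict

K = 3  # word length typical of protein BLAST seeding

def build_index(db):
    """Inverted index over the database: k-mer word -> [(sequence id, offset), ...]."""
    index = defaultdict(list)
    for sid, seq in db.items():
        for off in range(len(seq) - K + 1):
            index[seq[off:off + K]].append((sid, off))
    return index

def seed_hits(index, query):
    """Each query word is one index lookup instead of a scan over every subject."""
    hits = []
    for qoff in range(len(query) - K + 1):
        for sid, soff in index.get(query[qoff:qoff + K], ()):
            hits.append((sid, qoff, soff))
    return hits

db = {"s1": "MKVLITGA", "s2": "GGAMKVQQ"}  # toy protein "database"
index = build_index(db)
print(seed_hits(index, "AMKV"))
```

The seed hits would then feed the usual ungapped/gapped extension stages of BLAST.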

  13. Efficiency of 22 online databases in the search for physicochemical, toxicological and ecotoxicological information on chemicals.

    PubMed

    Guerbet, Michel; Guyodo, Gaetan

    2002-03-01

    The objective of this study was to evaluate the efficiency of 22 free online databases that could be used for an exhaustive search of physicochemical, toxicological and/or ecotoxicological information about various chemicals. Twenty-two databases with free access on the Internet were referenced. We then selected 27 major physicochemical, toxicological and ecotoxicological criteria and 14 compounds belonging to seven different chemical classes, which were used to interrogate all the databases. Two indices were successively calculated to evaluate efficiency, with and without taking account of database specialization. More than 50% of the 22 databases 'knew' all of the 14 chemicals, but the quantity of information provided varies greatly from one database to another, and most are poorly documented. Two categories clearly appear: specialized and non-specialized databases. The HSDB database is the most efficient general database to be searched first, because it is well documented for most of the 27 criteria. However, some specialized databases (i.e. EXTOXNET, SOLVEDB, etc.) must be searched secondarily to find additional information.

  14. Shape based indexing for faster search of RNA family databases.

    PubMed

    Janssen, Stefan; Reeder, Jens; Giegerich, Robert

    2008-02-29

    Most non-coding RNA families exert their function by means of a conserved, common secondary structure. The Rfam database contains more than five hundred structurally annotated RNA families. Unfortunately, searching for new family members using covariance models (CMs) is very time consuming. Filtering approaches that use sequence conservation to reduce the number of CM searches are fast, but at an unknown cost in sensitivity. We present a new filtering approach, which exploits the family-specific secondary structure and significantly reduces the number of CM searches. The filter eliminates approximately 85% of the queries and discards only 2.6% of true positives when evaluating Rfam against itself. First results also capture previously undetected non-coding RNAs in a recent human RNAz screen. The RNA shape index filter (RNAsifter) is based on the following rationale: an RNA family is characterised by structure much more succinctly than by sequence content. Structures of individual family members, which naturally have different lengths and sequence compositions, may exhibit structural variation in detail, but overall they have a common shape in a more abstract sense. Given a fixed release of the Rfam database, we can compute these abstract shapes for all families. This is called a shape index. If a query sequence belongs to a certain family, it must be able to fold into the family shape with reasonable free energy. Therefore, rather than matching the query against all families in the database, we can first (and quickly) compute its feasible shape(s), and use the shape index to access only those families where a good match is possible due to a common shape with the query.
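    A minimal sketch of the shape-index idea, with invented shape strings and family names: the query's feasible abstract shapes select the few families worth a full CM search.

```python
# Toy shape index: abstract shape string -> RNA families folding into that shape.
# (Shape strings and family names are invented for illustration.)
shape_index = {
    "[]": {"hairpin-fam", "tRNA-like"},
    "[[][]]": {"tRNA-like"},
    "[[]]": {"stem-fam"},
}

def candidate_families(query_shapes):
    """Only families sharing a feasible shape with the query need a costly CM search."""
    fams = set()
    for shape in query_shapes:
        fams |= shape_index.get(shape, set())
    return fams

print(candidate_families(["[[][]]"]))  # {'tRNA-like'}
```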

  15. A searching and reporting system for relational databases using a graph-based metadata representation.

    PubMed

    Hewitt, Robin; Gobbi, Alberto; Lee, Man-Ling

    2005-01-01

    Relational databases are the current standard for storing and retrieving data in the pharmaceutical and biotech industries. However, retrieving data from a relational database requires specialized knowledge of the database schema and of the SQL query language. At Anadys, we have developed an easy-to-use system for searching and reporting data in a relational database to support our drug discovery project teams. This system is fast and flexible and allows users to access all data without having to write SQL queries. This paper presents the hierarchical, graph-based metadata representation and SQL-construction methods that, together, are the basis of this system's capabilities.
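    The essence of such a system is a metadata graph whose edges carry join conditions; a search walks the graph and emits the SQL so users never write it. A toy sketch with a hypothetical drug-discovery schema (table and column names are invented):

```python
from collections import deque

# Metadata graph: each edge between two tables carries its foreign-key join condition.
joins = {
    ("compound", "assay_result"): "compound.id = assay_result.compound_id",
    ("assay_result", "assay"): "assay_result.assay_id = assay.id",
}
adj = {}
for (a, b), cond in joins.items():
    adj.setdefault(a, []).append((b, cond))
    adj.setdefault(b, []).append((a, cond))

def build_sql(start, goal, columns):
    """BFS over the metadata graph, then emit the join chain as SQL."""
    seen, queue = {start}, deque([(start, [])])
    while queue:
        table, path = queue.popleft()
        if table == goal:
            sql = "SELECT " + ", ".join(columns) + " FROM " + start
            for nxt, cond in path:
                sql += " JOIN " + nxt + " ON " + cond
            return sql
        for nxt, cond in adj.get(table, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(nxt, cond)]))
    return None

print(build_sql("compound", "assay", ["compound.name", "assay.name"]))
```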

  16. No suitable precise or optimized epidemiologic search filters were available for bibliographic databases.

    PubMed

    Waffenschmidt, Siw; Hermanns, Tatjana; Gerber-Grote, Andreas; Mostardt, Sarah

    2017-02-01

    To determine a suitable approach to a systematic search for epidemiologic publications in bibliographic databases. For this purpose, suitable sensitive, precise, and optimized filters were to be selected for MEDLINE searches. In addition, the relevance of bibliographic databases was determined. Epidemiologic systematic reviews (SRs) retrieved in a systematic search and company dossiers were screened to identify epidemiologic publications (primary studies and SRs) published since 2007. These publications were used to generate a test and validation set. Furthermore, each SR's search strategy was reviewed, and epidemiologic filters were extracted. The search syntaxes were validated using the relative recall method. The test set comprises 729 relevant epidemiologic publications, of which 566 were MEDLINE-indexed. Twenty-seven epidemiologic filters were extracted. One suitable sensitive filter was identified (Larney et al. 2013: 95.94% sensitivity). Precision was presumably underestimated, so no precise or optimized filters can be recommended. Overall, 77.64% of the publications were found in MEDLINE. There is currently no suitable approach to conducting efficient systematic searches for epidemiologic publications in bibliographic databases. The filter by Larney et al. (2013) can be used for sensitive MEDLINE searches. No robust conclusions can be drawn on precise or optimized filters. Additional search approaches should be considered. Copyright © 2016 The Author(s). Published by Elsevier Inc. All rights reserved.

  17. MIDAS: a database-searching algorithm for metabolite identification in metabolomics.

    PubMed

    Wang, Yingfeng; Kora, Guruprasad; Bowen, Benjamin P; Pan, Chongle

    2014-10-07

    A database searching approach can be used for metabolite identification in metabolomics by matching measured tandem mass spectra (MS/MS) against the predicted fragments of metabolites in a database. Here, we present the open-source MIDAS algorithm (Metabolite Identification via Database Searching). To evaluate a metabolite-spectrum match (MSM), MIDAS first enumerates possible fragments from a metabolite by systematic bond dissociation, then calculates the plausibility of the fragments based on their fragmentation pathways, and finally scores the MSM to assess how well the experimental MS/MS spectrum from collision-induced dissociation (CID) is explained by the metabolite's predicted CID MS/MS spectrum. MIDAS was designed to search high-resolution tandem mass spectra acquired on time-of-flight or Orbitrap mass spectrometer against a metabolite database in an automated and high-throughput manner. The accuracy of metabolite identification by MIDAS was benchmarked using four sets of standard tandem mass spectra from MassBank. On average, for 77% of original spectra and 84% of composite spectra, MIDAS correctly ranked the true compounds as the first MSMs out of all MetaCyc metabolites as decoys. MIDAS correctly identified 46% more original spectra and 59% more composite spectra at the first MSMs than an existing database-searching algorithm, MetFrag. MIDAS was showcased by searching a published real-world measurement of a metabolome from Synechococcus sp. PCC 7002 against the MetaCyc metabolite database. MIDAS identified many metabolites missed in the previous study. MIDAS identifications should be considered only as candidate metabolites, which need to be confirmed using standard compounds. To facilitate manual validation, MIDAS provides annotated spectra for MSMs and labels observed mass spectral peaks with predicted fragments. The database searching and manual validation can be performed online at http://midas.omicsbio.org.
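    MIDAS's full score weights fragments by the plausibility of their fragmentation pathways; the sketch below shows only the simpler core step of matching predicted fragment m/z values against observed peaks within a mass tolerance (all values are hypothetical):

```python
def score_msm(observed_mz, predicted_mz, tol=0.01):
    """Fraction of predicted fragment m/z values matched by an observed peak within tol."""
    matched = sum(any(abs(p - o) <= tol for o in observed_mz) for p in predicted_mz)
    return matched / len(predicted_mz)

observed = [72.081, 147.113, 204.134]   # hypothetical CID spectrum peaks
predicted = [72.080, 147.112, 260.197]  # hypothetical bond-dissociation fragments
print(score_msm(observed, predicted))   # 2 of 3 predicted fragments matched
```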

  18. Global search tool for the Advanced Photon Source Integrated Relational Model of Installed Systems (IRMIS) database.

    SciTech Connect

    Quock, D. E. R.; Cianciarulo, M. B.; APS Engineering Support Division; Purdue Univ.

    2007-01-01

    The Integrated Relational Model of Installed Systems (IRMIS) is a relational database tool that has been implemented at the Advanced Photon Source to maintain an updated account of approximately 600 control system software applications, 400,000 process variables, and 30,000 control system hardware components. To effectively display this large amount of control system information to operators and engineers, IRMIS was initially built with nine Web-based viewers: Applications Organizing Index, IOC, PLC, Component Type, Installed Components, Network, Controls Spares, Process Variables, and Cables. However, since each viewer is designed to provide details from only one major category of the control system, the necessity for a one-stop global search tool for the entire database became apparent. The user requirements for extremely fast database search time and ease of navigation through search results led to the choice of Asynchronous JavaScript and XML (AJAX) technology in the implementation of the IRMIS global search tool. Unique features of the global search tool include a two-tier level of displayed search results, and a database data integrity validation and reporting mechanism.

  19. CLIP: similarity searching of 3D databases using clique detection.

    PubMed

    Rhodes, Nicholas; Willett, Peter; Calvet, Alain; Dunbar, James B; Humblet, Christine

    2003-01-01

    This paper describes a program for 3D similarity searching, called CLIP (for Candidate Ligand Identification Program), that uses the Bron-Kerbosch clique detection algorithm to find those structures in a file that have large structures in common with a target structure. Structures are characterized by the geometric arrangement of pharmacophore points and the similarity between two structures calculated using modifications of the Simpson and Tanimoto association coefficients. This modification takes into account the fact that a distance tolerance is required to ensure that pairs of interatomic distances can be regarded as equivalent during the clique-construction stage of the matching algorithm. Experiments with HIV assay data demonstrate the effectiveness and the efficiency of this approach to virtual screening.
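    The modification described above makes the association coefficient tolerance-aware: two interatomic distances count as equivalent if they agree within a tolerance. A simplified sketch of a Tanimoto coefficient over distance multisets (CLIP itself works on cliques of pharmacophore points, not raw distance lists):

```python
def tolerant_matches(d1, d2, tol=0.5):
    """Greedily pair interatomic distances (angstroms) that agree within the tolerance."""
    remaining = list(d2)
    count = 0
    for d in d1:
        for r in remaining:
            if abs(d - r) <= tol:
                remaining.remove(r)
                count += 1
                break
    return count

def tanimoto(d1, d2, tol=0.5):
    """Tanimoto coefficient c / (a + b - c) over tolerance-matched distance pairs."""
    c = tolerant_matches(d1, d2, tol)
    return c / (len(d1) + len(d2) - c)

print(tanimoto([3.1, 5.0, 7.2], [3.3, 5.6, 7.1, 9.8]))  # 2 matches -> 2/5 = 0.4
```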

  20. Analysis of Searches by End-Users of Science and Engineering CD-ROM Databases in an Academic Library.

    ERIC Educational Resources Information Center

    Culbertson, Michael

    1992-01-01

    This study analyzed CD-ROM searches in five science and engineering databases in an academic library. Results indicated that users were usually able to obtain results and print records but that few used more sophisticated techniques to refine their searches. It was concluded that instruction in CD-ROM database searching should be a high priority.…

  1. Database search for safety information on cosmetic ingredients.

    PubMed

    Pauwels, Marleen; Rogiers, Vera

    2007-12-01

    Ethical considerations with respect to experimental animal use and regulatory testing are under heavy discussion worldwide and are, in certain cases, taken up in legislative measures. The most explicit example is the European cosmetic legislation, establishing a testing ban on finished cosmetic products since 11 September 2004 and enforcing that the safety of a cosmetic product is assessed by taking into consideration "the general toxicological profile of the ingredients, their chemical structure and their level of exposure" (OJ L151, 32-37, 23 June 1993; OJ L066, 26-35, 11 March 2003). The availability of referenced and reliable information on cosmetic ingredients therefore becomes a dire necessity. Given the rapid progress of World Wide Web services and the concurrent drastic increase in freely accessible information, identifying relevant data sources and evaluating the scientific value and quality of the retrieved data are crucial. Based upon our own practical experience, a survey of freely and commercially available data sources was compiled, with their individual descriptions, fields of application, benefits and drawbacks. It should be mentioned that the search strategies described are equally useful as a starting point for any quest for safety data on chemicals or chemical-related substances in general.

  2. The LAILAPS search engine: a feature model for relevance ranking in life science databases.

    PubMed

    Lange, Matthias; Spies, Karl; Colmsee, Christian; Flemming, Steffen; Klapperstück, Matthias; Scholz, Uwe

    2010-03-25

    Efficient and effective information retrieval in the life sciences is one of the most pressing challenges in bioinformatics. The incredible growth of life science databases into a vast network of interconnected information systems is both a big challenge and a great chance for life science research. The knowledge found on the Web, in particular in life-science databases, is a valuable major resource. In order to bring it to the scientist's desktop, it is essential to have well-performing search engines. For millions of query results, the most crucial factor is neither response time nor the number of results, but the relevance ranking. In this paper, we present a feature model for relevance ranking in life science databases and its implementation in the LAILAPS search engine. Motivated by observing user behavior during inspection of search engine results, we condensed a set of 9 relevance-discriminating features. These features are intuitively used by scientists who briefly screen database entries for potential relevance. The features are both sufficient to estimate potential relevance and efficiently quantifiable. Deriving a relevance prediction function that computes the relevance from these features constitutes a regression problem. To solve this problem, we used artificial neural networks trained with a reference set of relevant database entries for 19 protein queries. Supporting a flexible text index and a simple data import format, these concepts are implemented in the LAILAPS search engine. It can easily be used both as a search engine for comprehensive integrated life science databases and for small in-house project databases. LAILAPS is publicly available for SWISSPROT data at http://lailaps.ipk-gatersleben.de.
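    A toy sketch of feature-based relevance ranking; the feature names, entry IDs and weights are invented, and a fixed weight vector stands in for LAILAPS's trained neural network:

```python
# Invented relevance-discriminating features (LAILAPS condenses 9 such features).
FEATURES = ["term_in_title", "term_frequency", "annotation_quality",
            "species_match", "entry_length"]

# Fixed weights standing in for the trained neural network (illustrative values).
WEIGHTS = [0.4, 0.25, 0.2, 0.1, 0.05]

def relevance(feature_values):
    """Weighted combination of per-entry feature values in [0, 1]."""
    return sum(w * f for w, f in zip(WEIGHTS, feature_values))

entries = {"P12345": [1.0, 0.8, 0.9, 1.0, 0.3],
           "Q99999": [0.0, 0.2, 0.5, 0.0, 0.6]}
ranked = sorted(entries, key=lambda e: relevance(entries[e]), reverse=True)
print(ranked)  # ['P12345', 'Q99999']
```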

  3. Development and Validation of Search Filters to Identify Articles on Family Medicine in Online Medical Databases.

    PubMed

    Pols, David H J; Bramer, Wichor M; Bindels, Patrick J E; van de Laar, Floris A; Bohnen, Arthur M

    2015-01-01

    Physicians and researchers in the field of family medicine often need to find relevant articles in online medical databases for a variety of reasons. Because a search filter may help improve the efficiency and quality of such searches, we aimed to develop and validate search filters to identify research studies of relevance to family medicine. Using a new and objective method for search filter development, we developed and validated 2 search filters for family medicine. The sensitive filter had a sensitivity of 96.8% and a specificity of 74.9%. The specific filter had a specificity of 97.4% and a sensitivity of 90.3%. Our new filters should aid literature searches in the family medicine field. The sensitive filter may help researchers conducting systematic reviews, whereas the specific filter may help family physicians find answers to clinical questions at the point of care when time is limited. © 2015 Annals of Family Medicine, Inc.
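    Sensitivity and specificity follow directly from the filter's validation counts; the counts below are hypothetical, chosen only to reproduce the sensitive filter's reported 96.8% and 74.9%:

```python
def sensitivity(true_pos, false_neg):
    """Fraction of relevant articles the filter retrieves."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Fraction of irrelevant articles the filter excludes."""
    return true_neg / (true_neg + false_pos)

# Hypothetical validation counts reproducing the reported rates.
print(sensitivity(968, 32), specificity(749, 251))
```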

  4. Development and Validation of Search Filters to Identify Articles on Family Medicine in Online Medical Databases

    PubMed Central

    Pols, David H.J.; Bramer, Wichor M.; Bindels, Patrick J.E.; van de Laar, Floris A.; Bohnen, Arthur M.

    2015-01-01

    Physicians and researchers in the field of family medicine often need to find relevant articles in online medical databases for a variety of reasons. Because a search filter may help improve the efficiency and quality of such searches, we aimed to develop and validate search filters to identify research studies of relevance to family medicine. Using a new and objective method for search filter development, we developed and validated 2 search filters for family medicine. The sensitive filter had a sensitivity of 96.8% and a specificity of 74.9%. The specific filter had a specificity of 97.4% and a sensitivity of 90.3%. Our new filters should aid literature searches in the family medicine field. The sensitive filter may help researchers conducting systematic reviews, whereas the specific filter may help family physicians find answers to clinical questions at the point of care when time is limited. PMID:26195683

  5. Retrieval accuracy, statistical significance and compositional similarity in protein sequence database searches.

    PubMed

    Yu, Yi-Kuo; Gertz, E Michael; Agarwala, Richa; Schäffer, Alejandro A; Altschul, Stephen F

    2006-01-01

    Protein sequence database search programs may be evaluated both for their retrieval accuracy--the ability to separate meaningful from chance similarities--and for the accuracy of their statistical assessments of reported alignments. However, methods for improving statistical accuracy can degrade retrieval accuracy by discarding compositional evidence of sequence relatedness. This evidence may be preserved by combining essentially independent measures of alignment and compositional similarity into a unified measure of sequence similarity. A version of the BLAST protein database search program, modified to employ this new measure, outperforms the baseline program in both retrieval and statistical accuracy on ASTRAL, a SCOP-based test set.

  6. Using homology relations within a database markedly boosts protein sequence similarity search.

    PubMed

    Tong, Jing; Sadreyev, Ruslan I; Pei, Jimin; Kinch, Lisa N; Grishin, Nick V

    2015-06-02

    Inference of homology from protein sequences provides an essential tool for analyzing protein structure, function, and evolution. Current sequence-based homology search methods are still unable to detect many similarities evident from protein spatial structures. In computer science a search engine can be improved by considering networks of known relationships within the search database. Here, we apply this idea to protein-sequence-based homology search and show that it dramatically enhances the search accuracy. Our new method, COMPADRE (COmparison of Multiple Protein sequence Alignments using Database RElationships) assesses the relationship between the query sequence and a hit in the database by considering the similarity between the query and hit's known homologs. This approach increases detection quality, boosting the precision rate from 18% to 83% at half-coverage of all database homologs. The increased precision rate allows detection of a large fraction of protein structural relationships, thus providing structure and function predictions for previously uncharacterized proteins. Our results suggest that this general approach is applicable to a wide variety of methods for detection of biological similarities. The web server is available at prodata.swmed.edu/compadre.
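    The key move above is to score a hit not only by its direct similarity to the query but by the query's similarity to the hit's known homologs. A toy sketch of that idea (entry names and scores are invented; COMPADRE itself compares multiple sequence alignments):

```python
# Known homology groups within the database (toy data).
homologs = {"hitA": {"hitA", "relA1", "relA2"}, "hitB": {"hitB"}}

# Direct query-vs-entry similarity from a first-pass search (toy scores).
direct = {"hitA": 0.30, "relA1": 0.55, "relA2": 0.20, "hitB": 0.35}

def boosted(hit):
    """Score a hit by the best similarity between the query and any known homolog of the hit."""
    return max(direct[h] for h in homologs.get(hit, {hit}))

# hitA's weak direct score is rescued by its strong homolog relA1.
print(boosted("hitA"), boosted("hitB"))  # 0.55 0.35
```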

  7. Using homology relations within a database markedly boosts protein sequence similarity search

    PubMed Central

    Tong, Jing; Sadreyev, Ruslan I.; Pei, Jimin; Kinch, Lisa N.; Grishin, Nick V.

    2015-01-01

    Inference of homology from protein sequences provides an essential tool for analyzing protein structure, function, and evolution. Current sequence-based homology search methods are still unable to detect many similarities evident from protein spatial structures. In computer science, a search engine can be improved by considering networks of known relationships within the search database. Here, we apply this idea to protein-sequence-based homology search and show that it dramatically enhances the search accuracy. Our new method, COMPADRE (COmparison of Multiple Protein sequence Alignments using Database RElationships), assesses the relationship between the query sequence and a hit in the database by considering the similarity between the query and the hit's known homologs. This approach increases detection quality, boosting the precision rate from 18% to 83% at half-coverage of all database homologs. The increased precision rate allows detection of a large fraction of protein structural relationships, thus providing structure and function predictions for previously uncharacterized proteins. Our results suggest that this general approach is applicable to a wide variety of methods for detection of biological similarities. The web server is available at prodata.swmed.edu/compadre. PMID:26038555

  8. An alphabetic code based atomic level molecular similarity search in databases.

    PubMed

    Saranya, Nallusamy; Selvaraj, Samuel

    2012-01-01

    Atomic-level molecular similarity and diversity studies have gained considerable importance through their wide application in bioinformatics and chemoinformatics for drug design. The availability of large volumes of data on chemical compounds requires new methodologies for efficient and effective searching of its archives in less time with optimal computational power. We describe an alphabetic algorithm for similarity searching based on atom-atom bonding preference for ligands. We represented 170 cyclin-dependent kinase 2 inhibitors using strings of pre-defined alphabets for searching using known protein sequence alignment tools. Thus, a common pattern was extracted using this set of compounds for database searching to retrieve similar active compounds. The area under the receiver operating characteristic (ROC) curve was used for the discrimination of similar and dissimilar compounds in the databases. An average retrieval rate of about 60% is obtained in cross-validation using the home-grown dataset and the directory of useful decoys (DUD, formally known as the ZINC database) data. This will help in the effective retrieval of similar compounds using database search.

  9. Vehicle-triggered video compression/decompression for fast and efficient searching in large video databases

    NASA Astrophysics Data System (ADS)

    Bulan, Orhan; Bernal, Edgar A.; Loce, Robert P.; Wu, Wencheng

    2013-03-01

    Video cameras are widely deployed along city streets and interstate highways and at traffic lights, stop signs, and toll booths by entities that perform traffic monitoring and law enforcement. The videos captured by these cameras are typically compressed and stored in large databases. Performing a rapid search for a specific vehicle within a large database of compressed videos is often required and can be a time-critical, even life-or-death, task. In this paper, we propose video compression and decompression algorithms that enable fast and efficient vehicle or, more generally, event searches in large video databases. The proposed algorithm selects reference frames (i.e., I-frames) based on a vehicle having been detected at a specified position within the scene being monitored while compressing a video sequence. A search for a specific vehicle in the compressed video stream is performed across the reference frames only, which does not require decompression of the full video sequence as in traditional search algorithms. Our experimental results on videos captured on a local road show that the proposed algorithm significantly reduces the search space (thus reducing time and computational resources) in vehicle search tasks within compressed video streams, particularly those captured in light traffic volume conditions.
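
    The frame-selection logic the abstract describes can be sketched as follows. The interfaces are hypothetical: `vehicle_detected` and `matches_target` stand in for a real detector and matcher. An I-frame is forced whenever a vehicle triggers the detector, and a later search touches only those reference frames.

```python
# Sketch (hypothetical interfaces, not the paper's implementation):
# choose I-frames at vehicle-detection events during encoding, then
# search only those frames without decompressing the full sequence.

def select_iframes(frames, vehicle_detected):
    """Return indices of frames to encode as reference (I-) frames.
    `vehicle_detected(frame)` fires when a vehicle crosses the trigger
    region in the monitored scene."""
    iframes = [0]  # streams conventionally begin with a reference frame
    for i, frame in enumerate(frames[1:], start=1):
        if vehicle_detected(frame):
            iframes.append(i)
    return iframes

def search_vehicle(frames, iframe_indices, matches_target):
    """Search across reference frames only."""
    return [i for i in iframe_indices if matches_target(frames[i])]
```

    Since every vehicle event coincides with a reference frame by construction, restricting the search to I-frames loses no candidate sightings while skipping the bulk of the decode work.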

  10. SW#db: GPU-Accelerated Exact Sequence Similarity Database Search

    PubMed Central

    Korpar, Matija; Šošić, Martin; Blažeka, Dino; Šikić, Mile

    2015-01-01

    In recent years we have witnessed growth in sequencing yield and in the number of samples sequenced, and, as a result, the growth of publicly maintained sequence databases. This ever-increasing volume of data places high demands on protein similarity search algorithms, with two opposing goals: keeping running times acceptable while maintaining a high enough level of sensitivity. The most time-consuming step of similarity search is the local alignment between query and database sequences. This step is usually performed using exact local alignment algorithms such as Smith-Waterman. Due to its quadratic time complexity, aligning a query to the whole database is usually too slow. Therefore, the majority of protein similarity search methods apply heuristics before the exact local alignment to reduce the number of candidate sequences in the database. However, there is still a need to align a query sequence to a reduced database. In this paper we present the SW#db tool and a library for fast exact similarity search. Although its running times as a standalone tool are comparable to the running times of BLAST, it is primarily intended to be used for the exact local alignment phase, in which the database of sequences has already been reduced. It uses both GPU and CPU parallelization and was 4-5 times faster than SSEARCH, 6-25 times faster than CUDASW++, and more than 20 times faster than SSW at the time of writing, using multiple queries on the Swiss-Prot and UniRef90 databases. PMID:26719890
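
    The quadratic-time step that SW#db accelerates is the Smith-Waterman local alignment. A minimal scoring-only version, with simple match/mismatch/gap parameters instead of a substitution matrix, looks like this:

```python
# Minimal Smith-Waterman local alignment score (match=2, mismatch=-1,
# gap=-1). The double loop over both sequence lengths is the quadratic
# cost the abstract refers to; SW#db parallelizes this on GPU and CPU.

def smith_waterman(a: str, b: str, match=2, mismatch=-1, gap=-1) -> int:
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: scores are clamped at zero so an
            # alignment can start anywhere in either sequence.
            H[i][j] = max(0, H[i - 1][j - 1] + s,
                          H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best
```

    Real protein search replaces the match/mismatch constants with a substitution matrix such as BLOSUM62 and uses affine gap penalties, but the O(len(a) * len(b)) structure is the same.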

  11. An Interactive Iterative Method for Electronic Searching of Large Literature Databases

    ERIC Educational Resources Information Center

    Hernandez, Marco A.

    2013-01-01

    PubMed® is an on-line literature database hosted by the U.S. National Library of Medicine. Containing over 21 million citations for biomedical literature--both abstracts and full text--in the areas of the life sciences, behavioral studies, chemistry, and bioengineering, PubMed® represents an important tool for researchers. PubMed® searches return…

  12. Support Vector Machines for Improved Peptide Identification from Tandem Mass Spectrometry Database Search

    SciTech Connect

    Webb-Robertson, Bobbie-Jo M.

    2009-05-06

    Accurate identification of peptides is a current challenge in mass spectrometry (MS) based proteomics. The standard approach uses a search routine to compare tandem mass spectra to a database of peptides associated with the target organism. These database search routines yield multiple metrics associated with the quality of the mapping of the experimental spectrum to the theoretical spectrum of a peptide. The structure of these results makes separating correct from false identifications difficult and has created a false-identification problem. Statistical confidence scores are one approach to this false-positive problem and have led to significant improvements in peptide identification. We have shown that machine learning, specifically the support vector machine (SVM), is an effective approach to separating true peptide identifications from false ones. The SVM-based peptide statistical scoring method transforms a peptide into a vector representation based on database search metrics to train and validate the SVM. In practice, following the database search routine, a peptide is converted to its vector representation and the SVM generates a single statistical score that is then used to classify its presence or absence in the sample.
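
    The rescoring idea can be sketched without any ML library: a pure-Python linear SVM trained by hinge-loss subgradient descent. The feature vectors here are illustrative stand-ins for real database-search metrics (e.g. XCorr, deltaCn), and the training routine is a generic sketch, not the paper's method.

```python
# Sketch (assumed setup): collect database-search quality metrics per
# peptide-spectrum match into a feature vector, train a linear SVM on
# labeled true/false identifications, and emit one score per match.

def train_linear_svm(X, y, lr=0.1, lam=0.01, epochs=200):
    """Subgradient descent on the regularized hinge loss; y in {-1, +1}."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # violating sample: step toward its side
                w = [wj + lr * (yi * xj - lam * wj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:           # satisfied sample: only regularization decay
                w = [wj - lr * lam * wj for wj in w]
    return w, b

def score(w, b, x):
    """Single statistical score used to classify a peptide identification."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b
```

    In production one would use an established SVM library and calibrate the decision values, but the sketch shows how several search metrics collapse into one separating score.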

  13. Building the Infrastructure of Resource Sharing: Union Catalogs, Distributed Search, and Cross-Database Linkage.

    ERIC Educational Resources Information Center

    Lynch, Clifford A.

    1997-01-01

    Union catalogs and distributed search systems are two ways users can locate materials in print and electronic formats. This article examines the advantages and limitations of both approaches and argues that they should be considered complementary rather than competitive. Discusses technologies creating linkage between catalogs and databases and…

  14. Sports Information Online: Searching the SPORT Database and Tips for Finding Sports Medicine Information Online.

    ERIC Educational Resources Information Center

    Janke, Richard V.; And Others

    1988-01-01

    The first article describes SPORT, a database providing international coverage of athletics and physical education, and compares it to other online services in terms of coverage, thesauri, possible search strategies, and actual usage. The second article reviews available online information on sports medicine. (CLB)

  16. Planning for End-User Database Searching: Drexel and the Mac: A User-Consistent Interface.

    ERIC Educational Resources Information Center

    LaBorie, Tim; Donnelly, Leslie

    Drexel University instituted a microcomputing program in 1984 which required all freshmen to own Apple Macintosh microcomputers. All students were taught database searching on the BRS (Bibliographic Retrieval Services) system as part of the freshman humanities curriculum, and the university library was chosen as the site to house continuing…

  17. Discovering More Chemical Concepts from 3D Chemical Information Searches of Crystal Structure Databases

    ERIC Educational Resources Information Center

    Rzepa, Henry S.

    2016-01-01

    Three new examples are presented illustrating three-dimensional chemical information searches of the Cambridge structure database (CSD) from which basic core concepts in organic and inorganic chemistry emerge. These include connecting the regiochemistry of aromatic electrophilic substitution with the geometrical properties of hydrogen bonding…

  18. Successful Keyword Searching: Initiating Research on Popular Topics Using Electronic Databases.

    ERIC Educational Resources Information Center

    MacDonald, Randall M.; MacDonald, Susan Priest

    Students are using electronic resources more than ever before to locate information for assignments. Without the proper search terms, results are incomplete, and students are frustrated. Using the keywords, key people, organizations, and Web sites provided in this book and compiled from the most commonly used databases, students will be able to…

  19. Toward a public analysis database for LHC new physics searches using MadAnalysis 5

    NASA Astrophysics Data System (ADS)

    Dumont, B.; Fuks, B.; Kraml, S.; Bein, S.; Chalons, G.; Conte, E.; Kulkarni, S.; Sengupta, D.; Wymant, C.

    2015-02-01

    We present the implementation, in the MadAnalysis 5 framework, of several ATLAS and CMS searches for supersymmetry in data recorded during the first run of the LHC. We provide extensive details on the validation of our implementations and propose to create a public analysis database within this framework.

  2. Parallel database search and prime factorization with magnonic holographic memory devices

    NASA Astrophysics Data System (ADS)

    Khitun, Alexander

    2015-12-01

    In this work, we describe the capabilities of Magnonic Holographic Memory (MHM) for parallel database search and prime factorization. MHM is a type of holographic device, which utilizes spin waves for data transfer and processing. Its operation is based on the correlation between the phases and the amplitudes of the input spin waves and the output inductive voltage. The input of MHM is provided by a phased array of spin wave generating elements, allowing the production of phase patterns of arbitrary form. The latter makes it possible to code logic states into the phases of propagating waves and exploit wave superposition for parallel data processing. We present the results of numerical modeling illustrating parallel database search and prime factorization. The results of numerical simulations on the database search are in agreement with the available experimental data. The use of classical wave interference may result in a significant speedup over conventional digital logic circuits in special-task data processing (e.g., √n in database search). Potentially, magnonic holographic devices can be implemented as complementary logic units to digital processors. Physical limitations and technological constraints of the spin wave approach are also discussed.

  3. Parallel database search and prime factorization with magnonic holographic memory devices

    SciTech Connect

    Khitun, Alexander

    2015-12-28

    In this work, we describe the capabilities of Magnonic Holographic Memory (MHM) for parallel database search and prime factorization. MHM is a type of holographic device, which utilizes spin waves for data transfer and processing. Its operation is based on the correlation between the phases and the amplitudes of the input spin waves and the output inductive voltage. The input of MHM is provided by a phased array of spin wave generating elements, allowing the production of phase patterns of arbitrary form. The latter makes it possible to code logic states into the phases of propagating waves and exploit wave superposition for parallel data processing. We present the results of numerical modeling illustrating parallel database search and prime factorization. The results of numerical simulations on the database search are in agreement with the available experimental data. The use of classical wave interference may result in a significant speedup over conventional digital logic circuits in special-task data processing (e.g., √n in database search). Potentially, magnonic holographic devices can be implemented as complementary logic units to digital processors. Physical limitations and technological constraints of the spin wave approach are also discussed.

  4. Studying Gene Expression: Database Searches and Promoter Fusions to Investigate Transcriptional Regulation in Bacteria†

    PubMed Central

    Martinez-Vaz, Betsy M.; Makarevitch, Irina; Stensland, Shane

    2010-01-01

    A laboratory project was designed to illustrate how to search biological databases and utilize the information provided by these resources to investigate transcriptional regulation in Escherichia coli. The students searched several databases (NCBI Genomes, RegulonDB and EcoCyc) to learn about gene function, regulation, and the organization of transcriptional units. A fluorometer and GFP promoter fusions were used to obtain fluorescence data and measure changes in transcriptional activity. The class designed and performed experiments to investigate the regulation of genes necessary for biosynthesis of amino acids and how expression is affected by environmental signals and transcriptional regulators. Assessment data showed that this activity enhanced students’ knowledge of databases, reporter genes and transcriptional regulation. PMID:23653697

  5. Systematic Reviews and Meta-Analyses of Traditional Chinese Medicine Must Search Chinese Databases to Reduce Language Bias

    PubMed Central

    Wu, Xin-Yin; Tang, Jin-Ling; Mao, Chen; Yuan, Jin-Qiu; Qin, Ying; Chung, Vincent C. H.

    2013-01-01

    Systematic reviews (SRs) that fail to search non-English databases may miss relevant studies and cause selection bias. The bias may be particularly severe in SRs of traditional Chinese medicine (TCM), as most randomized controlled trials (RCTs) in TCM are published and accessible only in Chinese. In this study we investigated how often Chinese databases were not searched in SRs of TCM, how many trials were missed, and whether a bias may occur if Chinese databases were not searched. We searched 5 databases in English and 3 in Chinese for RCTs of Chinese herbal medicine for coronary artery disease and found that 96.64% (115/119) of eligible studies could be identified only from Chinese databases. In a random sample of 80 Cochrane reviews on TCM, we found that Chinese databases were searched in only 43 (53.75%), in which almost all the included studies were identified from Chinese databases. We also compared SRs on the same topic and found that they may draw a different conclusion if Chinese databases were not searched. In conclusion, an overwhelmingly high percentage of eligible trials on TCM could only be identified in Chinese databases. Reviewers in TCM are advised to search Chinese databases to reduce potential selection bias. PMID:24223063

  6. Systematic reviews and meta-analyses of traditional Chinese medicine must search Chinese databases to reduce language bias.

    PubMed

    Wu, Xin-Yin; Tang, Jin-Ling; Mao, Chen; Yuan, Jin-Qiu; Qin, Ying; Chung, Vincent C H

    2013-01-01

    Systematic reviews (SRs) that fail to search non-English databases may miss relevant studies and cause selection bias. The bias may be particularly severe in SRs of traditional Chinese medicine (TCM), as most randomized controlled trials (RCTs) in TCM are published and accessible only in Chinese. In this study we investigated how often Chinese databases were not searched in SRs of TCM, how many trials were missed, and whether a bias may occur if Chinese databases were not searched. We searched 5 databases in English and 3 in Chinese for RCTs of Chinese herbal medicine for coronary artery disease and found that 96.64% (115/119) of eligible studies could be identified only from Chinese databases. In a random sample of 80 Cochrane reviews on TCM, we found that Chinese databases were searched in only 43 (53.75%), in which almost all the included studies were identified from Chinese databases. We also compared SRs on the same topic and found that they may draw a different conclusion if Chinese databases were not searched. In conclusion, an overwhelmingly high percentage of eligible trials on TCM could only be identified in Chinese databases. Reviewers in TCM are advised to search Chinese databases to reduce potential selection bias.

  7. Searching fee and non-fee toxicology information resources: an overview of selected databases.

    PubMed

    Wright, L L

    2001-01-12

    Toxicology profiles organize information by broad subjects, the first of which affirms identity of the agent studied. Studies here show two non-fee databases (ChemFinder and ChemIDplus) verify the identity of compounds with high efficiency (63% and 73% respectively) with the fee-based Chemical Abstracts Registry file serving well to fill data gaps (100%). Continued searching proceeds using knowledge of structure, scope and content to select databases. Valuable sources for information are factual databases that collect data and facts in special subject areas organized in formats available for analysis or use. Some sources representative of factual files are RTECS, CCRIS, HSDB, GENE-TOX and IRIS. Numerous factual databases offer a wealth of reliable information; however, exhaustive searches probe information published in journal articles and/or technical reports with records residing in bibliographic databases such as BIOSIS, EMBASE, MEDLINE, TOXLINE and Web of Science. Listed with descriptions are numerous factual and bibliographic databases supplied by 11 producers. Given the multitude of options and resources, it is often necessary to seek service desk assistance. Questions were posed by telephone and e-mail to service desks at DIALOG, ISI, MEDLARS, Micromedex and STN International. Results of the survey are reported.

  8. Searching biomedical databases on complementary medicine: the use of controlled vocabulary among authors, indexers and investigators.

    PubMed

    Murphy, Linda S; Reinsch, Sibylle; Najm, Wadie I; Dickerson, Vivian M; Seffinger, Michael A; Adams, Alan; Mishra, Shiraz I

    2003-07-07

    The optimal retrieval of a literature search in biomedicine depends on the appropriate use of Medical Subject Headings (MeSH), descriptors and keywords among authors and indexers. We hypothesized that authors, investigators and indexers in four biomedical databases are not consistent in their use of terminology in Complementary and Alternative Medicine (CAM). Based on a research question addressing the validity of spinal palpation for the diagnosis of neuromuscular dysfunction, we developed four search concepts with their respective controlled vocabulary and key terms. We calculated the frequency of MeSH, descriptors, and keywords used by authors in titles and abstracts in comparison to standard practices in semantic and analytic indexing in MEDLINE, MANTIS, CINAHL, and Web of Science. Multiple searches resulted in the final selection of 38 relevant studies that were indexed at least in one of the four selected databases. Of the four search concepts, validity showed the greatest inconsistency in terminology among authors, indexers and investigators. The use of spinal terms showed the greatest consistency. Of the 22 neuromuscular dysfunction terms provided by the investigators, 11 were not contained in the controlled vocabulary and six were never used by authors or indexers. Most authors did not seem familiar with the controlled vocabulary for validity in the area of neuromuscular dysfunction. Recently, standard glossaries have been developed to assist in the research development of manual medicine. Searching biomedical databases for CAM is challenging due to inconsistent use of controlled vocabulary and indexing procedures in different databases. A standard terminology should be used by investigators in conducting their search strategies and authors when writing titles, abstracts and submitting keywords for publications.

  9. Searching biomedical databases on complementary medicine: the use of controlled vocabulary among authors, indexers and investigators

    PubMed Central

    Murphy, Linda S; Reinsch, Sibylle; Najm, Wadie I; Dickerson, Vivian M; Seffinger, Michael A; Adams, Alan; Mishra, Shiraz I

    2003-01-01

    Background The optimal retrieval of a literature search in biomedicine depends on the appropriate use of Medical Subject Headings (MeSH), descriptors and keywords among authors and indexers. We hypothesized that authors, investigators and indexers in four biomedical databases are not consistent in their use of terminology in Complementary and Alternative Medicine (CAM). Methods Based on a research question addressing the validity of spinal palpation for the diagnosis of neuromuscular dysfunction, we developed four search concepts with their respective controlled vocabulary and key terms. We calculated the frequency of MeSH, descriptors, and keywords used by authors in titles and abstracts in comparison to standard practices in semantic and analytic indexing in MEDLINE, MANTIS, CINAHL, and Web of Science. Results Multiple searches resulted in the final selection of 38 relevant studies that were indexed at least in one of the four selected databases. Of the four search concepts, validity showed the greatest inconsistency in terminology among authors, indexers and investigators. The use of spinal terms showed the greatest consistency. Of the 22 neuromuscular dysfunction terms provided by the investigators, 11 were not contained in the controlled vocabulary and six were never used by authors or indexers. Most authors did not seem familiar with the controlled vocabulary for validity in the area of neuromuscular dysfunction. Recently, standard glossaries have been developed to assist in the research development of manual medicine. Conclusions Searching biomedical databases for CAM is challenging due to inconsistent use of controlled vocabulary and indexing procedures in different databases. A standard terminology should be used by investigators in conducting their search strategies and authors when writing titles, abstracts and submitting keywords for publications. PMID:12846931

  10. Multimedia explorer: image database, image proxy-server and search-engine.

    PubMed Central

    Frankewitsch, T.; Prokosch, U.

    1999-01-01

    Multimedia plays a major role in medicine. Databases containing images, movies, or other types of multimedia objects are increasing in number, especially on the WWW. However, no good retrieval mechanism or search engine currently exists to efficiently track down such multimedia sources in the vast amount of information provided by the WWW. Secondly, the tools for searching databases are usually not adapted to the properties of images, and HTML pages do not allow complex searches. Therefore, establishing more comfortable retrieval involves the use of a higher-level programming language like JAVA. With this platform-independent language it is possible to create extensions to commonly used web browsers. These applets offer a graphical user interface for high-level navigation. We implemented a database using JAVA objects as the primary storage containers, which are then stored by a JAVA-controlled ORACLE8 database. Navigation depends on a structured vocabulary enhanced by a semantic network. With this approach multimedia objects can be encapsulated within a logical module for quick data retrieval. PMID:10566463

  11. A Novel Concept for the Search and Retrieval of the Derwent Markush Resource Database.

    PubMed

    Barth, Andreas; Stengel, Thomas; Litterst, Edwin; Kraut, Hans; Matuszczyk, Henry; Ailer, Franz; Hajkowski, Steve

    2016-05-23

    The representation of and search for generic chemical structures (Markush) remains a continuing challenge. Several research groups have addressed this problem, and over time a limited number of practical solutions have been proposed. Today there are two large commercial providers of Markush databases: Chemical Abstracts Service (CAS) and Thomson Reuters. The Thomson Reuters "Derwent" Markush database is currently offered via the online services Questel and STN and as a data feed for in-house use. The aim of this paper is to briefly review the existing Markush systems (databases plus search engines) and to describe our new approach for the implementation of the Derwent Markush Resource on STN. Our new approach demonstrates the integration of the Derwent Markush Resource database into the existing chemistry-focused STN platform without loss of detail. This provides compatibility with other structure and Markush databases on STN and at the same time makes it possible to deploy the specific features and functions of the Derwent approach. It is shown that the different Markush languages developed by CAS and Derwent can be combined into a single general Markush description. In this concept the generic nodes are grouped together in a unique hierarchy where all chemical elements and fragments can be integrated. As a consequence, both systems are searchable using a single structure query. Moreover, the presented concept could serve as a promising starting point for a common generalized description of Markush structures.

  12. Improved Search of Principal Component Analysis Databases for Spectro-polarimetric Inversion

    NASA Astrophysics Data System (ADS)

    Casini, R.; Asensio Ramos, A.; Lites, B. W.; López Ariste, A.

    2013-08-01

    We describe a simple technique for the acceleration of spectro-polarimetric inversions based on principal component analysis (PCA) of Stokes profiles. This technique involves the indexing of the database models based on the sign of the projections (PCA coefficients) of the first few relevant orders of principal components of the four Stokes parameters. In this way, each model in the database can be attributed a distinctive binary number of 2^(4n) bits, where n is the number of PCA orders used for the indexing. Each of these binary numbers (indices) identifies a group of "compatible" models for the inversion of a given set of observed Stokes profiles sharing the same index. The complete set of the binary numbers so constructed evidently determines a partition of the database. The search of the database for the PCA inversion of spectro-polarimetric data can profit greatly from this indexing. In practical cases it becomes possible to approach the ideal acceleration factor of 2^(4n) as compared to the systematic search of a non-indexed database for a traditional PCA inversion. This indexing method relies on the existence of a physical meaning in the sign of the PCA coefficients of a model. For this reason, the presence of model ambiguities and of spectro-polarimetric noise in the observations limits in practice the number n of relevant PCA orders that can be used for the indexing.
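
    A sketch of the indexing scheme follows, with assumed details not stated in the abstract: the sign bits are packed in Stokes order I, Q, U, V with n PCA orders each, and a nonnegative coefficient maps to bit 1.

```python
# Sketch (assumed bit layout): keep only the signs of the first n PCA
# coefficients of each of the four Stokes parameters and pack the
# resulting 4n bits into one integer. Models sharing an index form a
# single searchable partition of the database.

def pca_index(stokes_coeffs, n):
    """stokes_coeffs: four lists of PCA projections (I, Q, U, V)."""
    index = 0
    for coeffs in stokes_coeffs:      # the four Stokes parameters
        for c in coeffs[:n]:          # first n PCA orders of each
            index = (index << 1) | (1 if c >= 0 else 0)
    return index

def partition_database(models, n):
    """Group database models by their 4n-bit sign index; an inversion
    then searches only the group matching the observed profiles."""
    groups = {}
    for model in models:
        groups.setdefault(pca_index(model, n), []).append(model)
    return groups
```

    Looking up the observation's own index and scanning only that group is what yields the near-2^(4n) speedup over scanning the whole database.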

  13. IMPROVED SEARCH OF PRINCIPAL COMPONENT ANALYSIS DATABASES FOR SPECTRO-POLARIMETRIC INVERSION

    SciTech Connect

    Casini, R.; Lites, B. W.; Ramos, A. Asensio

    2013-08-20

    We describe a simple technique for the acceleration of spectro-polarimetric inversions based on principal component analysis (PCA) of Stokes profiles. This technique involves the indexing of the database models based on the sign of the projections (PCA coefficients) of the first few relevant orders of principal components of the four Stokes parameters. In this way, each model in the database can be attributed a distinctive binary number of 2^(4n) bits, where n is the number of PCA orders used for the indexing. Each of these binary numbers (indices) identifies a group of "compatible" models for the inversion of a given set of observed Stokes profiles sharing the same index. The complete set of the binary numbers so constructed evidently determines a partition of the database. The search of the database for the PCA inversion of spectro-polarimetric data can profit greatly from this indexing. In practical cases it becomes possible to approach the ideal acceleration factor of 2^(4n) as compared to the systematic search of a non-indexed database for a traditional PCA inversion. This indexing method relies on the existence of a physical meaning in the sign of the PCA coefficients of a model. For this reason, the presence of model ambiguities and of spectro-polarimetric noise in the observations limits in practice the number n of relevant PCA orders that can be used for the indexing.

  14. Speeding up tandem mass spectrometry-based database searching by longest common prefix

    PubMed Central

    2010-01-01

    Background: Tandem mass spectrometry-based database searching has become an important technology for peptide and protein identification. One of the key challenges in database searching is the remarkable increase in computational demand, brought about by the expansion of protein databases, semi- or non-specific enzymatic digestion, post-translational modifications and other factors. Some software tools choose peptide indexing to accelerate processing. However, peptide indexing requires a large amount of time and space for construction, especially for non-specific digestion, and it is inflexible in use. Results: We developed an algorithm based on the longest common prefix (ABLCP) to efficiently organize a protein sequence database. The longest common prefix is a data structure that is always coupled to the suffix array. It eliminates redundant candidate peptides in databases and reduces the corresponding peptide-spectrum matching times, thereby decreasing the identification time. This algorithm is based on the property of the longest common prefix. Enzymatic digestion poses a challenge to this property, but adjustments to the algorithm ensure that no candidate peptides are omitted. Compared with peptide indexing, ABLCP requires much less time and space for construction and is subject to fewer restrictions. Conclusions: The ABLCP algorithm can help to improve data analysis efficiency. A software tool implementing this algorithm is available at http://pfind.ict.ac.cn/pfind2dot5/index.htm PMID:21108792
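    The redundancy that a suffix-array/LCP organization exploits can be illustrated with a toy routine (the paper's actual algorithm and its handling of enzymatic digestion are more involved; names here are hypothetical):

```python
def distinct_candidates(text, k):
    """Enumerate distinct length-k substrings of a sequence-database
    string via a suffix array plus longest-common-prefix checks: when a
    suffix shares a prefix of length >= k with its predecessor in sorted
    order, it would only repeat an already-emitted candidate and is
    skipped. The O(n^2 log n) suffix-array construction is for clarity."""
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    candidates, prev = [], None
    for i in suffix_array:
        cand = text[i:i + k]
        if len(cand) < k:
            continue
        if prev is not None:
            # longest common prefix of the two k-length candidates
            lcp = 0
            while lcp < k and prev[lcp] == cand[lcp]:
                lcp += 1
            if lcp >= k:
                continue  # redundant candidate: its PSM scoring is skipped
        candidates.append(cand)
        prev = cand
    return candidates
```

    Skipping a duplicate here stands in for skipping a whole peptide-spectrum matching pass, which is where the identification-time savings come from.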

  15. Tandem Mass Spectrum Sequencing: An Alternative to Database Search Engines in Shotgun Proteomics.

    PubMed

    Muth, Thilo; Rapp, Erdmann; Berven, Frode S; Barsnes, Harald; Vaudel, Marc

    2016-01-01

    Protein identification via database searches has become the gold standard in mass spectrometry based shotgun proteomics. However, as the quality of tandem mass spectra improves, direct mass spectrum sequencing gains interest as a database-independent alternative. In this chapter, the general principle of this so-called de novo sequencing is introduced along with pitfalls and challenges of the technique. The main tools available are presented with a focus on user friendly open source software which can be directly applied in everyday proteomic workflows.

  16. A checklist to assess database-hosting platforms for designing and running searches for systematic reviews.

    PubMed

    Bethel, Alison; Rogers, Morwenna

    2014-03-01

    Systematic reviews require literature searches that are precise, sensitive and often complex. Database-hosting platforms need to facilitate this type of searching in order to minimise errors and the risk of bias in the results. The main objective of the study was to create a generic checklist of criteria to assess the ability of host platforms to cope with complex searching, for example, for systematic reviews, and to test the checklist against three host platforms (EBSCOhost, OvidSP and ProQuest). The checklist was developed as usual review work was carried out and through discussion between the two authors. Attributes on the checklist were designated as 'desirable' or 'essential'. The authors tested the checklist independently against three host platforms and graded their performance from 1 (insufficient) to 3 (performs well). Fifty-five desirable or essential attributes were identified for the checklist. None of the platforms performed well for all of the attributes on the checklist. Not all database-hosting platforms are designed for complex searching. Librarians and other decision-makers who work in health research settings need to be aware of the different limitations of host platforms for complex searching when they are making purchasing decisions or training others. © 2014 The authors. Health Information and Libraries Journal © 2014 Health Libraries Group.

  17. Novel strategy for database searching in spin Liouville space by NMR ensemble computing

    PubMed

    Bruschweiler

    2000-11-27

    Quantum computing by nuclear magnetic resonance using pseudopure spin states is bound by the maximal speed of quantum computing algorithms operating on pure states. In contrast to these quantum computing algorithms, a novel algorithm for searching an unsorted database is presented here that operates on truly mixed states in spin Liouville space. It provides an exponential speedup over Grover's quantum search algorithm with the sensitivity scaling exponentially with the number of spins, as for pseudopure state implementations. The minimal decoherence time required is exponentially shorter than that for Grover's algorithm.

  18. Searching molecular structure databases with tandem mass spectra using CSI:FingerID

    PubMed Central

    Dührkop, Kai; Shen, Huibin; Meusel, Marvin; Rousu, Juho; Böcker, Sebastian

    2015-01-01

    Metabolites provide a direct functional signature of cellular state. Untargeted metabolomics experiments usually rely on tandem MS to identify the thousands of compounds in a biological sample. Today, the vast majority of metabolites remain unknown. We present a method for searching molecular structure databases using tandem MS data of small molecules. Our method computes a fragmentation tree that best explains the fragmentation spectrum of an unknown molecule. We use the fragmentation tree to predict the molecular structure fingerprint of the unknown compound using machine learning. This fingerprint is then used to search a molecular structure database such as PubChem. Our method is shown to improve on the competing methods for computational metabolite identification by a considerable margin. PMID:26392543

  19. Discovery of novel mesangial cell proliferation inhibitors using a three-dimensional database searching method.

    PubMed

    Kurogi, Y; Miyata, K; Okamura, T; Hashimoto, K; Tsutsumi, K; Nasu, M; Moriyasu, M

    2001-07-05

    A three-dimensional pharmacophore model of mesangial cell (MC) proliferation inhibitors was generated from a training set of 4-(diethoxyphosphoryl)methyl-N-(3-phenyl-[1,2,4]thiadiazol-5-yl)benzamide, 2, and its derivatives using the Catalyst/HIPHOP software program. On the basis of the in vitro MC proliferation inhibitory activity, a pharmacophore model was generated with seven features consisting of two hydrophobic regions, two hydrophobic aromatic regions, and three hydrogen bond acceptors. Using this model as a three-dimensional query to search the Maybridge database, 41 structurally novel compounds were identified. Evaluation of the available samples from these 41 compounds showed over 50% MC proliferation inhibitory activity in the 100 nM range. Interestingly, the compounds newly identified by the 3D database searching method exhibited reduced inhibition of normal proximal tubular epithelial cell proliferation compared to the training set compounds.

  20. Tempest: Accelerated MS/MS Database Search Software for Heterogeneous Computing Platforms.

    PubMed

    Adamo, Mark E; Gerber, Scott A

    2016-09-07

    MS/MS database search algorithms derive a set of candidate peptide sequences from in silico digest of a protein sequence database, and compute theoretical fragmentation patterns to match these candidates against observed MS/MS spectra. The original Tempest publication described these operations mapped to a CPU-GPU model, in which the CPU (central processing unit) generates peptide candidates that are asynchronously sent to a discrete GPU (graphics processing unit) to be scored against experimental spectra in parallel. The current version of Tempest expands this model, incorporating OpenCL to offer seamless parallelization across multicore CPUs, GPUs, integrated graphics chips, and general-purpose coprocessors. Three protocols describe how to configure and run a Tempest search, including discussion of how to leverage Tempest's unique feature set to produce optimal results. © 2016 by John Wiley & Sons, Inc.

  1. BioSCAN: a network sharable computational resource for searching biosequence databases.

    PubMed

    Singh, R K; Hoffman, D L; Tell, S G; White, C T

    1996-06-01

    We describe a network sharable, interactive computational tool for rapid and sensitive search and analysis of biomolecular sequence databases such as GenBank, GenPept, Protein Identification Resource, and SWISS-PROT. The resource is accessible via the World Wide Web using popular client software such as Mosaic and Netscape. The client software is freely available on a number of computing platforms including Macintosh, IBM-PC, and Unix workstations.

  2. Improved classification of mass spectrometry database search results using newer machine learning approaches.

    PubMed

    Ulintz, Peter J; Zhu, Ji; Qin, Zhaohui S; Andrews, Philip C

    2006-03-01

    Manual analysis of mass spectrometry data is a current bottleneck in high throughput proteomics. In particular, the need to manually validate the results of mass spectrometry database searching algorithms can be prohibitively time-consuming. Development of software tools that attempt to quantify the confidence in the assignment of a protein or peptide identity to a mass spectrum is an area of active interest. We sought to extend work in this area by investigating the potential of recent machine learning algorithms to improve the accuracy of these approaches and as a flexible framework for accommodating new data features. Specifically we demonstrated the ability of boosting and random forest approaches to improve the discrimination of true hits from false positive identifications in the results of mass spectrometry database search engines compared with thresholding and other machine learning approaches. We accommodated additional attributes obtainable from database search results, including a factor addressing proton mobility. Performance was evaluated using publicly available electrospray data and a new collection of MALDI data generated from purified human reference proteins.

  3. A hybrid approach for addressing ring flexibility in 3D database searching.

    PubMed

    Sadowski, J

    1997-01-01

    A hybrid approach for flexible 3D database searching is presented that addresses the problem of ring flexibility. It combines the explicit storage of up to 25 conformations of rings with up to eight atoms, generated by the 3D structure generator CORINA, with the power of a torsional fitting technique implemented in the 3D database system UNITY. A comparison with the original UNITY approach, using a database of about 130,000 entries and five different pharmacophore queries, was performed. The hybrid approach scored, on average, 10-20% more hits than the reference run. Moreover, specific problems with unrealistic hit geometries produced by the original approach can be excluded. In addition, the influence of the maximum number of ring conformations per molecule was investigated. An optimal number of 10 conformations per molecule is recommended.

  4. Fast multiresolution search algorithm for optimal retrieval in large multimedia databases

    NASA Astrophysics Data System (ADS)

    Song, Byung C.; Kim, Myung J.; Ra, Jong Beom

    1999-12-01

    Most of the content-based image retrieval systems require a distance computation for each candidate image in the database. As a brute-force approach, the exhaustive search can be employed for this computation. However, this exhaustive search is time-consuming and limits the usefulness of such systems. Thus, there is a growing demand for a fast algorithm which provides the same retrieval results as the exhaustive search. In this paper, we propose a fast search algorithm based on a multi-resolution data structure. The proposed algorithm computes the lower bound of distance at each level and compares it with the latest minimum distance, starting from the low-resolution level. Once it is larger than the latest minimum distance, we can exclude the candidates without calculating the full-resolution distance. By doing this, we can dramatically reduce the total computational complexity. Notably, the proposed fast algorithm provides not only the same retrieval results as the exhaustive search, but also a faster searching ability than existing fast algorithms. For additional performance improvement, we can easily combine the proposed algorithm with existing tree-based algorithms. The algorithm can also be used for the fast matching of various features such as luminance histograms, edge images, and local binary partition textures.
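    The coarse-to-fine pruning can be sketched as follows. This is an illustrative implementation (not the authors'), assuming pair-averaged multiresolution features, for which pool * ||coarse difference||^2 provably lower-bounds the full squared L2 distance via (d1 + d2)^2 / 2 <= d1^2 + d2^2:

```python
import numpy as np

def coarsen(v):
    """Halve resolution by averaging adjacent pairs of components."""
    return v.reshape(-1, 2).mean(axis=1)

def multires_search(query, database, levels=2):
    """Nearest neighbor under squared L2 distance with coarse-to-fine
    pruning. A candidate whose lower bound at some coarse level already
    exceeds the best distance so far is rejected without a full-
    resolution comparison. Feature length must be divisible by
    2**levels. Returns the same answer as brute-force search."""
    def pyramid(v):
        pyr = [np.asarray(v, dtype=float)]
        for _ in range(levels):
            pyr.append(coarsen(pyr[-1]))
        return pyr[::-1]  # coarsest level first

    query_pyr = pyramid(query)
    best_idx, best_dist = -1, np.inf
    for idx, cand in enumerate(database):
        cand_pyr = pyramid(cand)
        pruned = False
        for lvl, (qc, cc) in enumerate(zip(query_pyr[:-1], cand_pyr[:-1])):
            pool = 2 ** (levels - lvl)
            if pool * np.sum((qc - cc) ** 2) > best_dist:
                pruned = True  # lower bound already exceeds current best
                break
        if not pruned:
            dist = np.sum((np.asarray(query, dtype=float) - cand) ** 2)
            if dist < best_dist:
                best_idx, best_dist = idx, dist
    return best_idx, best_dist
```

    In a real system the candidate pyramids would be precomputed and stored, so each pruned candidate costs only a few low-dimensional operations.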

  5. A grammar based methodology for structural motif finding in ncRNA database search.

    PubMed

    Quest, Daniel; Tapprich, William; Ali, Hesham

    2007-01-01

    In recent years, sequence database searching has been conducted through local alignment heuristics, pattern-matching, and comparison of short statistically significant patterns. While these approaches have unlocked many clues as to sequence relationships, they are limited in that they do not provide context-sensitive searching capabilities (e.g. considering pseudoknots, protein binding positions, and complementary base pairs). Stochastic grammars, such as hidden Markov models (HMMs) and stochastic context-free grammars (SCFGs), do allow for flexibility in terms of local context, but the context comes at the cost of increased computational complexity. In this paper we introduce a new grammar based method for searching for RNA motifs that exist within a conserved RNA structure. Our method constrains computational complexity by using a chain of topology elements. Through the use of a case study we present the algorithmic approach and benchmark our approach against traditional methods.

  6. Pharmacophore modeling and three-dimensional database searching for drug design using catalyst.

    PubMed

    Kurogi, Y; Güner, O F

    2001-07-01

    Perceiving a pharmacophore is the first essential step towards understanding the interaction between a receptor and a ligand. Once a pharmacophore is established, a beneficial use of it is 3D database searching to retrieve novel compounds that would match the pharmacophore, without necessarily duplicating the topological features of known active compounds (hence remaining independent of existing patents). As the 3D searching technology has evolved over the years, it has been effectively used for lead optimization, combinatorial library focusing, as well as virtual high-throughput screening. Clearly established as one of the successful computational tools in rational drug design, this technology is the subject of this review article: we present a brief history of its evolution and the detailed algorithms of Catalyst, the latest 3D searching software to be released. We also provide a brief summary of published successes with this technology, including two recent patent applications.

  7. Remote Access MicroMeSH: Evaluation of a Microcomputer System for Searching the MEDLINE Database

    PubMed Central

    Lowe, Henry J.; Barnett, G. Octo; Scott, Jon; Mallon, Laurie; Blewett, Dyan Ryan

    1989-01-01

    Remote Access MicroMeSH (RAMM) is a powerful but easy to use microcomputer system for searching the MEDLINE database. RAMM incorporates MicroMeSH, a microcomputer implementation of the National Library of Medicine's (NLM) Medical Subject Headings (MeSH) vocabulary. RAMM facilitates the creation of highly specific MEDLINE search queries. Our goals in creating RAMM were to provide a system that could be used to search the medical literature and to teach the basic skills required to use MeSH and MEDLINE. During the past two years RAMM has been used by clinicians, library professionals, researchers and students at Harvard Medical School and at selected academic sites in the U.S. and Canada. In February of 1989 we began an effort to formally evaluate RAMM. This paper describes the preliminary results of that evaluation.

  8. Rapid identification of anonymous subjects in large criminal databases: problems and solutions in IAFIS III/FBI subject searches

    NASA Astrophysics Data System (ADS)

    Kutzleb, C. D.

    1997-02-01

    The high incidence of recidivism (repeat offenders) in the criminal population makes the use of the IAFIS III/FBI criminal database an important tool in law enforcement. The problems and solutions employed by IAFIS III/FBI criminal subject searches are discussed for the following topics: (1) subject search selectivity and reliability; (2) the difficulty and limitations of identifying subjects whose anonymity may be a prime objective; (3) database size, search workload, and search response time; (4) techniques and advantages of normalizing the variability in an individual's name and identifying features into identifiable and discrete categories; and (5) the use of database demographics to estimate the likelihood of a match between a search subject and database subjects.
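    The abstract does not specify the IAFIS normalization scheme; the classic Soundex code is one well-known technique for collapsing spelling variants of a name into a discrete, searchable category, shown here purely as an illustration of the idea:

```python
def soundex(name):
    """Classic four-character Soundex code. Consonants map to digit
    classes; adjacent letters with the same code collapse to one digit;
    'H' and 'W' are transparent (they do not separate equal codes) while
    vowels do separate them; the result is padded or truncated to 4."""
    codes = {**dict.fromkeys("BFPV", "1"), **dict.fromkeys("CGJKQSXZ", "2"),
             **dict.fromkeys("DT", "3"), "L": "4",
             **dict.fromkeys("MN", "5"), "R": "6"}
    name = name.upper()
    out = name[0]
    prev = codes.get(name[0], "")
    for ch in name[1:]:
        code = codes.get(ch, "")
        if code and code != prev:
            out += code
        if ch not in "HW":  # H/W leave the previous code in force
            prev = code
    return (out + "000")[:4]
```

    Spelling variants such as "Robert" and "Rupert" land in the same category, so a subject search can match records despite transcription variability in names.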

  9. The effect of wild card designations and rare alleles in forensic DNA database searches.

    PubMed

    Tvedebrink, Torben; Bright, Jo-Anne; Buckleton, John S; Curran, James M; Morling, Niels

    2015-05-01

    Forensic DNA databases are powerful tools used for the identification of persons of interest in criminal investigations. Typically, they consist of two parts: (1) a database containing DNA profiles of known individuals and (2) a database of DNA profiles associated with crime scenes. The risk of adventitious or chance matches between crimes and innocent people increases as the number of profiles within a database grows and more data is shared between various forensic DNA databases, e.g. from different jurisdictions. The DNA profiles obtained from crime scenes are often partial because crime samples may be compromised in quantity or quality. When an individual's profile cannot be resolved from a DNA mixture, ambiguity is introduced. A wild card, F, may be used in place of an allele that has dropped out or when an ambiguous profile is resolved from a DNA mixture. Variant alleles that do not correspond to any marker in the allelic ladder or appear above or below the extent of the allelic ladder range are assigned the allele designation R for rare allele. R alleles are position specific with respect to the observed/unambiguous allele. The F and R designations are made when the exact genotype has not been determined and are treated as wild cards for searching, which results in an increased chance of adventitious matches. We investigated the probability of adventitious matches given these two types of wild cards. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
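    A minimal sketch of wild-card matching at a single locus (illustrative only; operational database software applies more elaborate rules, and the position-specific R designation is not modeled here):

```python
def locus_matches(crime_genotype, reference_genotype):
    """Compare the two alleles typed at one locus. 'F' in the crime-scene
    profile is a wild card (drop-out or unresolved mixture) and matches
    any allele; exact matches are consumed first. This permissive
    matching is what inflates the adventitious-match probability."""
    remaining = list(crime_genotype)
    for allele in reference_genotype:
        if allele in remaining:
            remaining.remove(allele)
        elif 'F' in remaining:
            remaining.remove('F')
        else:
            return False
    return True
```

    A profile with many F designations matches a larger fraction of the database, which is exactly the effect whose probability the study quantifies.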

  10. CUDASW++: optimizing Smith-Waterman sequence database searches for CUDA-enabled graphics processing units.

    PubMed

    Liu, Yongchao; Maskell, Douglas L; Schmidt, Bertil

    2009-05-06

    The Smith-Waterman algorithm is one of the most widely used tools for searching biological sequence databases due to its high sensitivity. Unfortunately, the Smith-Waterman algorithm is computationally demanding, which is further compounded by the exponential growth of sequence databases. The recent emergence of many-core architectures, and their associated programming interfaces, provides an opportunity to accelerate sequence database searches using commonly available and inexpensive hardware. Our CUDASW++ implementation (benchmarked on a single-GPU NVIDIA GeForce GTX 280 graphics card and a dual-GPU GeForce GTX 295 graphics card) provides a significant performance improvement compared to other publicly available implementations, such as SWPS3, CBESW, SW-CUDA, and NCBI-BLAST. CUDASW++ supports query sequences of length up to 59K. For query sequences ranging in length from 144 to 5,478 in Swiss-Prot release 56.6, the single-GPU version achieves an average performance of 9.509 GCUPS (minimum 9.039, maximum 9.660 GCUPS), and the dual-GPU version achieves an average performance of 14.484 GCUPS (minimum 10.660, maximum 16.087 GCUPS). CUDASW++ is publicly available open-source software. It provides a significant performance improvement for Smith-Waterman-based protein sequence database searches by fully exploiting the compute capability of commonly used CUDA-enabled low-cost GPUs.
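    For reference, the cell recurrence that such GPU implementations parallelize is the textbook Smith-Waterman dynamic program; a compact scoring-only sketch with linear gap penalties (not CUDASW++'s code):

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Scoring-only Smith-Waterman local alignment. Each cell takes the
    max of 0, a diagonal match/mismatch step, and gap steps from above
    and from the left; the best local alignment score is the maximum
    over all cells. The O(len(a)*len(b)) cell updates per database
    sequence are the workload distributed across GPU threads."""
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best
```

    GCUPS (giga cell updates per second) in the benchmark figures counts exactly these `H[i][j]` updates.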

  11. Colil: a database and search service for citation contexts in the life sciences domain.

    PubMed

    Fujiwara, Toyofumi; Yamamoto, Yasunori

    2015-01-01

    To promote research activities in a particular research area, it is important to efficiently identify current research trends, advances, and issues in that area. Although review papers in the research area can generally suffice for this purpose, such reviews are not always available for the specific research aspects of interest at the time they are needed. Therefore, the utilization of the citation contexts of papers in a research area has been considered as another approach. However, there are few search services to retrieve citation contexts in the life sciences domain; furthermore, efficiently obtaining citation contexts is becoming difficult due to the large volume and rapid growth of life sciences papers. Here, we introduce the Colil (Comments on Literature in Literature) database to store citation contexts in the life sciences domain. By using the Resource Description Framework (RDF) and a newly compiled vocabulary, we built the Colil database and made it available through the SPARQL endpoint. In addition, we developed a web-based search service called Colil that searches for a cited paper in the Colil database and then returns a list of citation contexts for it along with papers relevant to it based on co-citations. The citation contexts in the Colil database were extracted from full-text papers of the PubMed Central Open Access Subset (PMC-OAS), which includes 545,147 papers indexed in PubMed. These papers are distributed across 3,171 journals and cite 5,136,741 unique papers that correspond to approximately 25% of all PubMed entries. By utilizing Colil, researchers can easily refer to a set of citation contexts and relevant papers based on co-citations for a target paper, and thereby comprehend the life sciences papers in a research area more efficiently.

  12. The Use of Research Electronic Data Capture (REDCap) Software to Create a Database of Librarian-Mediated Literature Searches

    PubMed Central

    LYON, JENNIFER A.; GARCIA-MILIAN, ROLANDO; NORTON, HANNAH F.; TENNANT, MICHELE R.

    2015-01-01

    Expert-mediated literature searching, a keystone service in biomedical librarianship, would benefit significantly from regular methodical review. This paper describes the novel use of Research Electronic Data Capture (REDCap) software to create a database of literature searches conducted at a large academic health sciences library. An archive of paper search requests was entered into REDCap, and librarians now prospectively enter records for current searches. Having search data readily available allows librarians to reuse search strategies and track their workload. In aggregate, this data can help guide practice and determine priorities by identifying users’ needs, tracking librarian effort, and focusing librarians’ continuing education. PMID:25023012

  13. Dialysis Search Filters for PubMed, Ovid MEDLINE, and Embase Databases

    PubMed Central

    Iansavichus, Arthur V.; Haynes, R. Brian; Lee, Christopher W.C.; Wilczynski, Nancy L.; McKibbon, Ann; Shariff, Salimah Z.; Blake, Peter G.; Lindsay, Robert M.

    2012-01-01

    Background and objectives: Physicians frequently search bibliographic databases, such as MEDLINE via PubMed, for best evidence for patient care. The objective of this study was to develop and test search filters to help physicians efficiently retrieve literature related to dialysis (hemodialysis or peritoneal dialysis) from all other articles indexed in PubMed, Ovid MEDLINE, and Embase. Design, setting, participants, & measurements: A diagnostic test assessment framework was used to develop and test robust dialysis filters. The reference standard was a manual review of the full texts of 22,992 articles from 39 journals to determine whether each article contained dialysis information. Next, 1,623,728 unique search filters were developed, and their ability to retrieve relevant articles was evaluated. Results: The high-performance dialysis filters consisted of up to 65 search terms in combination. These terms included the words “dialy” (truncated), “uremic,” “catheters,” and “renal transplant wait list.” These filters reached peak sensitivities of 98.6% and specificities of 98.5%. The filters’ performance remained robust in an independent validation subset of articles. Conclusions: These empirically derived and validated high-performance search filters should enable physicians to effectively retrieve dialysis information from PubMed, Ovid MEDLINE, and Embase. PMID:22917701

  14. Dialysis search filters for PubMed, Ovid MEDLINE, and Embase databases.

    PubMed

    Iansavichus, Arthur V; Haynes, R Brian; Lee, Christopher W C; Wilczynski, Nancy L; McKibbon, Ann; Shariff, Salimah Z; Blake, Peter G; Lindsay, Robert M; Garg, Amit X

    2012-10-01

    Physicians frequently search bibliographic databases, such as MEDLINE via PubMed, for best evidence for patient care. The objective of this study was to develop and test search filters to help physicians efficiently retrieve literature related to dialysis (hemodialysis or peritoneal dialysis) from all other articles indexed in PubMed, Ovid MEDLINE, and Embase. A diagnostic test assessment framework was used to develop and test robust dialysis filters. The reference standard was a manual review of the full texts of 22,992 articles from 39 journals to determine whether each article contained dialysis information. Next, 1,623,728 unique search filters were developed, and their ability to retrieve relevant articles was evaluated. The high-performance dialysis filters consisted of up to 65 search terms in combination. These terms included the words "dialy" (truncated), "uremic," "catheters," and "renal transplant wait list." These filters reached peak sensitivities of 98.6% and specificities of 98.5%. The filters' performance remained robust in an independent validation subset of articles. These empirically derived and validated high-performance search filters should enable physicians to effectively retrieve dialysis information from PubMed, Ovid MEDLINE, and Embase.
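    The diagnostic-test evaluation described above reduces to counting a confusion matrix over the hand-classified reference standard; a small sketch (function names are hypothetical, and the toy filter terms are two of those reported in the abstract):

```python
def filter_performance(records, is_relevant, passes_filter):
    """Score a Boolean search filter against a reference standard:
    sensitivity = relevant records retrieved / all relevant records;
    specificity = irrelevant records excluded / all irrelevant records."""
    tp = fp = fn = tn = 0
    for record in records:
        retrieved, relevant = passes_filter(record), is_relevant(record)
        if retrieved and relevant:
            tp += 1
        elif retrieved:
            fp += 1
        elif relevant:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity
```

    A toy filter such as `lambda title: "dialy" in title or "uremic" in title` can be scored this way; the study's real filters combine up to 65 such terms and were selected from 1,623,728 candidates by exactly this kind of evaluation.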

  15. Searching for patterns in remote sensing image databases using neural networks

    NASA Technical Reports Server (NTRS)

    Paola, Justin D.; Schowengerdt, Robert A.

    1995-01-01

    We have investigated a method, based on a successful neural network multispectral image classification system, of searching for single patterns in remote sensing databases. While defining the pattern to search for and the feature to be used for that search (spectral, spatial, temporal, etc.) is challenging, a more difficult task is selecting competing patterns to train against the desired pattern. Schemes for competing pattern selection, including random selection and human interpreted selection, are discussed in the context of an example detection of dense urban areas in Landsat Thematic Mapper imagery. When applying the search to multiple images, a simple normalization method can alleviate the problem of inconsistent image calibration. Another potential problem, that of highly compressed data, was found to have a minimal effect on the ability to detect the desired pattern. The neural network algorithm has been implemented using the PVM (Parallel Virtual Machine) library and nearly-optimal speedups have been obtained that help alleviate the long process of searching through imagery.

  16. Exploring Multidisciplinary Data Sets through Database Driven Search Capabilities and Map-Based Web Services

    NASA Astrophysics Data System (ADS)

    O'Hara, S.; Ferrini, V.; Arko, R.; Carbotte, S. M.; Leung, A.; Bonczkowski, J.; Goodwillie, A.; Ryan, W. B.; Melkonian, A. K.

    2008-12-01

    Relational databases containing geospatially referenced data enable the construction of robust data access pathways that can be customized to suit the needs of a diverse user community. Web-based search capabilities driven by radio buttons and pull-down menus can be generated on-the-fly leveraging the power of the relational database and providing specialists with a means of discovering specific data and data sets. While these data access pathways are sufficient for many scientists, map-based data exploration can also be an effective means of data discovery and integration by allowing users to rapidly assess the spatial co-registration of several data types. We present a summary of data access tools currently provided by the Marine Geoscience Data System (www.marine-geo.org) that are intended to serve a diverse community of users and promote data integration. Basic search capabilities allow users to discover data based on data type, device type, geographic region, research program, expedition parameters, personnel and references. In addition, web services are used to create database driven map interfaces that provide live access to metadata and data files.

  17. EDULISS: a small-molecule database with data-mining and pharmacophore searching capabilities.

    PubMed

    Hsin, Kun-Yi; Morgan, Hugh P; Shave, Steven R; Hinton, Andrew C; Taylor, Paul; Walkinshaw, Malcolm D

    2011-01-01

    We present the relational database EDULISS (EDinburgh University Ligand Selection System), which stores structural, physicochemical and pharmacophoric properties of small molecules. The database comprises a collection of over 4 million commercially available compounds from 28 different suppliers. A user-friendly web-based interface for EDULISS (available at http://eduliss.bch.ed.ac.uk/) has been established providing a number of data-mining possibilities. For each compound a single 3D conformer is stored along with over 1600 calculated descriptor values (molecular properties). A very efficient method for unique compound recognition, especially for a large scale database, is demonstrated by making use of small subgroups of the descriptors. Many of the shape and distance descriptors are held as pre-calculated bit strings permitting fast and efficient similarity and pharmacophore searches which can be used to identify families of related compounds for biological testing. Two ligand searching applications are given to demonstrate how EDULISS can be used to extract families of molecules with selected structural and biophysical features.
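    Pre-computed bit-string descriptors are typically screened with Tanimoto similarity; a minimal sketch with fingerprints held as Python integers (EDULISS's exact scheme is not detailed in the abstract):

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprint bit strings stored as
    arbitrary-precision integers: |intersection| / |union| of set bits.
    With pre-computed bit strings, each database compound costs only a
    couple of bitwise operations plus two popcounts to screen."""
    inter = bin(fp_a & fp_b).count("1")
    union = bin(fp_a | fp_b).count("1")
    return inter / union if union else 1.0
```

    A similarity search then keeps compounds whose coefficient against the query fingerprint exceeds a chosen threshold, e.g. 0.7, before any slower 3D comparison.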

  18. Data Analysis Provenance: Use Case for Exoplanet Search in CoRoT Database

    NASA Astrophysics Data System (ADS)

    de Souza, L.; Salete Marcon Gomes Vaz, M.; Emílio, M.; Ferreira da Rocha, J. C.; Janot Pacheco, E.; Carlos Boufleur, R.

    2012-09-01

    CoRoT (COnvection Rotation and planetary Transits) is a mission led by the French national space agency CNES, in collaboration with Austria, Spain, Germany, Belgium and Brazil. The mission is primarily dedicated to exoplanet search and stellar seismology. Following the CoRoT data policy, light curves become public one year after their delivery to the CoRoT Co-Is. The CoRoT archive contains thousands of light curves in FITS format. Several exoplanet search algorithms require detrending algorithms to remove both stellar and instrumental signal, improving the chance of detecting a transit. Different detrending and transit detection algorithms can be applied to the same database. Tracking the origin of the information and how the data were derived at each level of the analysis process is essential to allow sharing, reuse, reprocessing and further analysis. This work applies a formalized and codified knowledge model by means of a domain ontology, enriching the data analysis with semantics and standardization. The provenance information is held in the database for a posteriori retrieval by humans or software agents.

  19. EDULISS: a small-molecule database with data-mining and pharmacophore searching capabilities

    PubMed Central

    Hsin, Kun-Yi; Morgan, Hugh P.; Shave, Steven R.; Hinton, Andrew C.; Taylor, Paul; Walkinshaw, Malcolm D.

    2011-01-01

    We present the relational database EDULISS (EDinburgh University Ligand Selection System), which stores structural, physicochemical and pharmacophoric properties of small molecules. The database comprises a collection of over 4 million commercially available compounds from 28 different suppliers. A user-friendly web-based interface for EDULISS (available at http://eduliss.bch.ed.ac.uk/) has been established providing a number of data-mining possibilities. For each compound a single 3D conformer is stored along with over 1600 calculated descriptor values (molecular properties). A very efficient method for unique compound recognition, especially for a large scale database, is demonstrated by making use of small subgroups of the descriptors. Many of the shape and distance descriptors are held as pre-calculated bit strings permitting fast and efficient similarity and pharmacophore searches which can be used to identify families of related compounds for biological testing. Two ligand searching applications are given to demonstrate how EDULISS can be used to extract families of molecules with selected structural and biophysical features. PMID:21051336
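Pre-calculated bit strings support the kind of fast similarity search described above. A minimal sketch (not EDULISS code; the fingerprints, compound identifiers and threshold below are invented) using the Tanimoto coefficient on integer-encoded bit strings:

```python
# Tanimoto similarity over fingerprint bit strings stored as integers.
# Database contents and threshold are illustrative assumptions.

def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity of two fingerprints: |a AND b| / |a OR b|."""
    inter = bin(a & b).count("1")
    union = bin(a | b).count("1")
    return inter / union if union else 1.0

def search(db, query, threshold=0.6):
    """Return (id, score) pairs above the threshold, best first."""
    hits = [(cid, tanimoto(fp, query)) for cid, fp in db.items()]
    return sorted([h for h in hits if h[1] >= threshold],
                  key=lambda h: -h[1])

db = {"cmpd_A": 0b101101, "cmpd_B": 0b000011, "cmpd_C": 0b101100}
print(search(db, 0b101101))
```

Because the comparison reduces to bitwise AND/OR plus popcounts, it scales well to millions of stored fingerprints.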

  20. mirPub: a database for searching microRNA publications.

    PubMed

    Vergoulis, Thanasis; Kanellos, Ilias; Kostoulas, Nikos; Georgakilas, Georgios; Sellis, Timos; Hatzigeorgiou, Artemis; Dalamagas, Theodore

    2015-05-01

    Identifying, amongst millions of publications available in MEDLINE, those that are relevant to specific microRNAs (miRNAs) of interest based on keyword search faces major obstacles. References to miRNA names in the literature often deviate from standard nomenclature for various reasons, since even the official nomenclature evolves. For instance, a single miRNA name may identify two completely different molecules or two different names may refer to the same molecule. mirPub is a database with a powerful and intuitive interface, which facilitates searching for miRNA literature, addressing the aforementioned issues. To provide effective search services, mirPub applies text mining techniques on MEDLINE, integrates data from several curated databases and exploits data from its user community following a crowdsourcing approach. Other key features include an interactive visualization service that illustrates intuitively the evolution of miRNA data, tag clouds summarizing the relevance of publications to particular diseases, cell types or tissues and access to TarBase 6.0 data to oversee genes related to miRNA publications. mirPub is freely available at http://www.microrna.gr/mirpub/. Contact: vergoulis@imis.athena-innovation.gr or dalamag@imis.athena-innovation.gr. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.

  1. A neotropical Miocene pollen database employing image-based search and semantic modeling

    PubMed Central

    Han, Jing Ginger; Cao, Hongfei; Barb, Adrian; Punyasena, Surangi W.; Jaramillo, Carlos; Shyu, Chi-Ren

    2014-01-01

    • Premise of the study: Digital microscopic pollen images are being generated with increasing speed and volume, producing opportunities to develop new computational methods that increase the consistency and efficiency of pollen analysis and provide the palynological community a computational framework for information sharing and knowledge transfer. • Methods: Mathematical methods were used to assign trait semantics (abstract morphological representations) of the images of neotropical Miocene pollen and spores. Advanced database-indexing structures were built to compare and retrieve similar images based on their visual content. A Web-based system was developed to provide novel tools for automatic trait semantic annotation and image retrieval by trait semantics and visual content. • Results: Mathematical models that map visual features to trait semantics can be used to annotate images with morphology semantics and to search image databases with improved reliability and productivity. Images can also be searched by visual content, providing users with customized emphases on traits such as color, shape, and texture. • Discussion: Content- and semantic-based image searches provide a powerful computational platform for pollen and spore identification. The infrastructure outlined provides a framework for building a community-wide palynological resource, streamlining the process of manual identification, analysis, and species discovery. PMID:25202648

  2. Gapped Spectral Dictionaries and Their Applications for Database Searches of Tandem Mass Spectra*

    PubMed Central

    Jeong, Kyowon; Kim, Sangtae; Bandeira, Nuno; Pevzner, Pavel A.

    2011-01-01

    Generating all plausible de novo interpretations of a peptide tandem mass (MS/MS) spectrum (Spectral Dictionary) and quickly matching them against the database represent a recently emerged alternative approach to peptide identification. However, the sizes of the Spectral Dictionaries quickly grow with the peptide length making their generation impractical for long peptides. We introduce Gapped Spectral Dictionaries (all plausible de novo interpretations with gaps) that can be easily generated for any peptide length thus addressing the limitation of the Spectral Dictionary approach. We show that Gapped Spectral Dictionaries are small thus opening a possibility of using them to speed-up MS/MS searches. Our MS-GappedDictionary algorithm (based on Gapped Spectral Dictionaries) enables proteogenomics applications (such as searches in the six-frame translation of the human genome) that are prohibitively time consuming with existing approaches. MS-GappedDictionary generates gapped peptides that occupy a niche between accurate but short peptide sequence tags and long but inaccurate full length peptide reconstructions. We show that, contrary to conventional wisdom, some high-quality spectra do not have good peptide sequence tags and introduce gapped tags that have advantages over the conventional peptide sequence tags in MS/MS database searches. PMID:21444829

  3. Gapped spectral dictionaries and their applications for database searches of tandem mass spectra.

    PubMed

    Jeong, Kyowon; Kim, Sangtae; Bandeira, Nuno; Pevzner, Pavel A

    2011-06-01

    Generating all plausible de novo interpretations of a peptide tandem mass (MS/MS) spectrum (Spectral Dictionary) and quickly matching them against the database represent a recently emerged alternative approach to peptide identification. However, the sizes of the Spectral Dictionaries quickly grow with the peptide length making their generation impractical for long peptides. We introduce Gapped Spectral Dictionaries (all plausible de novo interpretations with gaps) that can be easily generated for any peptide length thus addressing the limitation of the Spectral Dictionary approach. We show that Gapped Spectral Dictionaries are small thus opening a possibility of using them to speed-up MS/MS searches. Our MS-Gapped-Dictionary algorithm (based on Gapped Spectral Dictionaries) enables proteogenomics applications (such as searches in the six-frame translation of the human genome) that are prohibitively time consuming with existing approaches. MS-Gapped-Dictionary generates gapped peptides that occupy a niche between accurate but short peptide sequence tags and long but inaccurate full length peptide reconstructions. We show that, contrary to conventional wisdom, some high-quality spectra do not have good peptide sequence tags and introduce gapped tags that have advantages over the conventional peptide sequence tags in MS/MS database searches.
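The core idea of a gapped peptide, where runs of adjacent residues collapse into a single numeric mass gap, can be sketched as follows (illustrative Python, not the authors' implementation; the residue masses are rounded toy values):

```python
from itertools import product

# Toy residue masses in Da (rounded); illustrative assumption only.
MASS = {"G": 57, "A": 71, "S": 87, "P": 97}

def gapped_peptides(peptide):
    """Enumerate all gapped versions of a peptide: each internal
    boundary is either kept (residues stay separate) or removed
    (adjacent residues collapse into one numeric mass gap)."""
    results = []
    for cuts in product([True, False], repeat=len(peptide) - 1):
        blocks, start = [], 0
        for i, keep in enumerate(cuts):
            if keep:
                blocks.append(peptide[start:i + 1])
                start = i + 1
        blocks.append(peptide[start:])
        results.append(tuple(
            b if len(b) == 1 else sum(MASS[r] for r in b) for b in blocks))
    return results

print(gapped_peptides("GAS"))
```

Even this toy example shows why gapped dictionaries stay small: many full-length reconstructions that disagree only inside a gap collapse to the same gapped peptide.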

  4. Associative memory model for searching an image database by image snippet

    NASA Astrophysics Data System (ADS)

    Khan, Javed I.; Yun, David Y.

    1994-09-01

    This paper presents an associative memory called multidimensional holographic associative computing (MHAC), which can potentially be used to perform feature-based image database queries using an image snippet. MHAC has the unique capability to selectively focus on specific segments of a query frame during associative retrieval. As a result, this model can perform searches on the basis of featural significance described by a subset of the snippet pixels. This capability is critical for visual query in image databases because quite often the cognitive index features in the snippet are statistically weak. Unlike conventional artificial associative memories, MHAC uses a two-level representation and incorporates additional meta-knowledge about the reliability status of the segments of information it receives and forwards. In this paper we present an analysis of the focus characteristics of MHAC.

  5. Faster Smith-Waterman database searches with inter-sequence SIMD parallelisation.

    PubMed

    Rognes, Torbjørn

    2011-06-01

    The Smith-Waterman algorithm for local sequence alignment is more sensitive than heuristic methods for database searching, but also more time-consuming. The fastest approach to parallelisation with SIMD technology has previously been described by Farrar in 2007. The aim of this study was to explore whether further speed could be gained by other approaches to parallelisation. A faster approach and implementation is described and benchmarked. In the new tool SWIPE, residues from sixteen different database sequences are compared in parallel to one query residue. Using a 375 residue query sequence a speed of 106 billion cell updates per second (GCUPS) was achieved on a dual Intel Xeon X5650 six-core processor system, which is over six times more rapid than software based on Farrar's 'striped' approach. SWIPE was about 2.5 times faster when the programs used only a single thread. For shorter queries, the increase in speed was larger. SWIPE was about twice as fast as BLAST when using the BLOSUM50 score matrix, while BLAST was about twice as fast as SWIPE for the BLOSUM62 matrix. The software is designed for 64 bit Linux on processors with SSSE3. Source code is available from http://dna.uio.no/swipe/ under the GNU Affero General Public License. Efficient parallelisation using SIMD on standard hardware makes it possible to run Smith-Waterman database searches more than six times faster than before. The approach described here could significantly widen the potential application of Smith-Waterman searches. Other applications that require optimal local alignment scores could also benefit from improved performance.
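For reference, the cell-update recurrence that SWIPE parallelises across sixteen database sequences is the standard Smith-Waterman recursion. A scalar sketch (illustrative only; it uses simple match/mismatch/gap scores rather than a BLOSUM matrix, and omits the SIMD layout entirely):

```python
def smith_waterman(q, d, match=2, mismatch=-1, gap=-2):
    """Scalar reference for the local-alignment recurrence; SIMD
    tools compute these same cell updates for many database
    sequences in parallel."""
    rows, cols = len(q) + 1, len(d) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if q[i - 1] == d[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,   # diagonal: (mis)match
                          H[i - 1][j] + gap,     # gap in database seq
                          H[i][j - 1] + gap)     # gap in query seq
            best = max(best, H[i][j])
    return best
```

Each cell update is a handful of additions and maxima, which is why throughput is measured in cell updates per second (GCUPS).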

  6. Faster Smith-Waterman database searches with inter-sequence SIMD parallelisation

    PubMed Central

    2011-01-01

    Background: The Smith-Waterman algorithm for local sequence alignment is more sensitive than heuristic methods for database searching, but also more time-consuming. The fastest approach to parallelisation with SIMD technology has previously been described by Farrar in 2007. The aim of this study was to explore whether further speed could be gained by other approaches to parallelisation. Results: A faster approach and implementation is described and benchmarked. In the new tool SWIPE, residues from sixteen different database sequences are compared in parallel to one query residue. Using a 375 residue query sequence a speed of 106 billion cell updates per second (GCUPS) was achieved on a dual Intel Xeon X5650 six-core processor system, which is over six times more rapid than software based on Farrar's 'striped' approach. SWIPE was about 2.5 times faster when the programs used only a single thread. For shorter queries, the increase in speed was larger. SWIPE was about twice as fast as BLAST when using the BLOSUM50 score matrix, while BLAST was about twice as fast as SWIPE for the BLOSUM62 matrix. The software is designed for 64 bit Linux on processors with SSSE3. Source code is available from http://dna.uio.no/swipe/ under the GNU Affero General Public License. Conclusions: Efficient parallelisation using SIMD on standard hardware makes it possible to run Smith-Waterman database searches more than six times faster than before. The approach described here could significantly widen the potential application of Smith-Waterman searches. Other applications that require optimal local alignment scores could also benefit from improved performance. PMID:21631914

  7. rasbhari: Optimizing Spaced Seeds for Database Searching, Read Mapping and Alignment-Free Sequence Comparison

    PubMed Central

    Hahn, Lars; Leimeister, Chris-André; Morgenstern, Burkhard

    2016-01-01

    Many algorithms for sequence analysis rely on word matching or word statistics. Often, these approaches can be improved if binary patterns representing match and don’t-care positions are used as a filter, such that only those positions of words are considered that correspond to the match positions of the patterns. The performance of these approaches, however, depends on the underlying patterns. Herein, we show that the overlap complexity of a pattern set that was introduced by Ilie and Ilie is closely related to the variance of the number of matches between two evolutionarily related sequences with respect to this pattern set. We propose a modified hill-climbing algorithm to optimize pattern sets for database searching, read mapping and alignment-free sequence comparison of nucleic-acid sequences; our implementation of this algorithm is called rasbhari. Depending on the application at hand, rasbhari can either minimize the overlap complexity of pattern sets, maximize their sensitivity in database searching or minimize the variance of the number of pattern-based matches in alignment-free sequence comparison. We show that, for database searching, rasbhari generates pattern sets with slightly higher sensitivity than existing approaches. In our Spaced Words approach to alignment-free sequence comparison, pattern sets calculated with rasbhari led to more accurate estimates of phylogenetic distances than the randomly generated pattern sets that we previously used. Finally, we used rasbhari to generate patterns for short read classification with CLARK-S. Here too, the sensitivity of the results could be improved, compared to the default patterns of the program. We integrated rasbhari into Spaced Words; the source code of rasbhari is freely available at http://rasbhari.gobics.de/ PMID:27760124
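A spaced-word match under a binary pattern can be sketched directly (illustrative Python, not rasbhari; '1' marks a match position, '0' a don't-care position, and the example sequences are invented):

```python
def spaced_word_matches(s1, s2, pattern):
    """Count position pairs (i, j) at which the two sequences agree
    at every match ('1') position of the pattern; '0' positions are
    don't-care and may mismatch."""
    k = len(pattern)
    match_pos = [m for m, c in enumerate(pattern) if c == "1"]
    count = 0
    for i in range(len(s1) - k + 1):
        for j in range(len(s2) - k + 1):
            if all(s1[i + m] == s2[j + m] for m in match_pos):
                count += 1
    return count

# The don't-care middle position lets a mutated base still match:
print(spaced_word_matches("ACA", "AGA", "101"))  # counts the pair
print(spaced_word_matches("ACA", "AGA", "111"))  # contiguous word: none
```

Optimizing which positions are match positions, across a whole set of such patterns, is exactly the problem rasbhari's hill-climbing addresses.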

  8. Average probability that a "cold hit" in a DNA database search results in an erroneous attribution.

    PubMed

    Song, Yun S; Patil, Anand; Murphy, Erin E; Slatkin, Montgomery

    2009-01-01

    We consider a hypothetical series of cases in which the DNA profile of a crime-scene sample is found to match a known profile in a DNA database (i.e., a "cold hit"), resulting in the identification of a suspect based only on genetic evidence. We show that the average probability that there is another person in the population whose profile matches the crime-scene sample but who is not in the database is approximately 2(N - d)p(A), where N is the number of individuals in the population, d is the number of profiles in the database, and p(A) is the average match probability (AMP) for the population. The AMP is estimated by computing the average of the probabilities that two individuals in the population have the same profile. We show further that if a priori each individual in the population is equally likely to have left the crime-scene sample, then the average probability that the database search attributes the crime-scene sample to a wrong person is (N - d)p(A).
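The two approximations stated above can be evaluated numerically; in the sketch below the values of N, d and the average match probability p_A are invented purely for illustration.

```python
# Numeric illustration of the abstract's two approximations.
# N, d and p_A below are made-up example values, not real data.

def prob_another_match(N, d, p_A):
    """Approximate probability that someone outside the database also
    matches the crime-scene profile: 2 * (N - d) * p_A."""
    return 2 * (N - d) * p_A

def prob_wrong_attribution(N, d, p_A):
    """Approximate probability the cold hit names the wrong person,
    assuming each individual is a priori equally likely to have left
    the sample: (N - d) * p_A."""
    return (N - d) * p_A

N, d, p_A = 10_000_000, 1_000_000, 1e-9
print(prob_another_match(N, d, p_A))
print(prob_wrong_attribution(N, d, p_A))
```

Note how both quantities shrink as the database grows (d approaches N): with more of the population enrolled, a cold hit becomes more reliable.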

  9. Allie: a database and a search service of abbreviations and long forms

    PubMed Central

    Yamamoto, Yasunori; Yamaguchi, Atsuko; Bono, Hidemasa; Takagi, Toshihisa

    2011-01-01

    Many abbreviations are used in the literature especially in the life sciences, and polysemous abbreviations appear frequently, making it difficult to read and understand scientific papers that are outside of a reader’s expertise. Thus, we have developed Allie, a database and a search service of abbreviations and their long forms (a.k.a. full forms or definitions). Allie searches for abbreviations and their corresponding long forms in a database that we have generated based on all titles and abstracts in MEDLINE. When a user query matches an abbreviation, Allie returns all potential long forms of the query along with their bibliographic data (i.e. title and publication year). In addition, for each candidate, co-occurring abbreviations and a research field in which it frequently appears in the MEDLINE data are displayed. This function helps users learn about the context in which an abbreviation appears. To deal with synonymous long forms, we use a dictionary called GENA that contains domain-specific terms such as gene, protein or disease names along with their synonymic information. Conceptually identical domain-specific terms are regarded as one term, and then conceptually identical abbreviation-long form pairs are grouped taking into account their appearance in MEDLINE. To keep up with new abbreviations that are continuously introduced, Allie has an automatic update system. In addition, the database of abbreviations and their long forms with their corresponding PubMed IDs is constructed and updated weekly. Database URL: The Allie service is available at http://allie.dbcls.jp/. PMID:21498548

  10. Allie: a database and a search service of abbreviations and long forms.

    PubMed

    Yamamoto, Yasunori; Yamaguchi, Atsuko; Bono, Hidemasa; Takagi, Toshihisa

    2011-01-01

    Many abbreviations are used in the literature especially in the life sciences, and polysemous abbreviations appear frequently, making it difficult to read and understand scientific papers that are outside of a reader's expertise. Thus, we have developed Allie, a database and a search service of abbreviations and their long forms (a.k.a. full forms or definitions). Allie searches for abbreviations and their corresponding long forms in a database that we have generated based on all titles and abstracts in MEDLINE. When a user query matches an abbreviation, Allie returns all potential long forms of the query along with their bibliographic data (i.e. title and publication year). In addition, for each candidate, co-occurring abbreviations and a research field in which it frequently appears in the MEDLINE data are displayed. This function helps users learn about the context in which an abbreviation appears. To deal with synonymous long forms, we use a dictionary called GENA that contains domain-specific terms such as gene, protein or disease names along with their synonymic information. Conceptually identical domain-specific terms are regarded as one term, and then conceptually identical abbreviation-long form pairs are grouped taking into account their appearance in MEDLINE. To keep up with new abbreviations that are continuously introduced, Allie has an automatic update system. In addition, the database of abbreviations and their long forms with their corresponding PubMed IDs is constructed and updated weekly. Database URL: The Allie service is available at http://allie.dbcls.jp/.
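The grouping step described above, treating conceptually identical long forms as one term, can be sketched as follows (hypothetical Python; the toy synonym dictionary stands in for GENA, and the PubMed IDs are invented):

```python
from collections import defaultdict

# Toy stand-in for a GENA-like synonym dictionary (assumption).
SYNONYMS = {
    "tumour necrosis factor": "tumor necrosis factor",
    "tumor necrosis factor": "tumor necrosis factor",
}

def group_pairs(pairs):
    """Group (abbreviation, long form, pmid) triples so that
    synonymous long forms map to one canonical concept."""
    groups = defaultdict(list)
    for abbr, long_form, pmid in pairs:
        concept = SYNONYMS.get(long_form.lower(), long_form.lower())
        groups[(abbr, concept)].append(pmid)
    return dict(groups)

pairs = [("TNF", "tumor necrosis factor", 101),
         ("TNF", "Tumour Necrosis Factor", 102),
         ("TNF", "transcription factor", 103)]
print(group_pairs(pairs))
```

A query for "TNF" can then return each concept group once, with all supporting publications attached, rather than one hit per spelling variant.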

  11. SAM: String-based sequence search algorithm for mitochondrial DNA database queries

    PubMed Central

    Röck, Alexander; Irwin, Jodi; Dür, Arne; Parsons, Thomas; Parson, Walther

    2011-01-01

    The analysis of the haploid mitochondrial (mt) genome has numerous applications in forensic and population genetics, as well as in disease studies. Although mtDNA haplotypes are usually determined by sequencing, they are rarely reported as a nucleotide string. Traditionally they are presented in a difference-coded position-based format relative to the corrected version of the first sequenced mtDNA. This convention requires recommendations for standardized sequence alignment that is known to vary between scientific disciplines, even between laboratories. As a consequence, database searches that are vital for the interpretation of mtDNA data can suffer from biased results when query and database haplotypes are annotated differently. In the forensic context that would usually lead to underestimation of the absolute and relative frequencies. To address this issue we introduce SAM, a string-based search algorithm that converts query and database sequences to position-free nucleotide strings and thus eliminates the possibility that identical sequences will be missed in a database query. The mere application of a BLAST algorithm would not be a sufficient remedy as it uses a heuristic approach and does not address properties specific to mtDNA, such as phylogenetically stable but also rapidly evolving insertion and deletion events. The software presented here provides additional flexibility to incorporate phylogenetic data, site-specific mutation rates, and other biologically relevant information that would refine the interpretation of mitochondrial DNA data. The manuscript is accompanied by freeware and example data sets that can be used to evaluate the new software (http://stringvalidation.org). PMID:21056022
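The core conversion, from a difference-coded haplotype to a position-free nucleotide string, can be sketched as follows (illustrative Python, not the SAM software; the reference sequence and variant notation are simplified). It also shows the alignment ambiguity the string-based search removes: two different deletion placements in a C-run yield the same string.

```python
# Apply difference-coded variants (1-based positions) to a reference:
# '3T' substitutes T at position 3, '5DEL' deletes position 5.
# Reference and notation are simplified illustrative assumptions.

def to_string(reference, differences):
    """Expand a difference-coded haplotype into a nucleotide string."""
    seq = list(reference)
    for diff in differences:
        if diff.endswith("DEL"):
            seq[int(diff[:-3]) - 1] = ""
        else:
            seq[int(diff[:-1]) - 1] = diff[-1]
    return "".join(seq)

ref = "ACCCA"
# The same single-C deletion, annotated at two different positions,
# produces identical strings and therefore identical search results:
print(to_string(ref, ["2DEL"]))
print(to_string(ref, ["4DEL"]))
```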

  12. Computational methodologies for compound database searching that utilize experimental protein-ligand interaction information.

    PubMed

    Tan, Lu; Batista, Jose; Bajorath, Jürgen

    2010-09-01

    Ligand- and target structure-based methods are widely used in virtual screening, but there is currently no methodology available that fully integrates these different approaches. Herein, we provide an overview of various attempts that have been made to combine ligand- and structure-based computational screening methods. We then review different types of approaches that utilize protein-ligand interaction information for database screening and filtering. Interaction-based approaches make use of a variety of methodological concepts including pharmacophore modeling and direct or indirect encoding of protein-ligand interactions in fingerprint formats. These interaction-based methods have been successfully applied to tackle different tasks related to virtual screening including postprocessing of docking poses, prioritization of binding modes, selectivity analysis, or similarity searching. Furthermore, we discuss the recently developed interacting fragment approach that indirectly incorporates 3D interaction information into 2D similarity searching and bridges between ligand- and structure-based methods.

  13. What value is the CINAHL database when searching for systematic reviews of qualitative studies?

    PubMed

    Wright, Kath; Golder, Su; Lewis-Light, Kate

    2015-06-26

    The Cumulative Index to Nursing and Allied Health Literature (CINAHL) is generally thought to be a good source to search when conducting a review of qualitative evidence. Case studies have suggested that using CINAHL could be essential for reviews of qualitative studies covering topics in the nursing field, but it is unclear whether this can be extended more generally to reviews of qualitative studies in other topic areas. We carried out a retrospective analysis of a sample of systematic reviews of qualitative studies to investigate CINAHL's potential contribution to identifying the evidence. In particular, we planned to identify the percentage of included studies available in CINAHL and the percentage of the included studies unique to the CINAHL database. After screening 58 qualitative systematic reviews identified from the Database of Abstracts of Reviews of Effects (DARE), we created a sample set of 43 reviews covering a range of topics including patient experience of both illnesses and interventions. For all 43 reviews in our sample, we found that some of the included studies were available in CINAHL. For nine of these reviews (21 %), all the studies that had been included in the final synthesis were available in the CINAHL database, so it could have been possible to identify all the included studies using just this one database, while for an additional 21 reviews (49 %), 80 % or more of the included studies were available in CINAHL. Consequently, for a total of 30 reviews, or 70 % of our sample, 80 % or more of the studies could be identified using CINAHL alone. Eleven reviews, where we were able to recheck all the databases used by the original review authors, had included a study that was uniquely identified from the CINAHL database. The median percentage of unique studies was 9.09 %, with a range from 5.0 % to 33.0 %.
Assuming a rigorous search strategy was used and the records sought were accurately indexed, we could

  14. Addressing Statistical Biases in Nucleotide-Derived Protein Databases for Proteogenomic Search Strategies

    PubMed Central

    2012-01-01

    Proteogenomics has the potential to advance genome annotation through high quality peptide identifications derived from mass spectrometry experiments, which demonstrate a given gene or isoform is expressed and translated at the protein level. This can advance our understanding of genome function, discovering novel genes and gene structure that have not yet been identified or validated. Because of the high-throughput shotgun nature of most proteomics experiments, it is essential to carefully control for false positives and prevent any potential misannotation. A number of statistical procedures to deal with this are in wide use in proteomics, calculating false discovery rate (FDR) and posterior error probability (PEP) values for groups and individual peptide spectrum matches (PSMs). These methods control for multiple testing and exploit decoy databases to estimate statistical significance. Here, we show that database choice has a major effect on these confidence estimates leading to significant differences in the number of PSMs reported. We note that standard target:decoy approaches using six-frame translations of nucleotide sequences, such as assembled transcriptome data, apparently underestimate the confidence assigned to the PSMs. The source of this error stems from the inflated and unusual nature of the six-frame database, where for every target sequence there exist five “incorrect” targets that are unlikely to code for protein. The attendant FDR and PEP estimates lead to fewer accepted PSMs at fixed thresholds, and we show that this effect is a product of the database and statistical modeling and not the search engine. A variety of approaches to limit database size and remove noncoding target sequences are examined and discussed in terms of the altered statistical estimates generated and PSMs reported. These results are of importance to groups carrying out proteogenomics, aiming to maximize the validation and discovery of gene structure in sequenced genomes.

  15. Addressing statistical biases in nucleotide-derived protein databases for proteogenomic search strategies.

    PubMed

    Blakeley, Paul; Overton, Ian M; Hubbard, Simon J

    2012-11-02

    Proteogenomics has the potential to advance genome annotation through high quality peptide identifications derived from mass spectrometry experiments, which demonstrate a given gene or isoform is expressed and translated at the protein level. This can advance our understanding of genome function, discovering novel genes and gene structure that have not yet been identified or validated. Because of the high-throughput shotgun nature of most proteomics experiments, it is essential to carefully control for false positives and prevent any potential misannotation. A number of statistical procedures to deal with this are in wide use in proteomics, calculating false discovery rate (FDR) and posterior error probability (PEP) values for groups and individual peptide spectrum matches (PSMs). These methods control for multiple testing and exploit decoy databases to estimate statistical significance. Here, we show that database choice has a major effect on these confidence estimates leading to significant differences in the number of PSMs reported. We note that standard target:decoy approaches using six-frame translations of nucleotide sequences, such as assembled transcriptome data, apparently underestimate the confidence assigned to the PSMs. The source of this error stems from the inflated and unusual nature of the six-frame database, where for every target sequence there exist five "incorrect" targets that are unlikely to code for protein. The attendant FDR and PEP estimates lead to fewer accepted PSMs at fixed thresholds, and we show that this effect is a product of the database and statistical modeling and not the search engine. A variety of approaches to limit database size and remove noncoding target sequences are examined and discussed in terms of the altered statistical estimates generated and PSMs reported. These results are of importance to groups carrying out proteogenomics, aiming to maximize the validation and discovery of gene structure in sequenced genomes.
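The target:decoy FDR estimate at the heart of this argument can be sketched in a few lines (illustrative Python with made-up PSM scores; real pipelines use calibrated scores and q-values, and the abstract's point is that an inflated six-frame target space distorts exactly this estimate):

```python
# Minimal target:decoy FDR sketch. At threshold t, the FDR among
# accepted PSMs is estimated as (#decoys >= t) / (#targets >= t).
# All scores below are invented example values.

def fdr_at_threshold(target_scores, decoy_scores, t):
    targets = sum(1 for s in target_scores if s >= t)
    decoys = sum(1 for s in decoy_scores if s >= t)
    return decoys / targets if targets else 0.0

targets = [9.1, 8.7, 8.2, 7.9, 7.5, 6.0, 5.2]
decoys = [6.1, 5.8, 4.9, 4.0, 3.5, 3.1, 2.8]
print(fdr_at_threshold(targets, decoys, 6.0))
```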

  16. MS-GF+ makes progress towards a universal database search tool for proteomics.

    PubMed

    Kim, Sangtae; Pevzner, Pavel A

    2014-10-31

    Mass spectrometry (MS) instruments and experimental protocols are rapidly advancing, but the software tools to analyse tandem mass spectra are lagging behind. We present a database search tool MS-GF+ that is sensitive (it identifies more peptides than most other database search tools) and universal (it works well for diverse types of spectra, different configurations of MS instruments and different experimental protocols). We benchmark MS-GF+ using diverse spectral data sets: (i) spectra of varying fragmentation methods; (ii) spectra of multiple enzyme digests; (iii) spectra of phosphorylated peptides; and (iv) spectra of peptides with unusual fragmentation propensities produced by a novel alpha-lytic protease. For all these data sets, MS-GF+ significantly increases the number of identified peptides compared with commonly used methods for peptide identifications. We emphasize that although MS-GF+ is not specifically designed for any particular experimental set-up, it improves on the performance of tools specifically designed for these applications (for example, specialized tools for phosphoproteomics).

  17. Fine-grained Database Field Search Using Attribute-Based Encryption for E-Healthcare Clouds.

    PubMed

    Guo, Cheng; Zhuang, Ruhan; Jie, Yingmo; Ren, Yizhi; Wu, Ting; Choo, Kim-Kwang Raymond

    2016-11-01

    An effectively designed e-healthcare system can significantly enhance the quality of access and experience of healthcare users, including facilitating medical and healthcare providers in ensuring a smooth delivery of services. Ensuring the security of patients' electronic health records (EHRs) in the e-healthcare system is an active research area. EHRs may be outsourced to a third party, such as a community healthcare cloud service provider, for storage as a cost-saving measure. Generally, encrypting the EHRs when they are stored in the system (i.e. data-at-rest) or prior to outsourcing the data is used to ensure data confidentiality. Searchable encryption (SE) is a promising technique that can ensure the protection of private information without compromising on performance. In this paper, we propose a novel framework for controlling access to EHRs stored in semi-trusted cloud servers (e.g. a private cloud or a community cloud). To achieve fine-grained access control for EHRs, we leverage the ciphertext-policy attribute-based encryption (CP-ABE) technique to encrypt tables published by hospitals, including patients' EHRs; the table is stored in the database with the primary key being the patient's unique identity. Our framework enables different users with different privileges to search on different database fields. Differing from previous attempts to secure the outsourcing of data, we emphasize control over the searches of the fields within the database. We demonstrate the utility of the scheme by evaluating it using datasets from the University of California, Irvine.

  18. Real-Time Ligand Binding Pocket Database Search Using Local Surface Descriptors

    PubMed Central

    Chikhi, Rayan; Sael, Lee; Kihara, Daisuke

    2010-01-01

    Due to the increasing number of structures of unknown function accumulated by ongoing structural genomics projects, there is an urgent need for computational methods for characterizing protein tertiary structures. As the functions of many of these proteins are not easily predicted by conventional sequence database searches, a legitimate strategy is to utilize structure information in function characterization. Of particular interest is the prediction of ligand binding to a protein, as ligand molecule recognition is a major part of the molecular function of proteins. Predicting whether a ligand molecule binds a protein is a complex problem due to the physical nature of protein-ligand interactions and the flexibility of both binding sites and ligand molecules. However, geometric and physicochemical complementarity is observed between the ligand and its binding site in many cases. Therefore, ligand molecules which bind to a local surface site in a protein can be predicted by finding similar local pockets with known binding ligands in the structure database. Here, we present two representations of ligand binding pockets and utilize them for ligand binding prediction by pocket shape comparison. These representations are based on mapping of surface properties of binding pockets, which are compactly described either by two-dimensional pseudo-Zernike moments or by 3D Zernike descriptors. These compact representations allow fast real-time pocket searching against a database. A thorough benchmark study employing two different datasets shows that our representations are competitive with other existing methods. Limitations and potentials of the shape-based methods as well as possible improvements are discussed. PMID:20455259
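
    Because the surface properties are mapped to fixed-length, rotation-invariant descriptor vectors, a database search reduces to ranking pockets by vector distance, with no structural superposition step. A minimal sketch with hypothetical pocket IDs and short toy descriptors (real 3D Zernike descriptors have many more components):

```python
import math

def euclidean(a, b):
    """Plain Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_pockets(query_desc, database):
    """Rank database pockets by descriptor distance to the query.

    database: {pocket_id: descriptor_vector}. With rotation-invariant
    descriptors, no superposition of structures is needed, which is
    what makes real-time searching feasible.
    """
    return sorted(database, key=lambda pid: euclidean(query_desc, database[pid]))

# Hypothetical 4-component descriptors.
db = {
    "pocket_A": [0.9, 0.1, 0.3, 0.2],
    "pocket_B": [0.1, 0.8, 0.7, 0.9],
    "pocket_C": [0.85, 0.15, 0.25, 0.3],
}
print(rank_pockets([0.9, 0.1, 0.3, 0.25], db))  # ['pocket_A', 'pocket_C', 'pocket_B']
```

    A production system would precompute descriptors for the whole database and use an indexed nearest-neighbour search rather than this linear scan.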

  19. Designing novel nicotinic agonists by searching a database of molecular shapes

    NASA Astrophysics Data System (ADS)

    Sheridan, Robert P.; Venkataraghavan, R.

    1987-10-01

    We introduce an approach by which novel ligands can be designed for a receptor if a pharmacophore geometry has been established and the receptor-bound conformations of other ligands are known. We use the shape-matching method of Kuntz et al. [J. Mol. Biol., 161 (1982) 269-288] to search a database of molecular shapes for those molecules which can fit inside the combined volume of the known ligands and which have interatomic distances compatible with the pharmacophore geometry. Some of these molecules are then modified by interactive modeling techniques to better match the chemical properties of the known ligands. Our shape database (about 5000 candidate molecules) is derived from a subset of the Cambridge Crystallographic Database [Allen et al., Acta Crystallogr., Sect. B,35 (1979) 2331-2339]. We show, as an example, how several novel designs for nicotinic agonists can be derived by this approach, given a pharmacophore model derived from known agonists [Sheridan et al., J. Med. Chem., 29 (1986) 889-906]. This report complements our previous report [DesJarlais et al., J. Med. Chem., in press], which introduced a similar method for designing ligands when the structure of the receptor is known.

  20. Anatomy and evolution of database search engines-a central component of mass spectrometry based proteomic workflows.

    PubMed

    Verheggen, Kenneth; Raeder, Helge; Berven, Frode S; Martens, Lennart; Barsnes, Harald; Vaudel, Marc

    2017-09-13

    Sequence database search engines are bioinformatics algorithms that identify peptides from tandem mass spectra using a reference protein sequence database. Two decades of development, notably driven by advances in mass spectrometry, have provided scientists with more than 30 published search engines, each with its own properties. In this review, we present the common paradigm behind the different implementations, and its limitations for modern mass spectrometry datasets. We also detail how the search engines attempt to alleviate these limitations, and provide an overview of the different software frameworks available to the researcher. Finally, we highlight alternative approaches for the identification of proteomic mass spectrometry datasets, either as a replacement for, or as a complement to, sequence database search engines.
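
    The common paradigm behind these search engines begins with an in-silico digestion of the protein sequence database into candidate peptides. As a minimal illustration, the widely used trypsin rule (cleave after K or R, but not before P) can be sketched as:

```python
def tryptic_digest(protein, missed_cleavages=0):
    """Cut after K or R except when followed by P (standard trypsin rule),
    optionally allowing missed cleavage sites."""
    # Positions immediately after each cleavage site.
    sites = [i + 1 for i, aa in enumerate(protein[:-1])
             if aa in "KR" and protein[i + 1] != "P"]
    bounds = [0] + sites + [len(protein)]
    peptides = []
    for i in range(len(bounds) - 1):
        # Allow up to `missed_cleavages` skipped sites per peptide.
        for j in range(i + 1, min(i + 2 + missed_cleavages, len(bounds))):
            peptides.append(protein[bounds[i]:bounds[j]])
    return peptides

print(tryptic_digest("MKTAYRPWK"))  # ['MK', 'TAYRPWK']
```

    A search engine then generates theoretical fragment spectra for each candidate peptide and scores them against the observed spectra; this digestion step is where the search space is defined.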

  1. The Relationship between Searches Performed in Online Databases and the Number of Full-Text Articles Accessed: Measuring the Interaction between Database and E-Journal Collections

    ERIC Educational Resources Information Center

    Lamothe, Alain R.

    2011-01-01

    The purpose of this paper is to report the results of a quantitative analysis exploring the interaction and relationship between the online database and electronic journal collections at the J. N. Desmarais Library of Laurentian University. A very strong relationship exists between the number of searches and the size of the online database…

  3. MHC-I ligand discovery using targeted database searches of mass spectrometry data: Implications for T cell immunotherapies.

    PubMed

    Murphy, John Patrick; Konda, Prathyusha; Kowalewski, Daniel J; Schuster, Heiko; Clements, Derek; Kim, Youra; Cohen, Alejandro Martin; Sharif, Tanveer; Nielsen, Morten; Stevanović, Stefan; Lee, Patrick W; Gujar, Shashi

    2017-02-28

    Class I major histocompatibility complex (MHC-I)-bound peptide ligands dictate the activation and specificity of CD8+ T-cells, and thus are important for devising T cell immunotherapies. In recent times, advances in mass spectrometry (MS) have enabled the precise identification of these MHC-I peptides, wherein MS spectra are compared against a reference proteome. Unfortunately, matching these spectra to reference proteome databases is hindered by inflated search spaces attributed to a lack of enzyme restriction in the searches, limiting the efficiency with which MHC ligands are discovered. Here, we offer a solution to this problem whereby we developed a targeted database search approach, and an accompanying tool SpectMHC, based on a priori-predicted MHC-I peptides. We first validated the approach using mass spectrometry data from 2 different allotype-specific mouse antibodies for the C57BL/6 mouse background. We then developed allotype-specific HLA databases to search previously published MS datasets of human peripheral blood mononuclear cells (PBMCs). This targeted search strategy improved peptide identifications for both mouse and human ligandomes by greater than two-fold and is superior to traditional "no enzyme" searches of reference proteomes. Our targeted database search promises to uncover otherwise missed novel T cell epitopes of therapeutic potential.
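
    The targeted strategy replaces an unconstrained "no enzyme" search space with only those peptides a binding predictor accepts for the allotype of interest. A toy sketch of building such a database, where `toy_predictor` is a purely illustrative stand-in for a real MHC binding predictor (real pipelines would call a NetMHC-style tool for the chosen allotype):

```python
def candidate_peptides(proteome, lengths=(8, 9, 10, 11)):
    """Enumerate all subpeptides of MHC-I-typical lengths."""
    peptides = set()
    for protein in proteome:
        for n in lengths:
            for i in range(len(protein) - n + 1):
                peptides.add(protein[i:i + n])
    return peptides

def targeted_database(proteome, predicted_binder):
    """Keep only peptides the predictor accepts; searching spectra against
    this much smaller set is the core of the targeted strategy."""
    return {p for p in candidate_peptides(proteome) if predicted_binder(p)}

# Stand-in predictor: an arbitrary predicate, NOT a real binding model.
toy_predictor = lambda pep: pep.startswith("S") and len(pep) == 9

prot = ["MSIINFEKLTEWTSSNVMEERKIKV"]
db = targeted_database(prot, toy_predictor)  # a small set of 9-mers starting with S
```

    The efficiency gain comes entirely from the reduced search space: the MS identification step itself is unchanged, but it scores each spectrum against far fewer candidates.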

  4. ParAlign: a parallel sequence alignment algorithm for rapid and sensitive database searches.

    PubMed

    Rognes, T

    2001-04-01

    There is a need for faster and more sensitive algorithms for sequence similarity searching in view of the rapidly increasing amounts of genomic sequence data available. Parallel processing capabilities in the form of the single instruction, multiple data (SIMD) technology are now available in common microprocessors and enable a single microprocessor to perform many operations in parallel. The ParAlign algorithm has been specifically designed to take advantage of this technology. The new algorithm initially exploits parallelism to perform a very rapid computation of the exact optimal ungapped alignment score for all diagonals in the alignment matrix. Then, a novel heuristic is employed to compute an approximate score of a gapped alignment by combining the scores of several diagonals. This approximate score is used to select the most interesting database sequences for a subsequent Smith-Waterman alignment, which is also parallelised. The resulting method represents a substantial improvement compared to existing heuristics. The sensitivity and specificity of ParAlign were found to be as good as Smith-Waterman implementations when the same method for computing the statistical significance of the matches was used. In terms of speed, only the significantly less sensitive NCBI BLAST 2 program was found to outperform the new approach. Online searches are available at http://dna.uio.no/search/
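
    The first phase described above, the exact optimal ungapped local score for every diagonal, is a one-dimensional maximum-subarray computation (Kadane's algorithm) per diagonal. A scalar sketch of that phase (the real implementation vectorizes it with SIMD and uses substitution matrices; the match/mismatch scores here are arbitrary assumptions):

```python
def best_diagonal_scores(query, target, match=2, mismatch=-1):
    """For each diagonal d = i - j of the alignment matrix, compute the
    optimal ungapped local alignment score along that diagonal."""
    scores = {}
    for d in range(-(len(target) - 1), len(query)):
        best = cur = 0
        i = max(d, 0)          # starting cell of diagonal d
        j = i - d
        while i < len(query) and j < len(target):
            s = match if query[i] == target[j] else mismatch
            cur = max(0, cur + s)   # Kadane's update: restart on negative runs
            best = max(best, cur)
            i += 1
            j += 1
        scores[d] = best
    return scores

scores = best_diagonal_scores("ACGTACGT", "ACGT")
# scores[0] == scores[4] == 8: exact 'ACGT' matches lie on those diagonals
```

    ParAlign then combines several high-scoring diagonals into an approximate gapped score to decide which database sequences deserve a full Smith-Waterman alignment.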

  5. Internet Databases of the Properties, Enzymatic Reactions, and Metabolism of Small Molecules—Search Options and Applications in Food Science

    PubMed Central

    Minkiewicz, Piotr; Darewicz, Małgorzata; Iwaniak, Anna; Bucholska, Justyna; Starowicz, Piotr; Czyrko, Emilia

    2016-01-01

    Internet databases of small molecules, their enzymatic reactions, and metabolism have emerged as useful tools in food science. Database searching is also introduced as part of chemistry or enzymology courses for food technology students. Such resources support the search for information about single compounds and facilitate secondary analyses of large datasets. Information can be retrieved from databases by searching for the compound name or for its structure, annotated with the help of chemical codes or drawn using molecule-editing software. Data mining options may be enhanced by navigating through a network of links and cross-links between databases. Exemplary databases reviewed in this article belong to two classes: tools concerning small molecules (including general and specialized databases annotating food components) and tools annotating enzymes and metabolism. Some problems associated with database application are also discussed. Data summarized in computer databases may be used for calculation of the daily intake of bioactive compounds, prediction of the metabolism of food components and their biological activity, and prediction of interactions between food components and drugs. PMID:27929431

  6. Internet Databases of the Properties, Enzymatic Reactions, and Metabolism of Small Molecules-Search Options and Applications in Food Science.

    PubMed

    Minkiewicz, Piotr; Darewicz, Małgorzata; Iwaniak, Anna; Bucholska, Justyna; Starowicz, Piotr; Czyrko, Emilia

    2016-12-06

    Internet databases of small molecules, their enzymatic reactions, and metabolism have emerged as useful tools in food science. Database searching is also introduced as part of chemistry or enzymology courses for food technology students. Such resources support the search for information about single compounds and facilitate secondary analyses of large datasets. Information can be retrieved from databases by searching for the compound name or for its structure, annotated with the help of chemical codes or drawn using molecule-editing software. Data mining options may be enhanced by navigating through a network of links and cross-links between databases. Exemplary databases reviewed in this article belong to two classes: tools concerning small molecules (including general and specialized databases annotating food components) and tools annotating enzymes and metabolism. Some problems associated with database application are also discussed. Data summarized in computer databases may be used for calculation of the daily intake of bioactive compounds, prediction of the metabolism of food components and their biological activity, and prediction of interactions between food components and drugs.

  7. Seismic Search Engine: A distributed database for mining large scale seismic data

    NASA Astrophysics Data System (ADS)

    Liu, Y.; Vaidya, S.; Kuzma, H. A.

    2009-12-01

    The International Monitoring System (IMS) of the CTBTO collects terabytes' worth of seismic measurements from many receiver stations situated around the earth with the goal of detecting underground nuclear testing events and distinguishing them from other benign, but more common, events such as earthquakes and mine blasts. The International Data Center (IDC) processes and analyzes these measurements, as they are collected by the IMS, to summarize event detections in daily bulletins. Thereafter, the measurements are archived in a large-format database. Our proposed Seismic Search Engine (SSE) will provide a framework for exploration of the seismic database as well as for the development of seismic data mining algorithms. Analogous to GenBank, the annotated genetic sequence database maintained by NIH, through SSE we intend to provide public access to seismic data and a set of processing and analysis tools, along with community-generated annotations and statistical models to help interpret the data. SSE will implement queries as user-defined functions composed from standard tools and models. Each query is compiled and executed over the database internally before reporting results back to the user. Since queries are expressed with standard tools and models, users can easily reproduce published results within this framework for peer review and for making metric comparisons. As an illustration, an example query is “what are the best receiver stations in East Asia for detecting events in the Middle East?” Evaluating this query involves listing all receiver stations in East Asia, characterizing known seismic events in that region, and constructing a profile for each receiver station to determine how effective its measurements are at predicting each event. The results of this query can be used to help prioritize how data is collected, identify defective instruments, and guide future sensor placements.

  8. The Interactive Online SKY/M-FISH & CGH Database and the Entrez Cancer Chromosomes Search Database: Linkage of Chromosomal Aberrations with the Genome Sequence

    PubMed Central

    Knutsen, Turid; Gobu, Vasuki; Knaus, Rodger; Padilla-Nash, Hesed; Augustus, Meena; Strausberg, Robert L.; Kirsch, Ilan R.; Sirotkin, Karl; Ried, Thomas

    2005-01-01

    To catalogue data on chromosomal aberrations in cancer derived from emerging molecular cytogenetic techniques and to integrate these data with genome maps, we have established two resources, the NCI and NCBI SKY/M-FISH & CGH Database, and the Cancer Chromosomes database. The goal of the former is to allow investigators to submit and analyze clinical and research cytogenetic data. It contains a karyotype parser tool, which automatically converts the ISCN short-form karyotype into an internal representation displayed in detailed form and as a colored ideogram with band overlay, and also contains a tool to compare CGH profiles from multiple cases. The Cancer Chromosomes database integrates the SKY/M-FISH & CGH Database with the Mitelman Database of Chromosome Aberrations in Cancer, and the Recurrent Chromosome Aberrations in Cancer database. These three datasets can now be searched seamlessly by use of the Entrez search and retrieval system for chromosome aberrations, clinical data, and reference citations. Common diagnoses, anatomic sites, chromosome breakpoints, junctions, numerical and structural abnormalities, and bands gained and lost among selected cases can be compared by use of the “similarity” report. Because the model used for CGH data is a subset of the karyotype data, it is now possible to examine the similarities between CGH results and karyotypes directly. All chromosomal bands are directly linked to the Entrez Map Viewer database, providing integration of cytogenetic data with the sequence assembly. These resources, developed as a part of the Cancer Chromosome Aberration Project (CCAP) initiative, aid the search for new cancer-associated genes and foster insights into the causes and consequences of genetic alterations in cancer. PMID:15934046

  9. SCANPS: a web server for iterative protein sequence database searching by dynamic programming, with display in a hierarchical SCOP browser.

    PubMed

    Walsh, Thomas P; Webber, Caleb; Searle, Stephen; Sturrock, Shane S; Barton, Geoffrey J

    2008-07-01

    SCANPS performs iterative profile searching similar to PSI-BLAST but with full dynamic programming on each cycle and on-the-fly estimation of significance. This combination gives good sensitivity and selectivity that outperforms PSI-BLAST in domain-searching benchmarks. Although computationally expensive, SCANPS exploits on-chip parallelism (MMX and SSE2 instructions on Intel chips) as well as MPI parallelism to give acceptable turnaround times even for large databases. A web server developed to run SCANPS searches is now available at http://www.compbio.dundee.ac.uk/www-scanps. The server interface allows a range of different protein sequence databases to be searched, including the SCOP database of protein domains. The server provides the user with regularly updated versions of the main protein sequence databases and is backed up by significant computing resources which ensure that searches are performed rapidly. For SCOP searches, the results may be viewed in a new tree-based representation that reflects the structure of the SCOP hierarchy; this aids the user in placing each hit in the context of its SCOP classification and understanding its relationship to other domains in SCOP.

  10. Decision making in family medicine: randomized trial of the effects of the InfoClinique and Trip database search engines.

    PubMed

    Labrecque, Michel; Ratté, Stéphane; Frémont, Pierre; Cauchon, Michel; Ouellet, Jérôme; Hogg, William; McGowan, Jessie; Gagnon, Marie-Pierre; Njoya, Merlin; Légaré, France

    2013-10-01

    To compare the ability of users of 2 medical search engines, InfoClinique and the Trip database, to provide correct answers to clinical questions and to explore the perceived effects of the tools on the clinical decision-making process. Randomized trial. Three family medicine units of the family medicine program of the Faculty of Medicine at Laval University in Quebec City, Que. Fifteen second-year family medicine residents. Residents generated 30 structured questions about therapy or preventive treatment (2 questions per resident) based on clinical encounters. Using an Internet platform designed for the trial, each resident answered 20 of these questions (their own 2, plus 18 of the questions formulated by other residents, selected randomly) before and after searching for information with 1 of the 2 search engines. For each question, 5 residents were randomly assigned to begin their search with InfoClinique and 5 with the Trip database. The ability of residents to provide correct answers to clinical questions using the search engines, as determined by third-party evaluation. After answering each question, participants completed a questionnaire to assess their perception of the engine's effect on the decision-making process in clinical practice. Of 300 possible pairs of answers (1 answer before and 1 after the initial search), 254 (85%) were produced by 14 residents. Of these, 132 (52%) and 122 (48%) pairs of answers concerned questions that had been assigned an initial search with InfoClinique and the Trip database, respectively. Both engines produced an important and similar absolute increase in the proportion of correct answers after searching (26% to 62% for InfoClinique, for an increase of 36%; 24% to 63% for the Trip database, for an increase of 39%; P = .68). For all 30 clinical questions, at least 1 resident produced the correct answer after searching with either search engine. The mean (SD) time of the initial search for each question was 23.5 (7

  11. Integration of first-principles methods and crystallographic database searches for new ferroelectrics: Strategies and explorations

    NASA Astrophysics Data System (ADS)

    Bennett, Joseph W.; Rabe, Karin M.

    2012-11-01

    In this concept paper, the development of strategies for the integration of first-principles methods with crystallographic database mining for the discovery and design of novel ferroelectric materials is discussed, drawing on the results and experience derived from exploratory investigations on three different systems: (1) the double perovskite Sr(Sb1/2Mn1/2)O3 as a candidate semiconducting ferroelectric; (2) polar derivatives of schafarzikite MSb2O4; and (3) ferroelectric semiconductors with formula M2P2(S,Se)6. A variety of avenues for further research and investigation are suggested, including automated structure type classification, low-symmetry improper ferroelectrics, and high-throughput first-principles searches for additional representatives of structural families with desirable functional properties.

  12. Preoperative predictors of blood loss at the time of radical prostatectomy: results from the SEARCH database.

    PubMed

    Lloyd, J C; Bañez, L L; Aronson, W J; Terris, M K; Presti, J C; Amling, C L; Kane, C J; Freedland, S J

    2009-01-01

    The literature contains conflicting data on preoperative predictors of estimated blood loss (EBL) at radical retropubic prostatectomy (RRP). We sought to examine preoperative predictors of EBL at the time of RRP among patients from the SEARCH database to lend clarity to this issue. A total of 1154 patients were identified in the SEARCH database who underwent RRP between 1988 and 2008 and had EBL data available. We examined multiple preoperative factors for their ability to predict EBL using multivariate linear regression analysis. Median EBL was 900 ml (s.d. 1032). The 25th and 75th percentile for EBL were 600 and 1500 ml, respectively. EBL increased significantly with increasing body mass index (BMI) and increasing prostate size and decreased with more recent year of RRP (all P<0.001). The mean-adjusted EBL in normal-weight men (BMI <25 kg/m²) was 807 ml compared to 1067 ml among severely obese men (BMI ≥35 kg/m²). Predicted EBL for men with the smallest prostates (<20 g) was 721 ml, compared to 1326 ml for men with prostates ≥100 g. Finally, statistically significant differences between centers were observed, with mean-adjusted EBL ranging from 844 to 1094 ml. Both BMI and prostate size are predictors of increased EBL. Prostate size is of particular note, as a nearly twofold increased EBL was seen from the smallest (<20 g) to the largest prostates (≥100 g). Over time, average EBL significantly decreased. Finally, significant differences in EBL were observed between centers. Patients with multiple risk factors should be forewarned they are at increased risk for higher EBL, which may translate into a greater need for blood transfusion.

  13. The Open Spectral Database: an open platform for sharing and searching spectral data.

    PubMed

    Chalk, Stuart J

    2016-01-01

    A number of websites make spectral data available for download (typically as JCAMP-DX text files), and one (ChemSpider) also allows users to contribute spectral files. As a result, searching and retrieving such spectral data can be time consuming, and the data can be difficult to reuse if it is compressed in the JCAMP-DX file. What is needed is a single resource that allows submission of JCAMP-DX files, export of the raw data in multiple formats, and searching based on multiple chemical identifiers, and that is open in terms of license and access. To address these issues, a new online resource called the Open Spectral Database (OSDB) http://osdb.info/ has been developed and is now available. Built using open source tools, using open code (hosted on GitHub), providing open data, and open to community input about design and functionality, the OSDB is available for anyone to submit spectral data, making it searchable and available to the scientific community. This paper details the concept and coding, internal architecture, export formats, Representational State Transfer (REST) Application Programming Interface, and options for submission of data. The OSDB website went live in November 2015. Concurrently, the GitHub repository was made available at https://github.com/stuchalk/OSDB/, and is open for collaborators to join the project, submit issues, and contribute code. The combination of a scripting environment (PhpStorm), a PHP framework (CakePHP), a relational database (MySQL), and a code repository (GitHub) provides all the capabilities needed to easily develop REST-based websites for the ingestion, curation, and exposure of open chemical data to the community at all levels. It is hoped this software stack (or equivalent stacks in other scripting languages) will be leveraged to make more chemical data available for both humans and computers.

  14. Internet programs for drawing moth pheromone analogs and searching literature database.

    PubMed

    Byers, John A

    2002-04-01

    An Internet web page is described for organizing and analyzing information about lepidopteran sex pheromone components. Hypertext Markup Language (HTML) with JavaScript program code is used to draw moth pheromone analogs by combining GIF bitmap images for viewing in web browsers such as Netscape or Microsoft Internet Explorer. Straight-chain hydrocarbons of 5-22 carbons with epoxides or unsaturated positions of E or Z geometrical configuration, with several alternative functional groups, can be drawn by simply checking menu bars or checkboxes representing chain length, E/Z unsaturation points, epoxide position and chirality, and optional functional groups. The functional group can be an aldehyde, alcohol, or ester of formate, acetate, propionate, or butyrate. The program is capable of drawing several million structures and naming them [e.g., (E,E)-8,10-dodecadien-1-ol, abbreviated as E8E10-12:OH]. A Java applet program run from the same page searches for the presently drawn structure in an internal database compiled from the Pherolist, and, if the component is found, provides a textarea display of the families and species using the component. Links are automatically specified for drawn components if found in the Pherolist web site (maintained by H. Arn). Windowed links can also be made to two other JavaScript programs that allow searches of a web site database with over 5900 research citations on lepidopteran semiochemicals and a calculator of vapor pressures of some moth sex pheromone analogs at a specified temperature. Various evolutionary and biosynthetic aspects are discussed in regard to the diversity of moth sex pheromone components.
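
    The abbreviation scheme mentioned above (e.g. E8E10-12:OH for (E,E)-8,10-dodecadien-1-ol) encodes double-bond geometry and position, chain length, and functional group. A sketch of generating such abbreviations under that convention (the suffix map is an assumption covering the functional groups named in the abstract):

```python
def pheromone_abbrev(chain_length, double_bonds, group):
    """Build an abbreviated pheromone name such as E8E10-12:OH, i.e.
    (E,E)-8,10-dodecadien-1-ol. double_bonds is a list of
    (geometry, position) pairs; the suffix map below is an assumption."""
    suffix = {"alcohol": "OH", "aldehyde": "Ald", "acetate": "Ac",
              "formate": "Fo", "propionate": "Pr", "butyrate": "Bu"}
    # List double bonds in order of position along the chain.
    bonds = "".join(f"{geo}{pos}"
                    for geo, pos in sorted(double_bonds, key=lambda b: b[1]))
    return f"{bonds}-{chain_length}:{suffix[group]}"

print(pheromone_abbrev(12, [("E", 8), ("E", 10)], "alcohol"))  # E8E10-12:OH
```

    Parsing in the opposite direction (abbreviation back to structure) is essentially the inverse of this mapping, which is what lets the page's menus and the name generator stay consistent.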

  15. webPRC: the Profile Comparer for alignment-based searching of public domain databases.

    PubMed

    Brandt, Bernd W; Heringa, Jaap

    2009-07-01

    Profile-profile methods are well suited to detect remote evolutionary relationships between protein families. Profile Comparer (PRC) is an existing stand-alone program for scoring and aligning hidden Markov models (HMMs), which are based on multiple sequence alignments. Since PRC compares profile HMMs instead of sequences, it can be used to find distant homologues. For this purpose, PRC is used by, for example, the CATH and Pfam domain databases. As PRC is a profile comparer, it only reports profile HMM alignments and does not produce multiple sequence alignments. We have developed the webPRC server, which makes it straightforward to search for distant homologues or similar alignments in a number of domain databases. In addition, it provides the results both as multiple sequence alignments and as aligned HMMs. Furthermore, the user can view the domain annotation, evaluate the PRC hits with the Jalview multiple alignment editor, and generate logos from the aligned HMMs or the aligned multiple alignments. Thus, this server assists in detecting distant homologues with PRC as well as in evaluating and using the results. The webPRC interface is available at http://www.ibi.vu.nl/programs/prcwww/.

  16. Neuron-Miner: An Advanced Tool for Morphological Search and Retrieval in Neuroscientific Image Databases.

    PubMed

    Conjeti, Sailesh; Mesbah, Sepideh; Negahdar, Mohammadreza; Rautenberg, Philipp L; Zhang, Shaoting; Navab, Nassir; Katouzian, Amin

    2016-10-01

    The steadily growing amount of digital neuroscientific data demands a reliable, systematic, and computationally effective retrieval algorithm. In this paper, we present Neuron-Miner, a tool for fast and accurate reference-based retrieval within neuron image databases. The proposed algorithm is built upon a hashing (search and retrieval) technique employing multiple unsupervised random trees, collectively called Hashing Forests (HF). The HF are trained to parse the neuromorphological space hierarchically and to preserve the inherent neuron neighborhoods while encoding them with compact binary codewords. We further introduce an inverse-coding formulation within HF to effectively mitigate pairwise neuron similarity comparisons, thus allowing scalability to massive databases with little additional time overhead. The proposed hashing tool gives a superior approximation of the true neuromorphological neighborhood, with better retrieval and ranking performance than existing generalized hashing methods. This is exhaustively validated by quantifying the results over 31,266 neuron reconstructions from the NeuroMorpho.org dataset, curated from 147 different archives. We envisage that finding and ranking similar neurons through reference-based querying via Neuron-Miner will assist neuroscientists in objectively understanding the relationship between neuronal structure and function, with applications in comparative anatomy and diagnosis.
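
    The key idea, encoding items as compact binary codewords whose Hamming distances approximate neighborhood structure, can be sketched with generic random-hyperplane hashing. This is a stand-in for the trained Hashing Forests of the paper, and the feature vectors and dimensions below are hypothetical:

```python
import random

def make_hasher(dim, n_bits, seed=0):
    """Random-hyperplane locality-sensitive hashing: each bit records which
    side of a random hyperplane a feature vector falls on, so nearby
    vectors tend to share most bits. (The paper instead trains random
    trees on the data; this is only the generic version of the idea.)"""
    rng = random.Random(seed)
    planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]
    def hash_vec(v):
        return tuple(int(sum(p_i * v_i for p_i, v_i in zip(p, v)) >= 0)
                     for p in planes)
    return hash_vec

def hamming(a, b):
    """Number of differing bits between two codewords."""
    return sum(x != y for x, y in zip(a, b))

h = make_hasher(dim=3, n_bits=16)
query = h([1.0, 0.2, 0.1])
near  = h([0.9, 0.25, 0.1])    # similar morphology features: few bits differ
far   = h([-1.0, -0.2, -0.1])  # negated features: codeword is far in Hamming space
```

    Retrieval then ranks database codewords by Hamming distance to the query, which is cheap enough to scale to very large collections.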

  17. A method for fast database search for all k-nucleotide repeats.

    PubMed Central

    Benson, G; Waterman, M S

    1994-01-01

    A significant portion of DNA consists of repeating patterns of various sizes, from very small (one, two and three nucleotides) to very large (over 300 nucleotides). Although the functions of these repeating regions are not well understood, they appear important for understanding the expression, regulation and evolution of DNA. For example, increases in the number of trinucleotide repeats have been associated with human genetic disease, including Fragile-X mental retardation and Huntington's disease. Repeats are also useful as a tool in mapping and identifying DNA; the number of copies of a particular pattern at a site is often variable among individuals (polymorphic) and is therefore helpful in locating genes via linkage studies and also in providing DNA fingerprints of individuals. The number of repeating regions is unknown, as is the distribution of pattern sizes. It would be useful to search for such regions in the DNA database in order that they may be studied more fully. The DNA database currently consists of approximately 150 million base pairs and is growing exponentially. Therefore, any program to look for repeats must be efficient and fast. In this paper, we present some new techniques that are useful in recognizing repeating patterns and describe a new program for rapidly detecting repeat regions in the DNA database where the basic unit of the repeat has size up to 32 nucleotides. It is our hope that the examples in this paper will illustrate the unrealized diversity of repeats in DNA and that the program we have developed will be a useful tool for locating new and interesting repeats. PMID:7984436
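
    The object being searched for, a tandem repeat whose unit is up to k nucleotides, can be found by a brute-force scan; the paper's contribution is techniques fast enough for database-scale search, but a naive version makes the target concrete:

```python
def find_tandem_repeats(seq, max_unit=3, min_copies=3):
    """Report runs where a unit of size <= max_unit repeats at least
    min_copies times, as (start, unit, copies). A brute-force scan,
    NOT the paper's algorithm, which is far faster."""
    found = []
    for unit_len in range(1, max_unit + 1):
        i = 0
        while i + unit_len <= len(seq):
            unit = seq[i:i + unit_len]
            copies = 1
            # Count how many times the unit repeats back-to-back.
            while seq[i + copies * unit_len:i + (copies + 1) * unit_len] == unit:
                copies += 1
            if copies >= min_copies:
                found.append((i, unit, copies))
                i += copies * unit_len   # skip past the run
            else:
                i += 1
    return found

print(find_tandem_repeats("ACAGCAGCAGCAGT", max_unit=3))  # [(1, 'CAG', 4)]
```

    This scan is quadratic in the worst case and re-examines the sequence once per unit size, which is exactly why database-scale repeat finding needs the smarter techniques the paper develops.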

  18. Matrix-product-state simulation of an extended Brueschweiler bulk-ensemble database search

    SciTech Connect

    SaiToh, Akira; Kitagawa, Masahiro

    2006-06-15

    Brueschweiler's database search in a spin Liouville space can be simulated efficiently and without error on a conventional computer, as long as the simulation cost of the oracle function's internal circuit is polynomial; in contrast, true NMR experiments suffer an exponential decrease in the variation of signal intensity. We perform such a simulation using the matrix-product-state method proposed by Vidal [G. Vidal, Phys. Rev. Lett. 91, 147902 (2003)]. We also present extensions of the algorithm that do not rely on the J-coupling or DD-coupling splitting of frequency peaks in observation: in one extension, searching can be completed with a single query at polynomial post-oracle circuit complexity; in another, multiple solutions of an oracle can be found with a query complexity linear in the key length and in the number of solutions (this extension finds all marked keys). These extended algorithms are simulated with the same method.

  19. GPU-Acceleration of Sequence Homology Searches with Database Subsequence Clustering

    PubMed Central

    Suzuki, Shuji; Kakuta, Masanori; Ishida, Takashi; Akiyama, Yutaka

    2016-01-01

    Sequence homology searches are used in various fields and require large amounts of computation time, especially in metagenomic analysis, owing to the large number of queries and the database size. To accelerate such analyses, graphics processing units (GPUs) are widely used as a low-cost, high-performance computing platform. We therefore mapped the time-consuming steps of GHOSTZ, a state-of-the-art homology search algorithm for protein sequences, onto a GPU and implemented it as GHOSTZ-GPU. In addition, we optimized memory access for GPU calculations and for communication between the CPU and GPU. In an evaluation test with metagenomic data, GHOSTZ-GPU with 12 CPU threads and 1 GPU was approximately 3.0- to 4.1-fold faster than GHOSTZ with 12 CPU threads. Moreover, GHOSTZ-GPU with 12 CPU threads and 3 GPUs was approximately 5.8- to 7.7-fold faster than GHOSTZ with 12 CPU threads. PMID:27482905

  20. GPU-Acceleration of Sequence Homology Searches with Database Subsequence Clustering.

    PubMed

    Suzuki, Shuji; Kakuta, Masanori; Ishida, Takashi; Akiyama, Yutaka

    2016-01-01

    Sequence homology searches are used in various fields and require large amounts of computation time, especially in metagenomic analysis, owing to the large number of queries and the database size. To accelerate such analyses, graphics processing units (GPUs) are widely used as a low-cost, high-performance computing platform. We therefore mapped the time-consuming steps of GHOSTZ, a state-of-the-art homology search algorithm for protein sequences, onto a GPU and implemented it as GHOSTZ-GPU. In addition, we optimized memory access for GPU calculations and for communication between the CPU and GPU. In an evaluation test with metagenomic data, GHOSTZ-GPU with 12 CPU threads and 1 GPU was approximately 3.0- to 4.1-fold faster than GHOSTZ with 12 CPU threads. Moreover, GHOSTZ-GPU with 12 CPU threads and 3 GPUs was approximately 5.8- to 7.7-fold faster than GHOSTZ with 12 CPU threads.

  1. deconSTRUCT: general purpose protein database search on the substructure level.

    PubMed

    Zhang, Zong Hong; Bharatham, Kavitha; Sherman, Westley A; Mihalek, Ivana

    2010-07-01

    The deconSTRUCT webserver offers an interface to a protein database search engine, usable for general-purpose detection of similar protein (sub)structures. It first deconstructs the query structure into its secondary structure elements (SSEs) and then reassembles the match to the target by requiring a (tunable) degree of similarity in the direction and sequential order of SSEs. Hierarchical organization and judicious use of the information about protein structure enable deconSTRUCT to achieve the sensitivity and specificity of established search engines at orders-of-magnitude greater speed, without irretrievably tying up the substructure information in the form of a hash. In a post-processing step, a match at the level of the backbone atoms is constructed. The results presented to the user consist of the list of matched SSEs, the transformation matrix for rigid superposition of the structures, and several ways of visualization, both downloadable and implemented as a web-browser plug-in. The server is available at http://epsf.bmad.bii.a-star.edu.sg/struct_server.html.

  2. Proteomics of Soil and Sediment: Protein Identification by De Novo Sequencing of Mass Spectra Complements Traditional Database Searching

    NASA Astrophysics Data System (ADS)

    Miller, S.; Rizzo, A. I.; Waldbauer, J.

    2015-12-01

    Proteomics has the potential to elucidate the metabolic pathways and taxa responsible for in situ biogeochemical transformations. However, low rates of protein identification from high resolution mass spectra have been a barrier to the development of proteomics in complex environmental samples. Much of the difficulty lies in the computational challenge of linking mass spectra to their corresponding proteins. Traditional database search methods for matching peptide sequences to mass spectra are often inadequate due to the complexity of environmental proteomes and the large database search space, as we demonstrate with soil and sediment proteomes generated via a range of extraction methods. One alternative to traditional database searching is de novo sequencing, which identifies peptide sequences without the need for a database. BLAST can then be used to match de novo sequences to similar genetic sequences. Assigning confidence to putative identifications has been one hurdle for the implementation of de novo sequencing. We found that accurate de novo sequences can be screened by quality score and length. Screening criteria are verified by comparing the results of de novo sequencing and traditional database searching for well-characterized proteomes from simple biological systems. The BLAST hits of screened sequences are interrogated for taxonomic and functional information. We applied de novo sequencing to organic topsoil and marine sediment proteomes. Peak-rich proteomes, which can result from various extraction techniques, yield thousands of high-confidence protein identifications, an improvement over previous proteomic studies of soil and sediment. User-friendly software tools for de novo metaproteomics analysis have been developed. This "De Novo Analysis" Pipeline is also a faster method of data analysis than constructing a tailored sequence database for traditional database searching.
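
    The screening step described above (filtering de novo peptides by quality score and length before BLAST) can be sketched as follows; the cutoff values and function name are hypothetical placeholders, not the authors' published thresholds.

```python
# Illustrative sketch of score-and-length screening for de novo peptide
# candidates; thresholds here are invented, not the study's values.
def screen_denovo(peptides, min_score=80.0, min_len=8):
    """peptides: list of (sequence, quality_score) tuples.
    Keep a peptide only if both the score and the length pass cutoffs."""
    return [(seq, score) for seq, score in peptides
            if score >= min_score and len(seq) >= min_len]
```

    Peptides surviving both cutoffs would then be submitted to BLAST for taxonomic and functional annotation.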

  3. Proteomics of Soil and Sediment: Protein Identification by De Novo Sequencing of Mass Spectra Complements Traditional Database Searching

    NASA Astrophysics Data System (ADS)

    Miller, S.; Rizzo, A. I.; Waldbauer, J.

    2014-12-01

    Proteomics has the potential to elucidate the metabolic pathways and taxa responsible for in situ biogeochemical transformations. However, low rates of protein identification from high resolution mass spectra have been a barrier to the development of proteomics in complex environmental samples. Much of the difficulty lies in the computational challenge of linking mass spectra to their corresponding proteins. Traditional database search methods for matching peptide sequences to mass spectra are often inadequate due to the complexity of environmental proteomes and the large database search space, as we demonstrate with soil and sediment proteomes generated via a range of extraction methods. One alternative to traditional database searching is de novo sequencing, which identifies peptide sequences without the need for a database. BLAST can then be used to match de novo sequences to similar genetic sequences. Assigning confidence to putative identifications has been one hurdle for the implementation of de novo sequencing. We found that accurate de novo sequences can be screened by quality score and length. Screening criteria are verified by comparing the results of de novo sequencing and traditional database searching for well-characterized proteomes from simple biological systems. The BLAST hits of screened sequences are interrogated for taxonomic and functional information. We applied de novo sequencing to organic topsoil and marine sediment proteomes. Peak-rich proteomes, which can result from various extraction techniques, yield thousands of high-confidence protein identifications, an improvement over previous proteomic studies of soil and sediment. User-friendly software tools for de novo metaproteomics analysis have been developed. This "De Novo Analysis" Pipeline is also a faster method of data analysis than constructing a tailored sequence database for traditional database searching.

  4. MSDsite: a database search and retrieval system for the analysis and viewing of bound ligands and active sites.

    PubMed

    Golovin, Adel; Dimitropoulos, Dimitris; Oldfield, Tom; Rachedi, Abdelkrim; Henrick, Kim

    2005-01-01

    The three-dimensional environments of ligand binding sites have been derived from the parsing and loading of the PDB entries into a relational database. For each bound molecule the biological assembly of the quaternary structure has been used to determine all contact residues and a fast interactive search and retrieval system has been developed. Prosite pattern and short sequence search options are available together with a novel graphical query generator for inter-residue contacts. The database and its query interface are accessible from the Internet through a web server located at: http://www.ebi.ac.uk/msd-srv/msdsite.
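
    Deriving contact residues for a bound molecule can be sketched with a simple distance cutoff; the cutoff value and data layout below are hypothetical illustrations, not the MSD database schema.

```python
# Illustrative sketch (not MSDsite's implementation): a residue contacts a
# bound ligand if any of its atoms lies within a cutoff of any ligand atom.
import math

def contact_residues(residues, ligand_atoms, cutoff=4.0):
    """residues: {name: [(x, y, z), ...]}; returns names within cutoff."""
    def close(a, b):
        return math.dist(a, b) <= cutoff
    return sorted(name for name, atoms in residues.items()
                  if any(close(a, b) for a in atoms for b in ligand_atoms))
```

    Precomputing these contact sets for every bound molecule in every entry is what makes the interactive retrieval fast at query time.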

  5. UniCon3D: de novo protein structure prediction using united-residue conformational search via stepwise, probabilistic sampling.

    PubMed

    Bhattacharya, Debswapna; Cao, Renzhi; Cheng, Jianlin

    2016-09-15

    Recent experimental studies have suggested that proteins fold via stepwise assembly of structural units named 'foldons' through the process of sequential stabilization. In parallel, recent computational developments based on probabilistic modeling have shown a promising direction for de novo protein conformational sampling from continuous space. However, existing computational approaches for de novo protein structure prediction often sample protein conformational space at random, as opposed to the experimentally suggested stepwise sampling. Here, we develop a novel generative, probabilistic model that simultaneously captures local structural preferences of the backbone and side-chain conformational space of polypeptide chains in a united-residue representation, and performs experimentally motivated conditional conformational sampling via stepwise synthesis and assembly of foldon units, minimizing a composite physics- and knowledge-based energy function for de novo protein structure prediction. The proposed method, UniCon3D, has been found to (i) sample lower-energy conformations with higher accuracy than traditional random sampling in a small benchmark of 6 proteins; (ii) perform comparably with the top five automated methods on 30 difficult target domains from the 11th Critical Assessment of Protein Structure Prediction (CASP) experiment and on 15 difficult target domains from the 10th CASP experiment; and (iii) outperform two state-of-the-art approaches and a baseline counterpart of UniCon3D that performs traditional random sampling for protein modeling aided by predicted residue-residue contacts on 45 targets from the 10th edition of CASP. Source code, executable versions, manuals and example data of UniCon3D for Linux and OSX are freely available to non-commercial users at http://sysbio.rnet.missouri.edu/UniCon3D/ CONTACT: chengji@missouri.edu Supplementary data are available at Bioinformatics online.
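
    The contrast between random and stepwise sampling can be illustrated with a toy 2D chain-growth sketch. This is not UniCon3D's united-residue model or energy function; the clash-penalty energy and all parameters below are invented for illustration.

```python
# Illustrative sketch of stepwise conformational sampling on a toy 2D
# chain: grow one unit at a time, sampling candidate extensions and
# keeping the lowest-energy partial chain so far.
import math, random

def energy(points):
    """Toy energy: clash penalty for any non-bonded pair closer than 1.0."""
    e = 0.0
    for i in range(len(points)):
        for j in range(i + 2, len(points)):
            d = math.dist(points[i], points[j])
            if d < 1.0:
                e += (1.0 - d) ** 2
    return e

def grow_chain(n, candidates=20, seed=1):
    rng = random.Random(seed)
    chain = [(0.0, 0.0)]
    for _ in range(n - 1):
        x, y = chain[-1]
        best = None
        for _ in range(candidates):
            a = rng.uniform(0, 2 * math.pi)      # sample a bond direction
            trial = chain + [(x + math.cos(a), y + math.sin(a))]
            if best is None or energy(trial) < energy(best):
                best = trial
        chain = best
    return chain
```

    Each extension samples several candidate directions and keeps the lowest-energy partial chain, mimicking sequential stabilization in miniature.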

  6. Kappa-alpha plot derived structural alphabet and BLOSUM-like substitution matrix for rapid search of protein structure database

    PubMed Central

    2007-01-01

    We present a novel protein structure database search tool, 3D-BLAST, that is useful for analyzing novel structures and can return a ranked list of alignments. This tool has the features of BLAST (for example, robust statistical basis, and effective and reliable search capabilities) and employs a kappa-alpha (κ, α) plot derived structural alphabet and a new substitution matrix. 3D-BLAST searches more than 12,000 protein structures in 1.2 s and yields good results in zones with low sequence similarity. PMID:17335583
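
    The structural-alphabet idea can be sketched generically: bin each residue's (kappa, alpha) angle pair into a grid cell and emit one letter per cell, so that fast string-search machinery applies to 3D structures. The binning below is a hypothetical simplification, not the published alphabet or its BLOSUM-like substitution matrix.

```python
# Illustrative sketch: discretize (kappa, alpha) backbone-angle pairs onto
# a grid and map each cell to a letter, turning a structure into a string.
import string

def structural_letter(kappa, alpha, n_bins=5):
    """kappa in [0, 180), alpha in [-180, 180) -> one of n_bins**2 letters
    (n_bins=5 keeps the alphabet within A..Y)."""
    ki = int(kappa / 180.0 * n_bins)
    ai = int((alpha + 180.0) / 360.0 * n_bins)
    return string.ascii_uppercase[ki * n_bins + ai]

def to_alphabet(angle_pairs):
    return "".join(structural_letter(k, a) for k, a in angle_pairs)
```

    Once structures are strings over this alphabet, a BLAST-style search with a substitution matrix tuned to the letters' geometric similarity becomes possible.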

  7. Searching for NEO precoveries in the PS1 and MPC databases

    NASA Astrophysics Data System (ADS)

    Weryk, Robert J.; Wainscoat, Richard J.

    2016-10-01

    The Pan-STARRS (PS1) survey telescope, operated by the University of Hawai`i, covers the sky north of -49 degrees declination with its seven-square-degree field of view. Described in detail by Wainscoat et al. (2015), it has become the leading telescope for new Near Earth Object (NEO) discoveries. In 2015, it found almost half of the new Near Earth Asteroids, as well as half of the new comets. Observations of potential NEOs must be followed up before they can be confirmed and announced as new discoveries, and we depend on the follow-up capabilities of other telescopes for this. However, not every NEO candidate is immediately followed up and linked into a well-established orbit, possibly because smaller bodies may not remain visible at current instrument sensitivity limits for very long, or because their predicted orbits are too uncertain, so follow-up telescopes look in the wrong location. In certain cases, these objects may have been observed during previous lunations. We present a method to search for precovery detections in both the PS1 database and the Isolated Tracklet File (ITF) provided by the Minor Planet Center (MPC). This file contains over 12 million detections, mostly from the large surveys, that are not associated with any known objects. We demonstrate that multi-tracklet linkages for both known and unknown objects may be found in these databases, including detections of both NEOs and non-NEOs that often appear on the MPC's NEO Confirmation Page. [1] Wainscoat, R. et al., IAU Symposium 318, editors S. Chesley and R. Jedicke (2015)

  8. GLYCEMIC CONTROL AND PROSTATE CANCER PROGRESSION: RESULTS FROM THE SEARCH DATABASE

    PubMed Central

    Kim, Howard; Presti, Joseph C.; Aronson, William J.; Terris, Martha K.; Kane, Christopher J.; Amling, Christopher L.; Freedland, Stephen J.

    2010-01-01

    Purpose: Several studies have examined the association between diabetes mellitus (DM) and prostate cancer (PCa) risk and progression; however, nearly all have compared diabetic vs. non-diabetic men. We sought to investigate the role of glycemic control, as measured by HbA1c, on PCa aggressiveness and prognosis in men with DM and PCa from the Shared Equal Access Regional Cancer Hospital (SEARCH) database. Methods: We identified 247 men in SEARCH with DM and an HbA1c value recorded within the twelve months prior to radical prostatectomy between 1988 and 2009. We divided men into tertiles by HbA1c level. The associations between HbA1c tertiles and risk of adverse pathology and biochemical recurrence were tested using multivariate logistic regression and Cox proportional hazards models, respectively. Results: Median HbA1c level was 6.9. On multivariate analysis, HbA1c tertiles were predictive of pathological Gleason score (p-trend=0.001). Relative to the first tertile, men in the second (OR 5.90, p=0.002) and third tertiles (OR 7.15, p=0.001) were more likely to have Gleason score ≥ 4+3. HbA1c tertiles were not associated with margin status, node status, extracapsular extension or seminal vesicle invasion (all p-trend>0.2). In the multivariate Cox proportional hazards model, increasing HbA1c tertiles were not significantly related to risk of biochemical recurrence (p-trend=0.56). Conclusion: Men with higher HbA1c levels presented with more biologically aggressive prostate tumors at radical prostatectomy. Although risk of recurrence was unrelated to HbA1c levels, further studies are needed to better explore the importance of glycemic control on long-term outcomes in diabetic men with PCa. PMID:20687228
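
    The tertile construction used in such analyses can be sketched with toy numbers. This computes only crude, unadjusted odds ratios, not the study's multivariate logistic regression or Cox models, and the data below are invented.

```python
# Illustrative sketch: assign subjects to tertiles of a biomarker and
# compute a crude odds ratio of an outcome for each upper tertile
# relative to the lowest. Assumes each tertile has both cases and controls.
def tertiles(values):
    """Return tertile index (0, 1, 2) for each value."""
    cut = sorted(values)
    t1, t2 = cut[len(cut) // 3], cut[2 * len(cut) // 3]
    return [0 if v < t1 else 1 if v < t2 else 2 for v in values]

def crude_or(tert, outcome, ref=0):
    """Odds ratio of outcome for each tertile vs. the reference tertile."""
    def odds(g):
        cases = sum(1 for t, y in zip(tert, outcome) if t == g and y)
        ctrls = sum(1 for t, y in zip(tert, outcome) if t == g and not y)
        return cases / ctrls
    return {g: odds(g) / odds(ref) for g in (1, 2)}
```

    In the actual study the tertile indicator would enter a regression model alongside clinical covariates rather than being used in a crude two-by-two table.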

  9. Complementary Value of Databases for Discovery of Scholarly Literature: A User Survey of Online Searching for Publications in Art History

    ERIC Educational Resources Information Center

    Nemeth, Erik

    2010-01-01

    Discovery of academic literature through Web search engines challenges the traditional role of specialized research databases. Creation of literature outside academic presses and peer-reviewed publications expands the content for scholarly research within a particular field. The resulting body of literature raises the question of whether scholars…

  11. Introducing a New Interface for the Online MagIC Database by Integrating Data Uploading, Searching, and Visualization

    NASA Astrophysics Data System (ADS)

    Jarboe, N.; Minnett, R.; Constable, C.; Koppers, A. A.; Tauxe, L.

    2013-12-01

    The Magnetics Information Consortium (MagIC) is dedicated to supporting the paleomagnetic, geomagnetic, and rock magnetic communities through the development and maintenance of an online database (http://earthref.org/MAGIC/), data upload and quality control, searches, data downloads, and visualization tools. While MagIC has completed importing some of the IAGA paleomagnetic databases (TRANS, PINT, PSVRL, GPMDB) and continues to import others (ARCHEO, MAGST and SECVR), individual data uploads from the community contribute a wealth of easily accessible, rich datasets. Previously, uploading data to the MagIC database required an Excel spreadsheet on either a Mac or PC. The new upload method uses an HTML5 web interface whose only requirement is a modern browser. This web interface highlights all errors discovered in a dataset at once, instead of the iterative error-checking process of the previous Excel spreadsheet data checker. As a web service, the community will always have easy access to the most up-to-date and bug-free version of the upload software. The filtering search mechanism of the MagIC database has been changed to a more intuitive system in which the data from each contribution are displayed in tables similar to how the data are uploaded (http://earthref.org/MAGIC/search/). Searches themselves can be saved as a permanent URL, if desired; a saved search URL could then be cited in a publication. When appropriate, plots (equal area, Zijderveld, ARAI, demagnetization, etc.) are associated with the data to give the user a quicker understanding of the underlying dataset. The MagIC database will continue to evolve to meet the needs of the paleomagnetic, geomagnetic, and rock magnetic communities.

  12. In search of yoga: Research trends in a western medical database.

    PubMed

    McCall, Marcy C

    2014-01-01

    The promotion of yoga practice as a preventative and treatment therapy for health outcomes in the western hemisphere is increasing rapidly. As the commercial success of yoga burgeons in popular culture, it is important to investigate the trends of yoga as a therapeutic intervention in the academic literature. The free-access search engine PubMed is a preeminent resource for identifying health-related research articles published for academics, health practitioners and others. Our aim was to report recent yoga-related publications in the western healthcare context, with particular interest in the subject and type of yoga titles, using a bibliometric analysis of annual publication trends on PubMed from January 1950 to December 2012. The number of yoga-related titles in the PubMed database was limited until a marked increase in 2000 and a steady surge since 2007. Bibliometric analysis indicates that more than 200 new titles have been added per annum since 2011. Systematic reviews and yoga trials are increasing exponentially, indicating a potential increase in the quality of evidence. Titles including pain management, stress or anxiety, depression and cancer conditions are highly correlated with yoga and healthcare research. The prevalence of yoga research in western healthcare is increasing, and the marked increase in volume indicates the need for more systematic analysis of the literature in terms of quality and results.

  13. In search of a statistical probability model for petroleum-resource assessment : a critique of the probabilistic significance of certain concepts and methods used in petroleum-resource assessment : to that end, a probabilistic model is sketched

    USGS Publications Warehouse

    Grossling, Bernardo F.

    1975-01-01

    Exploratory drilling is still in incipient or youthful stages in those areas of the world where the bulk of the potential petroleum resources is yet to be discovered. Methods of assessing resources from projections based on historical production and reserve data are limited to mature areas. For most of the world's petroleum-prospective areas, a more speculative situation calls for a critical review of resource-assessment methodology. The language of mathematical statistics is required to define more rigorously the appraisal of petroleum resources. Basically, two approaches have been used to appraise the amounts of undiscovered mineral resources in a geologic province: (1) projection models, which use statistical data on the past outcome of exploration and development in the province; and (2) estimation models of the overall resources of the province, which use certain known parameters of the province together with the outcome of exploration and development in analogous provinces. These two approaches often lead to widely different estimates. Some of the controversy that arises results from a confusion of the probabilistic significance of the quantities yielded by each of the two approaches. Also, inherent limitations of analytic projection models (such as those using the logistic and Gompertz functions) have often been ignored. The resource-assessment problem should be recast in terms that provide for consideration of the probability of existence of the resource and of the probability of discovery of a deposit. Then the two above-mentioned models occupy the two ends of the probability range. The new approach accounts for (1) what can be expected with reasonably high certainty by mere projection of what has been accomplished in the past; (2) the inherent biases of decision-makers and resource estimators; (3) upper bounds that can be set as goals for exploration; and (4) the uncertainties in geologic conditions in a search for minerals.
Actual outcomes can then

  14. Oracle Database 10g: a platform for BLAST search and Regular Expression pattern matching in life sciences

    PubMed Central

    Stephens, Susie M.; Chen, Jake Y.; Davidson, Marcel G.; Thomas, Shiby; Trute, Barry M.

    2005-01-01

    As database management systems expand their array of analytical functionality, they become powerful research engines for biomedical data analysis and drug discovery. Databases can hold most of the data types commonly required in life sciences and consequently can be used as flexible platforms for the implementation of knowledgebases. Performing data analysis in the database simplifies data management by minimizing the movement of data from disks to memory, allowing pre-filtering and post-processing of datasets, and enabling data to remain in a secure, highly available environment. This article describes the Oracle Database 10g implementation of BLAST and Regular Expression Searches and provides case studies of their usage in bioinformatics. http://www.oracle.com/technology/software/index.html PMID:15608287

  15. Oracle Database 10g: a platform for BLAST search and Regular Expression pattern matching in life sciences.

    PubMed

    Stephens, Susie M; Chen, Jake Y; Davidson, Marcel G; Thomas, Shiby; Trute, Barry M

    2005-01-01

    As database management systems expand their array of analytical functionality, they become powerful research engines for biomedical data analysis and drug discovery. Databases can hold most of the data types commonly required in life sciences and consequently can be used as flexible platforms for the implementation of knowledgebases. Performing data analysis in the database simplifies data management by minimizing the movement of data from disks to memory, allowing pre-filtering and post-processing of datasets, and enabling data to remain in a secure, highly available environment. This article describes the Oracle Database 10g implementation of BLAST and Regular Expression Searches and provides case studies of their usage in bioinformatics. http://www.oracle.com/technology/software/index.html.

  16. SPLICE: A program to assemble partial query solutions from three-dimensional database searches into novel ligands

    NASA Astrophysics Data System (ADS)

    Ho, Chris M. W.; Marshall, Garland R.

    1993-12-01

    SPLICE is a program that processes partial query solutions retrieved from 3D structural databases to generate novel, aggregate ligands. It is designed to interface with the database searching program FOUNDATION, which retrieves fragments containing any combination of a user-specified minimum number of matching query elements. SPLICE eliminates aspects of structures that are physically incapable of binding within the active site. A systematic rule-based procedure is then performed on the remaining fragments to ensure receptor complementarity. All modifications are automated and remain transparent to the user. Ligands are then assembled by linking components into composite structures through overlapping bonds. As a control experiment, FOUNDATION and SPLICE were used to reconstruct a known HIV-1 protease inhibitor after it had been fragmented, reoriented, and added to a sham database of fifty different small molecules. To illustrate the capabilities of this program, a 3D search query containing the pharmacophoric elements of an aspartic proteinase-inhibitor crystal complex was searched using FOUNDATION against a subset of the Cambridge Structural Database. One hundred thirty-one compounds were retrieved, each containing any combination of at least four query elements. Compounds were automatically screened and edited for receptor complementarity. Numerous combinations of fragments were discovered that could be linked to form novel structures containing a greater number of pharmacophoric elements than any single retrieved fragment.
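
    The fragment-linking step through overlapping bonds can be sketched minimally. Representing a fragment as an ordered atom list and requiring a two-atom overlap are hypothetical simplifications for illustration, not SPLICE's internal data structures.

```python
# Illustrative sketch: two fragments share an "overlapping bond" when the
# last bonded atom pair of one matches the first bonded atom pair of the
# other; splicing merges them into one composite atom list.
def splice(frag_a, frag_b):
    """Join frag_b onto frag_a if their terminal bonds overlap, else None."""
    if frag_a[-2:] == frag_b[:2]:
        return frag_a + frag_b[2:]
    return None
```

    Iterating this pairwise join over all retrieved fragments yields the composite candidate ligands described above.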

  17. Search and Study of UV-Excess Objects in the DFBS Database

    NASA Astrophysics Data System (ADS)

    Sinamyan, Parandzem K.; Sargsyan, Lusine A.; Mickaelian, Areg M.; Massaro, Enrico; Nesci, Roberto; Rossi, Corinne; Gaudenzi, Silvia; Cirimele, Giuseppe

    2007-08-01

    DFBS is a digitized version of the famous Markarian survey (the First Byurakan Survey, FBS). The project has been carried out by teams from Byurakan, Rome and Cornell, using an EPSON Expression 1680 Pro scanner. The DFBS will serve as a unique spectroscopic database for studies over large areas (the total surface is 17,000 sq. degrees) at high galactic latitudes, approximate classification of objects (20,000,000 objects are present), and selection of samples of objects for definite studies (UV-excess objects, extremely red objects, variables, etc.). Joint usage of the direct images and spectra gives greater possibilities for various studies and more efficient use of the survey. Using the dedicated BSpec software written by one of the authors (GC), we have obtained a list of DFBS stars, their positions, B and R magnitudes, and a preliminary classification for the DFBS zones with central DEC=+39° and DEC=+43°. A spectral length l>90 pix (compared to the total length of 107 pix) was used as a criterion to search for UV-excess objects, as this corresponds to the criteria used during the 2nd part of the FBS. However, the spectra of objects with B<13 always occupy the full length, and they were excluded from the lists. On the other hand, for the fainter objects (near the plate limit), we weakened the selection criterion (l>80 pix), as their spectra are shorter. An additional point for the UV-excess object classification is the following: the spectra of UV-excess objects are divided into two parts by a sensitivity gap in the green; the red-yellow part of the spectrum must be weaker, and the blue-ultraviolet part must be brighter and more extended. We started the project in the DFBS zones +39° and +43° to compare the results with those obtained earlier during the 2nd part of the FBS. Later on, cross-correlation with available catalogs and a multi-wavelength analysis were made for the objects found. The preliminary results of the search and studies will be reported.

  18. Validation of SmartRank: A likelihood ratio software for searching national DNA databases with complex DNA profiles.

    PubMed

    Benschop, Corina C G; van de Merwe, Linda; de Jong, Jeroen; Vanvooren, Vanessa; Kempenaers, Morgane; Kees van der Beek, C P; Barni, Filippo; Reyes, Eusebio López; Moulin, Léa; Pene, Laurent; Haned, Hinda; Sijen, Titia

    2017-07-01

    Searching a national DNA database with complex and incomplete profiles usually yields very large numbers of possible matches, presenting many candidate suspects to be further investigated by the forensic scientist and/or police. Current practice in most forensic laboratories consists of ordering these 'hits' by the number of alleles matching the searched profile. Candidate profiles that share the same number of matching alleles are thus not differentiated, and for lack of other ranking criteria it may be difficult to discern a true match from the false positives, or to notice that all candidates are in fact false positives. SmartRank was developed to put forward only relevant candidates and rank them accordingly. The SmartRank software computes a likelihood ratio (LR) for the searched profile against each profile in the DNA database and ranks database entries above a defined LR threshold according to the calculated LR. In this study, we examined, for mixed DNA profiles of variable complexity, whether the true donors are retrieved, the number of false positives above an LR threshold, and the ranking position of the true donors. Over 750 SmartRank searches were performed using 343 mixed DNA profiles. In addition, the performance of SmartRank and CODIS was compared for DNA database searches, and SmartRank was found complementary to CODIS. We also describe the applicable domain of SmartRank and provide guidelines. The SmartRank software is open-source and freely available. Using the best-practice guidelines, SmartRank enables obtaining investigative leads in criminal cases lacking a suspect.
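
    The rank-by-LR idea can be sketched with a heavily simplified model: no dropout, drop-in, or mixture weights, unlike SmartRank's actual probabilistic treatment. The allele frequencies and profiles below are invented for illustration.

```python
# Heavily simplified sketch (not SmartRank's model): score each database
# profile with a toy likelihood ratio and return entries above a
# threshold, highest LR first.
def toy_lr(mixture, candidate, freqs):
    """mixture: {locus: set of observed alleles};
    candidate: {locus: (a, b)}; freqs: {locus: {allele: frequency}}."""
    lr = 1.0
    for locus, alleles in mixture.items():
        a, b = candidate[locus]
        if a in alleles and b in alleles:
            # probability of the candidate's genotype by chance (denominator)
            p = freqs[locus][a] * freqs[locus][b] * (1 if a == b else 2)
            lr *= 1.0 / p
        else:
            return 0.0          # candidate has an allele the mixture lacks
    return lr

def rank_database(mixture, database, freqs, threshold=1.0):
    """Rank database entries by LR, keeping only those above the threshold."""
    scored = [(toy_lr(mixture, prof, freqs), name)
              for name, prof in database.items()]
    return sorted(((lr, n) for lr, n in scored if lr > threshold), reverse=True)
```

    Candidates whose genotype is incompatible with the mixture score an LR of 0 and drop out, while compatible rare genotypes rank above compatible common ones.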

  19. Preparing College Students To Search Full-Text Databases: Is Instruction Necessary?

    ERIC Educational Resources Information Center

    Riley, Cheryl; Wales, Barbara

    Full-text databases allow Central Missouri State University's clients to access some of the serials that libraries have had to cancel due to escalating subscription costs; EbscoHost, the subject of this study, is one such database. The database is available free to all Missouri residents. A survey was designed consisting of 21 questions intended…

  20. Novel DOCK clique driven 3D similarity database search tools for molecule shape matching and beyond: adding flexibility to the search for ligand kin.

    PubMed

    Good, Andrew C

    2007-10-01

    With readily available CPU power and copious disk storage, it is now possible to undertake rapid comparison of 3D properties derived from explicit ligand overlay experiments. With this in mind, shape software tools originally devised in the 1990s are revisited, modified and applied to the problem of ligand database shape comparison. The utility of Connolly surface data is highlighted using the program MAKESITE, which leverages surface normal data to create a ligand shape cast. This cast is applied directly within DOCK, allowing the program to be used unmodified as a shape searching tool. In addition, DOCK has undergone multiple modifications to create a dedicated ligand shape comparison tool KIN. Scoring has been altered to incorporate the original incarnation of Gaussian function derived shape description based on STO-3G atomic electron density. In addition, a tabu-like search refinement has been added to increase search speed by removing redundant starting orientations produced during clique matching. The ability to use exclusion regions, again based on Gaussian shape overlap, has also been integrated into the scoring function. The use of both DOCK with MAKESITE and KIN in database screening mode is illustrated using a published ligand shape virtual screening template. The advantages of using a clique-driven search paradigm are highlighted, including shape optimization within a pharmacophore constrained framework, and easy incorporation of additional scoring function modifications. The potential for further development of such methods is also discussed.

  1. Testing search strategies for systematic reviews in the Medline literature database through PubMed.

    PubMed

    Volpato, Enilze S N; Betini, Marluci; El Dib, Regina

    2014-04-01

    A high-quality electronic search is essential in ensuring accuracy and completeness in retrieved records for the conducting of a systematic review. We analysed the available sample of search strategies to identify the best method for searching in Medline through PubMed, considering the use or not of parentheses, double quotation marks, truncation and the use of a simple search or search history. In our cross-sectional study of search strategies, we selected and analysed the available searches performed during evidence-based medicine classes and in systematic reviews conducted in the Botucatu Medical School, UNESP, Brazil. We analysed 120 search strategies. With regard to the use of phrase searches with parentheses, there was no difference between the results with and without parentheses, or between simple searches and search history tools, in 100% of the sample analysed (P = 1.0). The number of results retrieved by the searches analysed was smaller using double quotation marks and using truncation compared with the standard strategy (P = 0.04 and P = 0.08, respectively). There is no need to use parentheses in phrase searching to retrieve studies; however, we recommend the use of double quotation marks when an investigator attempts to retrieve articles in which a term appears exactly as proposed in the search form. Furthermore, we do not recommend the use of truncation in search strategies in Medline via PubMed. Although the results of simple searches and search history tools were the same, we recommend using the latter.

  2. Serum lipid profile and risk of prostate cancer recurrence: results from the SEARCH database

    PubMed Central

    Allott, Emma H.; Howard, Lauren E.; Cooperberg, Matthew R.; Kane, Christopher J.; Aronson, William J.; Terris, Martha K.; Amling, Christopher L.; Freedland, Stephen J.

    2014-01-01

    Background Evidence for an association between total cholesterol, low and high density lipoproteins (LDL and HDL, respectively), triglycerides and prostate cancer (PC) is conflicting. Given that PC and dyslipidemia affect large proportions of Western society, understanding these associations has public health importance. Methods We conducted a retrospective cohort analysis of 843 radical prostatectomy (RP) patients who never used statins before surgery within the Shared Equal Access Regional Cancer Hospital (SEARCH) database. Multivariable Cox proportional hazards analysis was used to investigate the association between cholesterol, LDL, HDL and triglycerides and biochemical recurrence risk. In secondary analysis, we explored these associations in patients with dyslipidemia, defined using National Cholesterol Education Program guidelines. Results Elevated serum triglycerides were associated with increased risk of PC recurrence (HR per 10 mg/dl 1.03; 95%CI 1.01–1.05) but associations between total cholesterol, LDL and HDL and recurrence risk were null. However, among men with dyslipidemia, each 10 mg/dl increase in cholesterol and HDL was associated with 9% increased recurrence risk (HR 1.09; 95%CI 1.01–1.17) and 39% reduced recurrence risk (HR 0.61; 95%CI 0.41–0.91), respectively. Conclusions Elevated serum triglycerides were associated with increased risk of PC recurrence. Cholesterol, LDL or HDL were not associated with recurrence risk among all men. However, among men with dyslipidemia, elevated cholesterol and HDL levels were associated with increased and decreased risk of recurrence, respectively. Impact These findings, coupled with evidence that statin use is associated with reduced recurrence risk, suggest that lipid levels should be explored as a modifiable risk factor for PC recurrence. PMID:25304929

  3. Postoperative statin use and risk of biochemical recurrence following radical prostatectomy: Results from the SEARCH database

    PubMed Central

    Allott, Emma H.; Howard, Lauren E.; Cooperberg, Matthew R.; Kane, Christopher J.; Aronson, William J.; Terris, Martha K.; Amling, Christopher L.; Freedland, Stephen J.

    2014-01-01

    Objective • To investigate the effect of postoperative statin use on biochemical recurrence (BCR) in PC patients treated with radical prostatectomy (RP) who never used statins before surgery. Patients and Methods • We conducted a retrospective analysis of 1,146 RP patients within the Shared Equal Access Regional Cancer Hospital (SEARCH) database. • Multivariable Cox proportional hazards analyses were used to examine differences in risk of BCR between postoperative statin users versus nonusers. • To account for varying start dates and duration of statin use during follow-up, postoperative statin use was treated as a time-dependent variable. • In secondary analysis, models were stratified by race to examine the association of postoperative statin use with BCR among black and non-black men. Results • After adjusting for clinical and pathological characteristics, postoperative statin use was significantly associated with 36% reduced risk of BCR (HR 0.64; 95%CI 0.47-0.87; p=0.004). • Postoperative statin use remained associated with reduced risk of BCR after adjusting for preoperative serum cholesterol levels. • In secondary analysis, following stratification by race, this protective association was significant in non-black (HR 0.49; 95%CI 0.32-0.75; p=0.001) but not black men (HR 0.82; 95%CI 0.53-1.28; p=0.384). Conclusion • In this retrospective cohort of men undergoing RP, postoperative statin use was significantly associated with reduced risk of BCR. • Whether the association between postoperative statin use and BCR differs by race requires further study. • Given these findings, coupled with other studies suggesting that statins may reduce risk of advanced PC, randomized controlled trials are warranted to formally test the hypothesis that statins slow PC progression. PMID:24588774

  4. Searching for first-degree familial relationships in California's offender DNA database: validation of a likelihood ratio-based approach.

    PubMed

    Myers, Steven P; Timken, Mark D; Piucci, Matthew L; Sims, Gary A; Greenwald, Michael A; Weigand, James J; Konzak, Kenneth C; Buoncristiani, Martin R

    2011-11-01

    A validation study was performed to measure the effectiveness of using a likelihood ratio-based approach to search for possible first-degree familial relationships (full-sibling and parent-child) by comparing an evidence autosomal short tandem repeat (STR) profile to California's ∼1,000,000-profile State DNA Index System (SDIS) database. Test searches used autosomal STR and Y-STR profiles generated for 100 artificial test families. When the test sample and the first-degree relative in the database were characterized at the 15 Identifiler® (Applied Biosystems®, Foster City, CA) STR loci, the search procedure included 96% of the fathers and 72% of the full-siblings. When the relative profile was limited to the 13 Combined DNA Index System (CODIS) core loci, the search procedure included 93% of the fathers and 61% of the full-siblings. These results, combined with those of functional tests using three real families, support the effectiveness of this tool. Based upon these results, the validated approach was implemented as a key, pragmatic and demonstrably practical component of the California Department of Justice's Familial Search Program. An investigative lead created through this process recently led to an arrest in the Los Angeles Grim Sleeper serial murders.

  5. An impatient evolutionary algorithm with probabilistic tabu search for unified solution of some NP-hard problems in graph and set theory via clique finding.

    PubMed

    Guturu, Parthasarathy; Dantu, Ram

    2008-06-01

    Many graph- and set-theoretic problems, because of their tremendous application potential and theoretical appeal, have been well investigated by the researchers in complexity theory and were found to be NP-hard. Since the combinatorial complexity of these problems does not permit exhaustive searches for optimal solutions, only near-optimal solutions can be explored using either various problem-specific heuristic strategies or metaheuristic global-optimization methods, such as simulated annealing, genetic algorithms, etc. In this paper, we propose a unified evolutionary algorithm (EA) for the problems of maximum clique finding, maximum independent set, minimum vertex cover, subgraph and double subgraph isomorphism, set packing, set partitioning, and set cover. In the proposed approach, we first map these problems onto the maximum clique-finding problem (MCP), which is later solved using an evolutionary strategy. The proposed impatient EA with probabilistic tabu search (IEA-PTS) for the MCP integrates the best features of earlier successful approaches with a number of new heuristics that we developed to yield a performance that advances the state of the art in EAs for the exploration of the maximum cliques in a graph. Results of experimentation with the 37 DIMACS benchmark graphs and comparative analyses with six state-of-the-art algorithms, including two from the smaller EA community and four from the larger metaheuristics community, indicate that the IEA-PTS outperforms the EAs with respect to a Pareto-lexicographic ranking criterion and offers competitive performance on some graph instances when individually compared to the other heuristic algorithms. It has also successfully set a new benchmark on one graph instance. On another benchmark suite called Benchmarks with Hidden Optimal Solutions, IEA-PTS ranks second, after a very recent algorithm called COVER, among its peers that have experimented with this suite.
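    The mapping step described above (recasting maximum independent set and related problems as maximum clique) is easy to illustrate. The sketch below uses brute force where the paper uses the IEA-PTS metaheuristic, since exhaustive search is only feasible on toy graphs:

```python
from itertools import combinations

def max_clique(vertices, edges):
    """Brute-force maximum clique: try subsets from largest to smallest.
    Feasible only for tiny graphs; IEA-PTS exists precisely because this
    blows up combinatorially on DIMACS-scale instances."""
    edge_set = {frozenset(e) for e in edges}
    for k in range(len(vertices), 0, -1):
        for subset in combinations(vertices, k):
            if all(frozenset((u, v)) in edge_set
                   for u, v in combinations(subset, 2)):
                return list(subset)
    return []

def max_independent_set(vertices, edges):
    """MIS(G) equals the maximum clique of G's complement graph --
    the kind of reduction onto the MCP that the paper exploits."""
    edge_set = {frozenset(e) for e in edges}
    complement = [(u, v) for u, v in combinations(vertices, 2)
                  if frozenset((u, v)) not in edge_set]
    return max_clique(vertices, complement)
```

    On the path graph 1-2-3-4-5, the largest clique is any edge (size 2), while the complement-graph reduction recovers the independent set {1, 3, 5}.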

  6. Probabilistic Risk Assessment: A Bibliography

    NASA Technical Reports Server (NTRS)

    2000-01-01

    Probabilistic risk analysis is an integration of failure modes and effects analysis (FMEA), fault tree analysis and other techniques to assess the potential for failure and to find ways to reduce risk. This bibliography references 160 documents in the NASA STI Database that contain the major concepts (probabilistic risk assessment, risk and probability theory) in the basic index or major subject terms. An abstract is included with most citations, followed by the applicable subject terms.

  7. Comparative Recall and Precision of Simple and Expert Searches in Google Scholar and Eight Other Databases

    ERIC Educational Resources Information Center

    Walters, William H.

    2011-01-01

    This study evaluates the effectiveness of simple and expert searches in Google Scholar (GS), EconLit, GEOBASE, PAIS, POPLINE, PubMed, Social Sciences Citation Index, Social Sciences Full Text, and Sociological Abstracts. It assesses the recall and precision of 32 searches in the field of later-life migration: nine simple keyword searches and 23…

  8. Millennial Students' Mental Models of Search: Implications for Academic Librarians and Database Developers

    ERIC Educational Resources Information Center

    Holman, Lucy

    2011-01-01

    Today's students exhibit generational differences in the way they search for information. Observations of first-year students revealed a proclivity for simple keyword or phrases searches with frequent misspellings and incorrect logic. Although no students had strong mental models of search mechanisms, those with stronger models did construct more…

  11. HRGRN: A Graph Search-Empowered Integrative Database of Arabidopsis Signaling Transduction, Metabolism and Gene Regulation Networks

    PubMed Central

    Dai, Xinbin; Li, Jun; Liu, Tingsong; Zhao, Patrick Xuechun

    2016-01-01

    The biological networks controlling plant signal transduction, metabolism and gene regulation are composed of not only tens of thousands of genes, compounds, proteins and RNAs but also the complicated interactions and co-ordination among them. These networks play critical roles in many fundamental mechanisms, such as plant growth, development and environmental response. Although much is known about these complex interactions, the knowledge and data are currently scattered throughout the published literature, publicly available high-throughput data sets and third-party databases. Many ‘unknown’ yet important interactions among genes need to be mined and established through extensive computational analysis. However, exploring these complex biological interactions at the network level from existing heterogeneous resources remains challenging and time-consuming for biologists. Here, we introduce HRGRN, a graph search-empowered integrative database of Arabidopsis signal transduction, metabolism and gene regulatory networks. HRGRN utilizes Neo4j, which is a highly scalable graph database management system, to host large-scale biological interactions among genes, proteins, compounds and small RNAs that were either validated experimentally or predicted computationally. The associated biological pathway information was also specially marked for the interactions that are involved in the pathway to facilitate the investigation of cross-talk between pathways. Furthermore, HRGRN integrates a series of graph path search algorithms to discover novel relationships among genes, compounds, RNAs and even pathways from heterogeneous biological interaction data that could be missed by traditional SQL database search methods. Users can also build subnetworks based on known interactions. The outcomes are visualized with rich text, figures and interactive network graphs on web pages. The HRGRN database is freely available at http://plantgrn.noble.org/hrgrn/. PMID:26657893

  12. A continued search for transient events in the COBE DMR database simultaneous with cosmic gamma-ray bursts

    NASA Astrophysics Data System (ADS)

    Stacy, J. Gregory; Jackson, Peter D.; Bontekoe, Tj. Romke; Winkler, Christoph

    1996-08-01

    We report on the status of our ongoing project to search the database of the COBE Differential Microwave Radiometer (DMR) experiment for transient signals at microwave wavelengths simultaneous with cosmic gamma-ray bursts (GRBs). To date we have carried out a complete search of the DMR database using burst positions taken from the original BATSE 1B catalog for the eight-month period of overlap (May-December 1991) corresponding to the first public release of COBE data. We are currently repeating our original search of the COBE DMR database using the revised burst positions of the newly-released BATSE 3B catalog. Using BATSE 1B positions, at least two apparent simultaneous observations of GRBs by the COBE DMR occurred in 1991, along with a number of ``near misses'' within 30 seconds in time. At present, only upper limits to burst microwave emission are indicated. Even in the event of a non-detection of a GRB by the COBE DMR, unprecedented observational limits will still be obtained, constraining the predictions of the many theoretical models proposed to explain the origin of GRBs.

  13. Feasibility of LC/TOFMS and elemental database searching as a spectral library for pesticides in food.

    PubMed

    Thurman, E Michael; Ferrer, Imma; Malato, Octavio; Fernández-Alba, Amadeo Rodriguez

    2006-11-01

    Traditionally, the screening of unknown pesticides in food has been accomplished by GC/MS methods using conventional library-searching routines. However, many of the new polar and thermally labile pesticides are more readily and easily analysed by LC/MS methods and no searchable libraries currently exist (with the exception of some user libraries, which are limited). Therefore, there is a need for LC/MS libraries that can detect pesticides and their degradation products. This paper reports an identification scheme using a combination of LC/MS time-of-flight (accurate mass) and an Access database of 350 pesticides that are amenable to positive ion electrospray. The approach differs from conventional library searching of fragment ions. The concept consists of three parts: (1) initial screening of possible pesticides in actual market-place fruit extracts (apple and orange) using accurate mass, with accurate masses generated via an automatic ion-extraction routine, (2) searching the Access database manually for screening identification of a pesticide, and (3) identification of the suspected compound by accurate mass of at least one fragment ion and comparison of retention time with an actual standard. Imazalil and iprodione were identified in apples and thiabendazole in oranges using this database approach.
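    The accurate-mass screening step (1) amounts to a tolerance search in parts per million against a list of theoretical [M+H]+ values. A minimal sketch follows; the masses and compound names are illustrative placeholders, not vetted monoisotopic values from the 350-pesticide database:

```python
# Sketch of accurate-mass screening against a compound list (step 1 above).
# The [M+H]+ values are invented for illustration only.

def ppm_error(measured, theoretical):
    """Mass error of a measurement, in parts per million."""
    return (measured - theoretical) / theoretical * 1e6

def screen(measured_mz, database, tol_ppm=5.0):
    """Return candidates whose theoretical [M+H]+ lies within tol_ppm
    of the measured accurate mass."""
    return [name for name, mz in database.items()
            if abs(ppm_error(measured_mz, mz)) <= tol_ppm]
```

    At a 5 ppm tolerance, a measured ion at 297.0557 flags only an entry whose theoretical mass is 297.0556 (about 0.3 ppm away), which is what makes accurate mass usable as a first-pass filter before fragment-ion confirmation.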

  14. Digital cloning: identification of human cDNAs homologous to novel kinases through expressed sequence tag database searching.

    PubMed

    Chen, H C; Kung, H J; Robinson, D

    1998-01-01

    Identification of novel kinases based on their sequence conservation within kinase catalytic domain has relied so far on two major approaches, low-stringency hybridization of cDNA libraries, and PCR method using degenerate primers. Both of these approaches at times are technically difficult and time-consuming. We have developed a procedure that can significantly reduce the time and effort involved in searching for novel kinases and increase the sensitivity of the analysis. This procedure exploits the computer analysis of a vast resource of human cDNA sequences represented in the expressed sequence tag (EST) database. Seventeen novel human cDNA clones showing significant homology to serine/threonine kinases, including STE-20, CDK- and YAK-related family kinases, were identified by searching the EST database. Further sequence analysis of these novel kinases obtained either directly from EST clones or from PCR-RACE products confirmed their identity as protein kinases. Given the rapid accumulation of the EST database and the advent of powerful computer analysis software, this approach provides a fast, sensitive, and economical way to identify novel kinases as well as other genes from the EST database.

  15. Metformin does not affect risk of biochemical recurrence following radical prostatectomy: results from the SEARCH database

    PubMed Central

    Allott, Emma H.; Abern, Michael R.; Gerber, Leah; Keto, Christopher J.; Aronson, William J.; Terris, Martha K.; Kane, Christopher J.; Amling, Christopher L.; Cooperberg, Matthew R.; Moorman, Patricia G.; Freedland, Stephen J.

    2013-01-01

    Background While epidemiologic studies suggest that metformin use among diabetics may decrease prostate cancer (PC) incidence, the effect of metformin use on PC outcome is unclear. We investigated the association between pre-operative metformin use, dose and duration of use and biochemical recurrence (BCR) in PC patients with diabetes who underwent radical prostatectomy (RP). Methods We conducted a retrospective cohort analysis within the Shared Equal Access Regional Cancer Hospital (SEARCH) database of 371 PC patients with diabetes who underwent RP. Time to BCR between metformin users and non-users, and by metformin dose and duration of use was assessed using multivariable Cox proportional analysis adjusted for demographic, clinical and/or pathologic features. Time to castrate-resistant PC (CRPC), metastases and PC-specific mortality were explored as secondary outcomes using unadjusted analyses. Results Of 371 diabetic men, 156 (42%) were using metformin prior to RP. Metformin use was associated with more recent year of surgery (p<0.0001) but no clinical or pathologic characteristics. After adjustment for year of surgery, clinical and pathologic features, there were no associations between metformin use (HR 0.93; 95%CI 0.61–1.41), high metformin dose (HR 0.96; 95%CI 0.57–1.61) or duration of use (HR 1.00; 95%CI 0.99–1.02) and time to BCR. A total of 14 patients (3.8%) developed CRPC, 10 (2.7%) distant metastases and 8 (2.2%) died from PC. Unadjusted analysis suggested high metformin dose versus non-use was associated with increased risk of CRPC (HR 5.1; 95%CI 1.6–16.5), metastases (HR 4.8; 95%CI 1.2–18.5) and PC-specific mortality (HR 5.0; 95%CI 1.1–22.5). Conclusions Metformin use, dose or duration of use was not associated with BCR in this cohort of diabetic PC patients treated with RP. The suggestion that higher metformin dose was associated with increased risk of CRPC, metastases and PC-specific mortality merits testing in large prospective studies.

  16. Effect of cleavage enzyme, search algorithm and decoy database on mass spectrometric identification of wheat gluten proteins.

    PubMed

    Vensel, William H; Dupont, Frances M; Sloane, Stacia; Altenbach, Susan B

    2011-07-01

    While tandem mass spectrometry (MS/MS) is routinely used to identify proteins from complex mixtures, certain types of proteins present unique challenges for MS/MS analyses. The major wheat gluten proteins, gliadins and glutenins, are particularly difficult to distinguish by MS/MS. Each of these groups contains many individual proteins with similar sequences that include repetitive motifs rich in proline and glutamine. These proteins have few cleavable tryptic sites, often resulting in only one or two tryptic peptides that may not provide sufficient information for identification. Additionally, there are fewer than 14,000 complete protein sequences from wheat in the current NCBInr release. In this paper, MS/MS methods were optimized for the identification of the wheat gluten proteins. Chymotrypsin and thermolysin as well as trypsin were used to digest the proteins and the collision energy was adjusted to improve fragmentation of chymotryptic and thermolytic peptides. Specialized databases were constructed that included protein sequences derived from contigs from several assemblies of wheat expressed sequence tags (ESTs), including contigs assembled from ESTs of the cultivar under study. Two different search algorithms were used to interrogate the database and the results were analyzed and displayed using a commercially available software package (Scaffold). We examined the effect of protein database content and size on the false discovery rate. We found that as database size increased above 30,000 sequences there was a decrease in the number of proteins identified. Also, the type of decoy database influenced the number of proteins identified. Using three enzymes, two search algorithms and a specialized database allowed us to greatly increase the number of detected peptides and distinguish proteins within each gluten protein group.
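    The decoy-database effect examined above rests on the standard target-decoy estimate, which can be stated in a couple of lines (a generic sketch, not the authors' code):

```python
def fdr_at_threshold(psms, threshold):
    """Target-decoy FDR estimate: among PSMs scoring at or above the
    threshold, the decoy count approximates the number of false target
    hits, so FDR ~= #decoys / #targets.  Each PSM is (score, is_decoy)."""
    targets = sum(1 for score, is_decoy in psms
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms
                 if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0
```

    Because the estimate depends entirely on the decoy score distribution, how the decoy set is constructed (and how large the database is) shifts the estimated FDR and thus the number of proteins accepted, which is the sensitivity the study measures.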

  17. Param-Medic: A Tool for Improving MS/MS Database Search Yield by Optimizing Parameter Settings.

    PubMed

    May, Damon H; Tamura, Kaipo; Noble, William S

    2017-03-13

    In shotgun proteomics analysis, user-specified parameters are critical to database search performance and therefore to the yield of confident peptide-spectrum matches (PSMs). Two of the most important parameters are related to the accuracy of the mass spectrometer. Precursor mass tolerance defines the peptide candidates considered for each spectrum. Fragment mass tolerance or bin size determines how close observed and theoretical fragments must be to be considered a match. For either of these two parameters, too wide a setting yields randomly high-scoring false PSMs, whereas too narrow a setting erroneously excludes true PSMs, in both cases, lowering the yield of peptides detected at a given false discovery rate. We describe a strategy for inferring optimal search parameters by assembling and analyzing pairs of spectra that are likely to have been generated by the same peptide ion to infer precursor and fragment mass error. This strategy does not rely on a database search, making it usable in a wide variety of settings. In our experiments on data from a variety of instruments including Orbitrap and Q-TOF acquisitions, this strategy yields more high-confidence PSMs than using settings based on instrument defaults or determined by experts. Param-Medic is open-source and cross-platform. It is available as a standalone tool ( http://noble.gs.washington.edu/proj/param-medic/ ) and has been integrated into the Crux proteomics toolkit ( http://crux.ms ), providing automatic parameter selection for the Comet and Tide search engines.
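    A heavily simplified version of that inference can be sketched as follows. The pairing criterion and the four-sigma rule here are invented stand-ins, not Param-Medic's actual algorithm, which also compares fragment peaks before trusting that two spectra come from the same peptide ion:

```python
import statistics

def infer_precursor_tolerance(precursor_mzs, pair_window=0.02, sigma_mult=4.0):
    """Crude stand-in for the idea above: treat precursor pairs that fall
    within a narrow m/z window as repeat observations of the same peptide
    ion, measure their ppm differences, and size the search tolerance
    from the spread of those differences."""
    mzs = sorted(precursor_mzs)
    ppm_diffs = []
    for a, b in zip(mzs, mzs[1:]):
        if b - a <= pair_window:
            ppm_diffs.append((b - a) / a * 1e6)
    spread = statistics.pstdev(ppm_diffs) if len(ppm_diffs) > 1 else 0.0
    return sigma_mult * spread  # suggested precursor tolerance, in ppm
```

    The point of the design is the one made in the abstract: the estimate comes from the data itself, so no database search (and no expert guess at instrument accuracy) is needed before setting the parameter.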

  18. GHOSTX: an improved sequence homology search algorithm using a query suffix array and a database suffix array.

    PubMed

    Suzuki, Shuji; Kakuta, Masanori; Ishida, Takashi; Akiyama, Yutaka

    2014-01-01

    DNA sequences are translated into protein coding sequences and then further assigned to protein families in metagenomic analyses, because of the need for sensitivity. However, huge amounts of sequence data create the problem that even general homology search analyses using BLASTX become difficult in terms of computational cost. We designed a new homology search algorithm that finds seed sequences based on the suffix arrays of a query and a database, and have implemented it as GHOSTX. GHOSTX achieved approximately 131-165 times acceleration over a BLASTX search at similar levels of sensitivity. GHOSTX is distributed under the BSD 2-clause license and is available for download at http://www.bi.cs.titech.ac.jp/ghostx/. Currently, sequencing technology continues to improve, and sequencers are increasingly producing larger and larger quantities of data. This explosion of sequence data makes computational analysis with contemporary tools more difficult. We offer this tool as a potential solution to this problem.
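    The core data structure is easy to demonstrate. This is a naive sketch (quadratic construction, and it materializes seed-length prefixes before binary searching), whereas GHOSTX pairs a query suffix array with a database suffix array and uses far more careful engineering:

```python
from bisect import bisect_left, bisect_right

def suffix_array(text):
    """Naive suffix array: indices of all suffixes in sorted order.
    Production tools use linear-time constructions instead."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def find_seed(text, sa, seed):
    """All positions where `seed` occurs, via binary search over the
    sorted suffixes -- the seed-finding step a homology search starts from."""
    prefixes = [text[i:i + len(seed)] for i in sa]
    lo = bisect_left(prefixes, seed)
    hi = bisect_right(prefixes, seed)
    return sorted(sa[lo:hi])
```

    For example, in "banana" the seed "ana" is located at positions 1 and 3 without scanning the whole string; applying the same lookup to both query and database suffix arrays yields the shared seeds that are then extended into alignments.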

  19. A Multivariate Mixture Model to Estimate the Accuracy of Glycosaminoglycan Identifications Made by Tandem Mass Spectrometry (MS/MS) and Database Search.

    PubMed

    Chiu, Yulun; Schliekelman, Paul; Orlando, Ron; Sharp, Joshua S

    2017-02-01

    We present a statistical model to estimate the accuracy of derivatized heparin and heparan sulfate (HS) glycosaminoglycan (GAG) assignments to tandem mass (MS/MS) spectra made by the first published database search application, GAG-ID. Employing a multivariate expectation-maximization algorithm, this statistical model distinguishes correct from ambiguous and incorrect database search results when computing the probability that heparin/HS GAG assignments to spectra are correct based upon database search scores. Using GAG-ID search results for spectra generated from a defined mixture of 21 synthesized tetrasaccharide sequences as well as seven spectra of longer defined oligosaccharides, we demonstrate that the computed probabilities are accurate and have high power to discriminate between correctly, ambiguously, and incorrectly assigned heparin/HS GAGs. This analysis makes it possible to filter large MS/MS database search results with predictable false identification error rates.
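    A one-dimensional caricature of the expectation-maximization step shows the mechanics. The published model is multivariate and tailored to GAG-ID's scores, so treat this purely as a sketch of the general technique:

```python
import math

def em_posteriors(scores, iters=100):
    """Fit a two-Gaussian mixture ('incorrect' low component, 'correct'
    high component) to 1-D scores by EM; return P(correct | score)."""
    mu0, mu1 = min(scores), max(scores)
    s0 = s1 = (mu1 - mu0) / 4 or 1.0
    pi1 = 0.5

    def pdf(x, mu, s):
        return math.exp(-((x - mu) ** 2) / (2 * s * s)) / (s * math.sqrt(2 * math.pi))

    def e_step():  # responsibility of the 'correct' component for each score
        return [pi1 * pdf(x, mu1, s1) /
                (pi1 * pdf(x, mu1, s1) + (1 - pi1) * pdf(x, mu0, s0))
                for x in scores]

    for _ in range(iters):
        resp = e_step()
        n1 = sum(resp)
        n0 = len(scores) - n1
        # M-step: re-estimate means, spreads and mixing weight.
        mu1 = sum(r * x for r, x in zip(resp, scores)) / n1
        mu0 = sum((1 - r) * x for r, x in zip(resp, scores)) / n0
        s1 = math.sqrt(sum(r * (x - mu1) ** 2 for r, x in zip(resp, scores)) / n1) or 1e-3
        s0 = math.sqrt(sum((1 - r) * (x - mu0) ** 2 for r, x in zip(resp, scores)) / n0) or 1e-3
        pi1 = n1 / len(scores)
    return e_step()
```

    The returned posteriors are what allow filtering: thresholding P(correct | score) rather than the raw search score is what gives the predictable false identification error rates the abstract claims.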

  20. Review and Comparison of the Search Effectiveness and User Interface of Three Major Online Chemical Databases

    ERIC Educational Resources Information Center

    Bharti, Neelam; Leonard, Michelle; Singh, Shailendra

    2016-01-01

    Online chemical databases are the largest source of chemical information and, therefore, the main resource for retrieving results from published journals, books, patents, conference abstracts, and other relevant sources. Various commercial, as well as free, chemical databases are available. SciFinder, Reaxys, and Web of Science are three major…

  2. Reach for Reference. Don't Judge a Database by Its Search Screen

    ERIC Educational Resources Information Center

    Safford, Barbara Ripp

    2005-01-01

    In this column, the author provides a description and brief review of the "Children's Literature Comprehensive Database" (CLCD). This subscription database is a 1999 spinoff from Marilyn Courtot's "Children's Literature" website, which began in 1993 and is a free resource of reviews and features about books, authors, and illustrators. The separate…

  3. Low template STR typing: effect of replicate number and consensus method on genotyping reliability and DNA database search results.

    PubMed

    Benschop, Corina C G; van der Beek, Cornelis P; Meiland, Hugo C; van Gorp, Ankie G M; Westen, Antoinette A; Sijen, Titia

    2011-08-01

To analyze DNA samples with very low DNA concentrations, various methods have been developed that sensitize short tandem repeat (STR) typing. Sensitized DNA typing is accompanied by stochastic amplification effects, such as allele drop-outs and drop-ins. Therefore, low template (LT) DNA profiles are interpreted with care. One can either try to infer the genotype by a consensus method that uses alleles confirmed in replicate analyses, or one can use a statistical model to evaluate the strength of the evidence in a direct comparison with a known DNA profile. In this study we focused on the first strategy and we show that the procedure by which the consensus profile is assembled will affect genotyping reliability. In order to gain insight into the roles of replicate number and requested level of reproducibility, we generated six independent amplifications of samples from known donors. The LT methods included both increased cycling and enhanced capillary electrophoresis (CE) injection [1]. Consensus profiles were assembled from two to six of the replicates using four methods: composite (include all alleles), n-1 (include alleles detected in all but one replicate), n/2 (include alleles detected in at least half of the replicates) and 2× (include alleles detected in at least two replicates). We compared the consensus DNA profiles with the DNA profile of the known donor, studied the stochastic amplification effects and examined the effect of the consensus procedure on DNA database search results. From all these analyses we conclude that the accuracy of LT DNA typing and the efficiency of database searching improve when the number of replicates is increased and the consensus method is n/2. The most functional number of replicates within this n/2 method is four (although a replicate number of three suffices for samples showing >25% of the alleles in standard STR typing). This approach was also the optimal strategy for the analysis of 2-person mixtures, although modified search strategies may be
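The four consensus rules this study compares reduce to counting in how many replicates each allele appears. A minimal sketch over per-replicate allele sets (data structures are illustrative; real STR profiles are assembled per locus):

```python
from collections import Counter

def consensus(replicates, rule):
    """Assemble a consensus allele set from replicate allele sets.

    rule: 'composite' (include every detected allele),
          'n-1'       (allele must appear in all but one replicate),
          'n/2'       (allele must appear in at least half the replicates),
          '2x'        (allele must appear in at least two replicates)."""
    n = len(replicates)
    counts = Counter(a for rep in replicates for a in set(rep))
    thresholds = {
        "composite": 1,
        "n-1": n - 1,
        "n/2": (n + 1) // 2,  # ceiling of n/2
        "2x": 2,
    }
    t = thresholds[rule]
    return {a for a, c in counts.items() if c >= t}
```

With four replicates, a drop-in allele seen once survives the composite rule but is filtered by n/2 and 2×, which is consistent with the study's preference for n/2.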

  4. Boolean Logic: An Aid for Searching Computer Databases in Special Education and Rehabilitation.

    ERIC Educational Resources Information Center

    Summers, Edward G.

    1989-01-01

    The article discusses using Boolean logic as a tool for searching computerized information retrieval systems in special education and rehabilitation technology. It includes discussion of the Boolean search operators AND, OR, and NOT; Venn diagrams; and disambiguating parentheses. Six suggestions are offered for development of good Boolean logic…
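The Boolean operators discussed in this record map directly onto set operations over an inverted index, with parentheses controlling evaluation order. A toy sketch (the index and record IDs are invented for illustration):

```python
# Toy inverted index: search term -> set of matching record IDs
index = {
    "autism":         {1, 2, 5, 8},
    "rehabilitation": {2, 3, 5, 9},
    "vocational":     {3, 5, 7},
}

def AND(a, b):
    return a & b            # both terms must match

def OR(a, b):
    return a | b            # either term matches

def NOT(universe, a):
    return universe - a     # exclude records matching the term

universe = set().union(*index.values())

# (autism OR vocational) AND NOT rehabilitation
# -- parentheses disambiguate, exactly as in the article's examples
hits = AND(OR(index["autism"], index["vocational"]),
           NOT(universe, index["rehabilitation"]))
```

The Venn-diagram view of each operator corresponds one-to-one with these set intersections, unions, and differences.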

  5. Closing the loop in cortically-coupled computer vision: a brain-computer interface for searching image databases.

    PubMed

    Pohlmeyer, Eric A; Wang, Jun; Jangraw, David C; Lou, Bin; Chang, Shih-Fu; Sajda, Paul

    2011-06-01

    We describe a closed-loop brain-computer interface that re-ranks an image database by iterating between user generated 'interest' scores and computer vision generated visual similarity measures. The interest scores are based on decoding the electroencephalographic (EEG) correlates of target detection, attentional shifts and self-monitoring processes, which result from the user paying attention to target images interspersed in rapid serial visual presentation (RSVP) sequences. The highest scored images are passed to a semi-supervised computer vision system that reorganizes the image database accordingly, using a graph-based representation that captures visual similarity between images. The system can either query the user for more information, by adaptively resampling the database to create additional RSVP sequences, or it can converge to a 'done' state. The done state includes a final ranking of the image database and also a 'guess' of the user's chosen category of interest. We find that the closed-loop system's re-rankings can substantially expedite database searches for target image categories chosen by the subjects. Furthermore, better reorganizations are achieved than by relying on EEG interest rankings alone, or if the system were simply run in an open loop format without adaptive resampling.

  6. Lead generation using pharmacophore mapping and three-dimensional database searching: application to muscarinic M(3) receptor antagonists.

    PubMed

    Marriott, D P; Dougall, I G; Meghani, P; Liu, Y J; Flower, D R

    1999-08-26

By using a pharmacophore model, a geometrical representation of the features necessary for molecules to show a particular biological activity, it is possible to search databases containing the 3D structures of molecules and identify novel compounds which may possess this activity. We describe our experiences of establishing a working 3D database system and its use in rational drug design. By using muscarinic M(3) receptor antagonists as an example, we show that it is possible to identify potent novel lead compounds using this approach. Pharmacophore generation based on the structures of known M(3) receptor antagonists, 3D database searching, and medium-throughput screening were used to identify candidate compounds. Three compounds were chosen to define the pharmacophore: a lung-selective M(3) antagonist patented by Pfizer and two Astra compounds which show affinity at the M(3) receptor. From these, a pharmacophore model was generated, using the program DISCO, and this was used subsequently to search a UNITY 3D database of proprietary compounds; 172 compounds were found to fit the pharmacophore. These compounds were then screened, and 1-[2-(2-(diethylamino)ethoxy)phenyl]-2-phenylethanone (pA(2) 6.67) was identified as the best hit, with N-[2-(piperidin-1-ylmethyl)cyclohexyl]-2-propoxybenzamide (pA(2) 4.83) and phenylcarbamic acid 2-(morpholin-4-ylmethyl)cyclohexyl ester (pA(2) 5.54) demonstrating lower activity. As well as its potency, 1-[2-(2-(diethylamino)ethoxy)phenyl]-2-phenylethanone is a simple structure with limited similarity to existing M(3) receptor antagonists.

  7. High-performance hardware implementation of a parallel database search engine for real-time peptide mass fingerprinting

    PubMed Central

    Bogdán, István A.; Rivers, Jenny; Beynon, Robert J.; Coca, Daniel

    2008-01-01

    Motivation: Peptide mass fingerprinting (PMF) is a method for protein identification in which a protein is fragmented by a defined cleavage protocol (usually proteolysis with trypsin), and the masses of these products constitute a ‘fingerprint’ that can be searched against theoretical fingerprints of all known proteins. In the first stage of PMF, the raw mass spectrometric data are processed to generate a peptide mass list. In the second stage this protein fingerprint is used to search a database of known proteins for the best protein match. Although current software solutions can typically deliver a match in a relatively short time, a system that can find a match in real time could change the way in which PMF is deployed and presented. In a paper published earlier we presented a hardware design of a raw mass spectra processor that, when implemented in Field Programmable Gate Array (FPGA) hardware, achieves almost 170-fold speed gain relative to a conventional software implementation running on a dual processor server. In this article we present a complementary hardware realization of a parallel database search engine that, when running on a Xilinx Virtex 2 FPGA at 100 MHz, delivers 1800-fold speed-up compared with an equivalent C software routine, running on a 3.06 GHz Xeon workstation. The inherent scalability of the design means that processing speed can be multiplied by deploying the design on multiple FPGAs. The database search processor and the mass spectra processor, running on a reconfigurable computing platform, provide a complete real-time PMF protein identification solution. Contact: d.coca@sheffield.ac.uk PMID:18453553

  8. Published and perished? The influence of the searched protein database on the long-term storage of proteomics data.

    PubMed

    Griss, Johannes; Côté, Richard G; Gerner, Christopher; Hermjakob, Henning; Vizcaíno, Juan Antonio

    2011-09-01

In proteomics, protein identifications are reported and stored using an unstable reference system: protein identifiers. These proprietary identifiers are created individually by every protein database and can change or may even be deleted over time. To estimate the effect of the searched protein sequence database on the long-term storage of proteomics data we analyzed the changes of reported protein identifiers from all public experiments in the Proteomics Identifications (PRIDE) database by November 2010. To map the submitted protein identifier to a currently active entry, two distinct approaches were used. The first approach used the Protein Identifier Cross Referencing (PICR) service at the EBI, which maps protein identifiers based on 100% sequence identity. The second one (called logical mapping algorithm) accessed the source databases and retrieved the current status of the reported identifier. Our analysis showed the differences between the main protein databases (International Protein Index (IPI), UniProt Knowledgebase (UniProtKB), National Center for Biotechnology Information nr database (NCBI nr), and Ensembl) with respect to identifier stability. For example, whereas 20% of submitted IPI entries were deleted after two years, virtually all UniProtKB entries remained either active or replaced. Furthermore, the two mapping algorithms produced markedly different results. For example, the PICR service reported 10% more IPI entries deleted compared with the logical mapping algorithm. We found several cases where experiments contained more than 10% deleted identifiers already at the time of publication. We also assessed the proportion of peptide identifications in these data sets that still fitted the originally identified protein sequences. Finally, we performed the same overall analysis on all records from IPI, Ensembl, and UniProtKB, using two releases per year starting in 2005. This analysis showed for the first time the true effect of changing protein

  9. Supervised learning of tools for content-based search of image databases

    NASA Astrophysics Data System (ADS)

    Delanoy, Richard L.

    1996-03-01

A computer environment, called the Toolkit for Image Mining (TIM), is being developed with the goal of enabling users with diverse interests and varied computer skills to create search tools for content-based image retrieval and other pattern matching tasks. Search tools are generated using a simple paradigm of supervised learning that is based on the user pointing at mistakes of classification made by the current search tool. As mistakes are identified, a learning algorithm uses the identified mistakes to build up a model of the user's intentions, construct a new search tool, apply the search tool to a test image, display the match results as feedback to the user, and accept new inputs from the user. Search tools are constructed in the form of functional templates, which are generalized matched filters capable of knowledge-based image processing. The ability of this system to learn the user's intentions from experience contrasts with other existing approaches to content-based image retrieval that base searches on the characteristics of a single input example or on a predefined and semantically-constrained textual query. Currently, TIM is capable of learning spectral and textural patterns, but should be adaptable to the learning of shapes, as well. Possible applications of TIM include not only content-based image retrieval, but also quantitative image analysis, the generation of metadata for annotating images, data prioritization or data reduction in bandwidth-limited situations, and the construction of components for larger, more complex computer vision algorithms.

  10. External validation of the SEARCH model for predicting aggressive recurrence after radical prostatectomy: results from the Duke Prostate Center Database

    PubMed Central

    Teeter, Anna E.; Sun, Leon; Moul, Judd W.; Freedland, Stephen J.

    2010-01-01

    Objective To validate a model previously developed using the Shared Equal Access Regional Cancer Hospital (SEARCH) database to predict the risk of aggressive recurrence after surgery, defined as a prostate-specific antigen (PSA) doubling time (DT) of < 9 months, incorporating pathological stage, preoperative PSA level and pathological Gleason sum, that had an area under the curve (AUC) of 0.79 using a cohort of men from the Duke Prostate Center (DPC). Patients and methods Data were included from 1989 men from the DPC database who underwent RP for node-negative prostate cancer between 1987 and 2003. Of these men, 100 had disease recurrence, with a PSADT of < 9 months, while 1889 either did not have a recurrence but had ≥36 months of follow-up or had a recurrence with a PSADT of ≥ 9 months. We examined the ability of the SEARCH model to predict aggressive recurrence within the DPC cohort, and examined the correlation between the predicted risk of aggressive recurrence and the actual outcome within DPC. Results The SEARCH model predicted aggressive recurrence within DPC with an AUC of 0.82. There was a strong and significant correlation between the predicted risk of aggressive recurrence based on the SEARCH tables and the actual outcomes within DPC (r = 0.68, P < 0.001), although the model predictions tended to be slightly higher than the actual risk. Conclusions The SEARCH model to predict aggressive recurrence after RP predicted aggressive recurrence in an external dataset with a high degree of accuracy. These tables, now validated, can be used to help select men for adjuvant therapy and clinical trials. PMID:20151967

  11. 3DinSight: an integrated relational database and search tool for the structure, function and properties of biomolecules.

    PubMed

    An, J; Nakama, T; Kubota, Y; Sarai, A

    1998-01-01

Although a large amount of information on the structure, function and properties of biomolecules is becoming available, it is difficult to understand the relationship between them. Thus, we have attempted to create an integrated relational database, search and visualization tool, 3DinSight, to help researchers to gain insight into their relationship. We have gathered data on the structure, function and properties of biomolecules, and implemented them into a relational database system. The structural data contain several subset data such as protein homologues, protein-DNA complex, in order to enable searching within a specific class of data. The functional data include motif sequence and mutation data of proteins. Also, various amino acid properties are implemented as a relational table. The World Wide Web (WWW) interfaces enable users to carry out various kinds of searches among these data. The locations of motif sequences and mutations are automatically mapped on the structure, and visualized in three-dimensional (3D) space by interactive viewers, VRML (Virtual Reality Modeling Language) and RasMol. In the case of VRML, the mapped 3D objects are hyper-linked to the corresponding document data. Also, amino acid properties, linked with structure, functional and mutation sites, can be displayed as graph plots. 3DinSight is freely accessible through the Internet (http://www.rtc.riken.go.jp/3DinSight.html). sarai@rtc.riken.go.jp

  12. Federated Search Tools in Fusion Centers: Bridging Databases in the Information Sharing Environment

    DTIC Science & Technology

    2012-09-01

German and Jay Stanley, “Fusion Center Update,” American Civil Liberties Union, July 2008, http://www.aclu.org/files/pdfs/privacy...Intelligence, ed. Jennifer E. Sims and Burton Gerber (Washington DC: Georgetown University Press, 2005), 107. 16 Ibid. 11 through a federated search tool...SurveyMonkey. Last modified June 23, 2012. https://www.surveymonkey.com/s/FederatedSearchToolsinFCs. German, Mike and Jay Stanley. “Fusion Center

  13. The Magnetics Information Consortium (MagIC) Online Database: Uploading, Searching and Visualizing Paleomagnetic and Rock Magnetic Data

    NASA Astrophysics Data System (ADS)

    Koppers, A.; Tauxe, L.; Constable, C.; Pisarevsky, S.; Jackson, M.; Solheid, P.; Banerjee, S.; Johnson, C.; Genevey, A.; Delaney, R.; Baker, P.; Sbarbori, E.

    2005-12-01

    The Magnetics Information Consortium (MagIC) operates an online relational database including both rock and paleomagnetic data. The goal of MagIC is to store all measurements and their derived properties for studies of paleomagnetic directions (inclination, declination) and their intensities, and for rock magnetic experiments (hysteresis, remanence, susceptibility, anisotropy). MagIC is hosted under EarthRef.org at http://earthref.org/MAGIC/ and has two search nodes, one for paleomagnetism and one for rock magnetism. These nodes provide basic search capabilities based on location, reference, methods applied, material type and geological age, while allowing the user to drill down from sites all the way to the measurements. At each stage, the data can be saved and, if the available data supports it, the data can be visualized by plotting equal area plots, VGP location maps or typical Zijderveld, hysteresis, FORC, and various magnetization and remanence diagrams. All plots are made in SVG (scalable vector graphics) and thus can be saved and easily read into the user's favorite graphics programs without loss of resolution. User contributions to the MagIC database are critical to achieve a useful research tool. We have developed a standard data and metadata template (version 1.6) that can be used to format and upload all data at the time of publication in Earth Science journals. Software tools are provided to facilitate easy population of these templates within Microsoft Excel. These tools allow for the import/export of text files and they provide advanced functionality to manage/edit the data, and to perform various internal checks to high grade the data and to make them ready for uploading. The uploading is all done online by using the MagIC Contribution Wizard at http://earthref.org/MAGIC/upload.htm that takes only a few minutes to process a contribution of approximately 5,000 data records. 
After uploading, these standardized MagIC template files will be stored in the

  14. CUDASW++ 3.0: accelerating Smith-Waterman protein database search by coupling CPU and GPU SIMD instructions.

    PubMed

    Liu, Yongchao; Wirawan, Adrianto; Schmidt, Bertil

    2013-04-04

The maximal sensitivity for local alignments makes the Smith-Waterman algorithm a popular choice for protein sequence database search based on pairwise alignment. However, the algorithm is compute-intensive due to a quadratic time complexity. Corresponding runtimes are further compounded by the rapid growth of sequence databases. We present CUDASW++ 3.0, a fast Smith-Waterman protein database search algorithm, which couples CPU and GPU SIMD instructions and carries out concurrent CPU and GPU computations. For the CPU computation, this algorithm employs SSE-based vector execution units as accelerators. For the GPU computation, we have investigated for the first time a GPU SIMD parallelization, which employs CUDA PTX SIMD video instructions to gain more data parallelism beyond the SIMT execution model. Moreover, sequence alignment workloads are automatically distributed over CPUs and GPUs based on their respective compute capabilities. Evaluation on the Swiss-Prot database shows that CUDASW++ 3.0 gains a performance improvement over CUDASW++ 2.0 of up to 2.9- and 3.2-fold, with a maximum performance of 119.0 and 185.6 GCUPS, on a single-GPU GeForce GTX 680 and a dual-GPU GeForce GTX 690 graphics card, respectively. In addition, our algorithm has demonstrated significant speedups over other top-performing tools: SWIPE and BLAST+. CUDASW++ 3.0 is written in CUDA C++ and PTX assembly languages, targeting GPUs based on the Kepler architecture. This algorithm obtains significant speedups over its predecessor, CUDASW++ 2.0, by benefiting from the use of CPU and GPU SIMD instructions as well as the concurrent execution on CPUs and GPUs. The source code and the simulated data are available at http://cudasw.sourceforge.net.
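The quadratic time complexity mentioned above comes from filling a full dynamic-programming matrix for every query/database sequence pair. A plain scalar reference version of the Smith-Waterman scoring recurrence that CUDASW++ vectorizes (linear gap penalty; scoring parameters are illustrative):

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local-alignment score between sequences a and b.

    Fills an (len(a)+1) x (len(b)+1) DP matrix H, so the runtime is
    O(len(a) * len(b)) -- the quadratic cost the GPU/CPU SIMD code attacks."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # local alignment: scores are clamped at zero
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best
```

Production implementations use substitution matrices (e.g. BLOSUM62) and affine gap penalties, but the inner `max` recurrence, evaluated over billions of cells, is exactly what is measured in GCUPS.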

  15. Code optimization of the subroutine to remove near identical matches in the sequence database homology search tool PSI-BLAST.

    PubMed

    Aspnäs, Mats; Mattila, Kimmo; Osowski, Kristoffer; Westerholm, Jan

    2010-06-01

A central task in protein sequence characterization is the use of a sequence database homology search tool to find similar protein sequences in other individuals or species. PSI-BLAST is a widely used module of the BLAST package that calculates a position-specific score matrix from the best-matching sequences and performs iterated searches, using a subroutine that removes near-identical matches so that redundant sequences do not bias the score matrix. For some queries and parameter settings, PSI-BLAST may find many similar high-scoring matches, and therefore up to 80% of the total run time may be spent in this procedure. In this article, we present code optimizations that improve the cache utilization and the overall performance of this procedure. Measurements show that, for queries where the number of similar matches is high, the optimized PSI-BLAST program may be as much as 2.9 times faster than the original program.

  16. Combining history of medicine and library instruction: an innovative approach to teaching database searching to medical students.

    PubMed

    Timm, Donna F; Jones, Dee; Woodson, Deidra; Cyrus, John W

    2012-01-01

    Library faculty members at the Health Sciences Library at the LSU Health Shreveport campus offer a database searching class for third-year medical students during their surgery rotation. For a number of years, students completed "ten-minute clinical challenges," but the instructors decided to replace the clinical challenges with innovative exercises using The Edwin Smith Surgical Papyrus to emphasize concepts learned. The Surgical Papyrus is an online resource that is part of the National Library of Medicine's "Turning the Pages" digital initiative. In addition, vintage surgical instruments and historic books are displayed in the classroom to enhance the learning experience.

  17. Online Searching of Bibliographic Databases: Microcomputer Access to National Information Systems.

    ERIC Educational Resources Information Center

    Coons, Bill

    This paper describes the range and scope of various information databases available for technicians, researchers, and managers employed in forestry and the forest products industry. Availability of information on reports of field and laboratory research, business trends, product prices, and company profiles through national distributors of…

  18. Online/CD-ROM Bibliographic Database Searching in a Small Academic Library.

    ERIC Educational Resources Information Center

    Pitet, Lynn T.

    The purpose of the project described in this paper was to gather information about online/CD-ROM database systems that would be useful in improving the services offered at the University of Findlay, a small private liberal arts college in northwestern Ohio. A survey was sent to 67 libraries serving colleges similar in size which included questions…

  20. Parallel computer architecture. (Latest citations from INSPEC - the database for Physics, Electronics, and Computing). Published Search

    SciTech Connect

    Not Available

    1993-10-01

    The bibliography contains citations concerning the development and performance analysis of parallel architecture in image processing and computing. Cost and performance evaluations of multiple processor systems are described. Applications are described, including supercomputer design, database management, computer communication systems, and robot control. (Contains 250 citations and includes a subject term index and title list.)

  1. Fast 3D molecular superposition and similarity search in databases of flexible molecules

    NASA Astrophysics Data System (ADS)

    Krämer, Andreas; Horn, Hans W.; Rice, Julia E.

    2003-01-01

    We present a new method (fFLASH) for the virtual screening of compound databases that is based on explicit three-dimensional molecular superpositions. fFLASH takes the torsional flexibility of the database molecules fully into account, and can deal with an arbitrary number of conformation-dependent molecular features. The method utilizes a fragmentation-reassembly approach which allows for an efficient sampling of the conformational space. A fast clique-based pattern matching algorithm generates alignments of pairs of adjacent molecular fragments on the rigid query molecule that are subsequently reassembled to complete database molecules. Using conventional molecular features (hydrogen bond donors and acceptors, charges, and hydrophobic groups) we show that fFLASH is able to rapidly produce accurate alignments of medium-sized drug-like molecules. Experiments with a test database containing a diverse set of 1780 drug-like molecules (including all conformers) have shown that average query processing times of the order of 0.1 seconds per molecule can be achieved on a PC.

  2. The "Clipping Thesis": An Exercise in Developing Critical Thinking and Online Database Searching Skills.

    ERIC Educational Resources Information Center

    Minnich, Nancy P.; McCarthy, Carrol B.

    1986-01-01

    Designed to help high school students develop critical thinking and writing skills, the "Clipping Thesis" project requires students to find newspaper and journal articles on a given topic through printed indexes or online searching, read the articles, write brief and final summaries of their readings, and compile a bibliography. (EM)

  3. SCOOP: A Measurement and Database of Student Online Search Behavior and Performance

    ERIC Educational Resources Information Center

    Zhou, Mingming

    2015-01-01

    The ability to access and process massive amounts of online information is required in many learning situations. In order to develop a better understanding of student online search process especially in academic contexts, an online tool (SCOOP) is developed for tracking mouse behavior on the web to build a more extensive account of student web…

  4. Information Retrieval Strategies of Millennial Undergraduate Students in Web and Library Database Searches

    ERIC Educational Resources Information Center

    Porter, Brandi

    2009-01-01

    Millennial students make up a large portion of undergraduate students attending colleges and universities, and they have a variety of online resources available to them to complete academically related information searches, primarily Web based and library-based online information retrieval systems. The content, ease of use, and required search…

  5. A Search for Nontriggered Gamma-Ray Bursts in the BATSE Database

    NASA Technical Reports Server (NTRS)

Kommers, Jefferson M.; Lewin, Walter H. G.; Kouveliotou, Chryssa; van Paradijs, Jan; Pendleton, Geoffrey N.; Meegan, Charles A.; Fishman, Gerald J.

    1997-01-01

We describe a search of archival data from the Burst and Transient Source Experiment (BATSE). The purpose of the search is to find astronomically interesting transients that did not activate the burst-detection (or "trigger") system on board the spacecraft. Our search is sensitive to events with peak fluxes (on the 1.024 s timescale) that are lower by a factor of approximately 2 than can be detected with the on-board burst trigger. In a search of 345 days of archival data, we detected 91 events in the 50-300 keV range that resemble classical gamma-ray bursts but that did not activate the on-board burst trigger. We also detected 110 low-energy (25-50 keV) events of unknown origin that may include activity from soft gamma repeater (SGR) 1806-20 and bursts and flares from X-ray binaries. This paper gives the occurrence times, estimated source directions, durations, peak fluxes, and fluences for the 91 gamma-ray burst candidates. The direction and intensity distributions of these bursts imply that the biases inherent in the on-board trigger mechanism have not significantly affected the completeness of the published BATSE gamma-ray burst catalogs.

  6. Indexing and Online Searching of Multi-Purpose Textual Databases: Conflict or Confluence?

    ERIC Educational Resources Information Center

    Regazzi, John J.

    The indexing and online searching of data bases have both conflicting and complementary structures, determined in part by the choice of indexing language. In the planning of a machine readable system, the Foundation Center, administrator of two data bases, initially chose a natural language for the lower cost, more rapid implementation, and…

  9. The Magnetics Information Consortium (MagIC) Online Database: Uploading, Searching and Visualizing Paleomagnetic and Rock Magnetic Data

    NASA Astrophysics Data System (ADS)

    Minnett, R.; Koppers, A.; Tauxe, L.; Constable, C.; Pisarevsky, S. A.; Jackson, M.; Solheid, P.; Banerjee, S.; Johnson, C.

    2006-12-01

    The Magnetics Information Consortium (MagIC) is commissioned to implement and maintain an online portal to a relational database populated by both rock and paleomagnetic data. The goal of MagIC is to archive all measurements and the derived properties for studies of paleomagnetic directions (inclination, declination) and intensities, and for rock magnetic experiments (hysteresis, remanence, susceptibility, anisotropy). MagIC is hosted under EarthRef.org at http://earthref.org/MAGIC/ and has two search nodes, one for paleomagnetism and one for rock magnetism. Both nodes provide query building based on location, reference, methods applied, material type and geological age, as well as a visual map interface to browse and select locations. The query result set is displayed in a digestible tabular format allowing the user to descend through hierarchical levels such as from locations to sites, samples, specimens, and measurements. At each stage, the result set can be saved and, if supported by the data, can be visualized by plotting global location maps, equal area plots, or typical Zijderveld, hysteresis, and various magnetization and remanence diagrams. User contributions to the MagIC database are critical to achieving a useful research tool. We have developed a standard data and metadata template (Version 2.1) that can be used to format and upload all data at the time of publication in Earth Science journals. Software tools are provided to facilitate population of these templates within Microsoft Excel. These tools allow for the import/export of text files and provide advanced functionality to manage and edit the data, and to perform various internal checks to maintain data integrity and prepare for uploading. The MagIC Contribution Wizard at http://earthref.org/MAGIC/upload.htm executes the upload and takes only a few minutes to process several thousand data records. 
The standardized MagIC template files are stored in the digital archives of EarthRef.org where they

  10. Searching the UVSP database and a list of experiments showing mass motions

    NASA Technical Reports Server (NTRS)

    Thompson, William

    1986-01-01

    Since the Solar Maximum Mission (SMM) satellite was launched, a large database has been built up of experiments using the Ultraviolet Spectrometer and Polarimeter (UVSP) instrument. Access to this database can be gained through the SMM Vax 750 computer at Goddard Space Flight Center. One useful way to do this is with a program called USEARCH. This program allows one to make a listing of different types of UVSP experiments. It is evident that this program is useful to those who would wish to make use of UVSP data, but who don't know what data is available. Therefore it was decided to include a short description of how to make use of the USEARCH program. Also described, but not included, is a listing of all UVSP experiments showing mass motions in prominences and filaments. This list was made with the aid of the USEARCH program.

  11. RINGS: a new search/match database for polycrystalline electron diffraction

    NASA Astrophysics Data System (ADS)

    Denley, David; Hart, Haskell

    2003-03-01

    RINGS is a relational database built from NIST Crystal Data for the identification of polycrystalline solids by selected area electron diffraction (SAED) and elemental analysis using Microsoft® Access 97 (subsequently converted to Access 2000). Experimental d-spacings are matched against values calculated from reduced unit cells, thereby fully and rigorously incorporating the effects of double diffraction. A total of 79,136 inorganic phases are included with original Crystal Data reference codes, allowing access to all information in NIST Crystal Data. Specific examples illustrate the advantages over previous approaches to the problem. This database will be most useful to researchers in mineralogy, metallurgy, materials science, forensics, and analytical chemistry who seek to identify well-characterized phases with known unit cells.
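
    The search/match idea described above can be sketched in a few lines (a toy illustration; the function names and tolerance are hypothetical, not RINGS's actual implementation):

    ```python
    def match_phase(observed, calculated, tol=0.02):
        """Count how many observed d-spacings (in Angstroms) are matched by a
        phase's calculated d-spacings within an absolute tolerance."""
        return sum(1 for d_obs in observed
                   if any(abs(d_obs - d_calc) <= tol for d_calc in calculated))

    def rank_phases(observed, phase_db, tol=0.02):
        """Rank candidate phases by the fraction of observed rings matched."""
        scores = {name: match_phase(observed, d_list, tol) / len(observed)
                  for name, d_list in phase_db.items()}
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    ```

    In practice the candidate list would first be filtered by the elemental analysis before d-spacing matching.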

  12. Relemed: sentence-level search engine with relevance score for the MEDLINE database of biomedical articles

    PubMed Central

    Siadaty, Mir S; Shu, Jianfen; Knaus, William A

    2007-01-01

    Background Receiving extraneous articles in response to a query submitted to MEDLINE/PubMed is common. When submitting a multi-word query (which is the majority of queries submitted), the presence of all query words within each article may be a necessary condition for retrieving relevant articles, but it is not sufficient. Ideally a relationship between the query words in the article is also required. We propose that if two words occur within an article, the probability that a relation between them is expressed is higher when the words occur within adjacent sentences than in remote sentences. Therefore, sentence-level concurrence can be used as a surrogate for the existence of a relationship between the words. To avoid retrieving irrelevant articles, one solution is to increase the search specificity. Another is to estimate a relevance score and sort the retrieved articles by it. However, among the >30 retrieval services available for MEDLINE, only a few estimate a relevance score, and none detects and incorporates the relation between the query words as part of the relevance score. Results We have developed "Relemed", a search engine for MEDLINE. Relemed increases the specificity and precision of retrieval by searching for query words within sentences rather than the whole article. It uses sentence-level concurrence as a statistical surrogate for the existence of a relationship between the words. It also estimates a relevance score and sorts the results on this basis, thus shifting irrelevant articles lower down the list. In two case studies, we demonstrate that the most relevant articles appear at the top of the Relemed results, while this is not necessarily the case with a PubMed search. We have also shown that a Relemed search includes not only all the articles retrieved by PubMed, but potentially additional relevant articles, due to the extended 'automatic term mapping' and text-word searching features implemented in Relemed. Conclusion By using sentence
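
    The sentence-level concurrence idea can be illustrated with a toy scorer (illustrative only; this is not Relemed's actual relevance formula):

    ```python
    import re

    def sentence_concurrence_score(text, query_words, window=1):
        """Toy relevance surrogate in the spirit of sentence-level searching:
        2 if all query words co-occur in one sentence, 1 if they occur within
        `window` adjacent sentences, else 0."""
        sentences = [s.lower() for s in re.split(r'[.!?]+', text) if s.strip()]
        words = [w.lower() for w in query_words]
        # For each query word, the set of sentence indices containing it.
        hits = [{i for i, s in enumerate(sentences) if w in s} for w in words]
        if not all(hits):
            return 0           # some query word absent from the article
        if set.intersection(*hits):
            return 2           # all words share at least one sentence
        for i in hits[0]:
            if all(any(abs(i - j) <= window for j in h) for h in hits[1:]):
                return 1       # words found in adjacent sentences
        return 0
    ```

    Articles would then be sorted by this score so that those with the tightest word proximity appear first.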

  13. eBASIS (Bioactive Substances in Food Information Systems) and Bioactive Intakes: Major Updates of the Bioactive Compound Composition and Beneficial Bioeffects Database and the Development of a Probabilistic Model to Assess Intakes in Europe

    PubMed Central

    Plumb, Jenny; Pigat, Sandrine; Bompola, Foteini; Cushen, Maeve; Pinchen, Hannah; Nørby, Eric; Astley, Siân; Lyons, Jacqueline; Kiely, Mairead; Finglas, Paul

    2017-01-01

    eBASIS (Bioactive Substances in Food Information Systems), a web-based database that contains compositional and biological effects data for bioactive compounds of plant origin, has been updated with new data on fruits and vegetables, wheat and, due to some evidence of potential beneficial effects, extended to include meat bioactives. eBASIS remains one of only a handful of comprehensive and searchable databases, with up-to-date coherent and validated scientific information on the composition of food bioactives and their putative health benefits. The database has a user-friendly, efficient, and flexible interface facilitating use by both the scientific community and food industry. Overall, eBASIS contains data for 267 foods, covering the composition of 794 bioactive compounds, from 1147 quality-evaluated peer-reviewed publications, together with information from 567 publications describing beneficial bioeffect studies carried out in humans. This paper highlights recent updates and expansion of eBASIS and the newly-developed link to a probabilistic intake model, allowing exposure assessment of dietary bioactive compounds to be estimated and modelled in human populations when used in conjunction with national food consumption data. This new tool could assist small- and medium-sized enterprises (SMEs) in the development of food product health claim dossiers for submission to the European Food Safety Authority (EFSA). PMID:28333085

  14. eBASIS (Bioactive Substances in Food Information Systems) and Bioactive Intakes: Major Updates of the Bioactive Compound Composition and Beneficial Bioeffects Database and the Development of a Probabilistic Model to Assess Intakes in Europe.

    PubMed

    Plumb, Jenny; Pigat, Sandrine; Bompola, Foteini; Cushen, Maeve; Pinchen, Hannah; Nørby, Eric; Astley, Siân; Lyons, Jacqueline; Kiely, Mairead; Finglas, Paul

    2017-03-23

    eBASIS (Bioactive Substances in Food Information Systems), a web-based database that contains compositional and biological effects data for bioactive compounds of plant origin, has been updated with new data on fruits and vegetables, wheat and, due to some evidence of potential beneficial effects, extended to include meat bioactives. eBASIS remains one of only a handful of comprehensive and searchable databases, with up-to-date coherent and validated scientific information on the composition of food bioactives and their putative health benefits. The database has a user-friendly, efficient, and flexible interface facilitating use by both the scientific community and food industry. Overall, eBASIS contains data for 267 foods, covering the composition of 794 bioactive compounds, from 1147 quality-evaluated peer-reviewed publications, together with information from 567 publications describing beneficial bioeffect studies carried out in humans. This paper highlights recent updates and expansion of eBASIS and the newly-developed link to a probabilistic intake model, allowing exposure assessment of dietary bioactive compounds to be estimated and modelled in human populations when used in conjunction with national food consumption data. This new tool could assist small- and medium-sized enterprises (SMEs) in the development of food product health claim dossiers for submission to the European Food Safety Authority (EFSA).

  15. Crescendo: A Protein Sequence Database Search Engine for Tandem Mass Spectra

    NASA Astrophysics Data System (ADS)

    Wang, Jianqi; Zhang, Yajie; Yu, Yonghao

    2015-07-01

    A search engine that discovers more peptides reliably is essential to the progress of computational proteomics. We propose two new scoring functions (L- and P-scores), which aim to capture the same characteristics of a peptide-spectrum match (PSM) as Sequest and Comet do. Crescendo, introduced here, is a software program that implements these two scores for peptide identification. We applied Crescendo to test datasets and compared its performance with widely used search engines, including Mascot, Sequest, and Comet. The results indicate that Crescendo identifies a similar or larger number of peptides at various predefined false discovery rates (FDR). Importantly, it also provides a better separation between the true and decoy PSMs, warranting the future development of a companion post-processing filtering algorithm.
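
    The decoy-based FDR control mentioned here follows the standard target-decoy convention, which can be sketched as follows (a minimal illustration, not Crescendo's implementation):

    ```python
    def fdr_at_threshold(psms, threshold):
        """Estimate FDR among PSMs scoring >= threshold with the standard
        target-decoy approach: FDR ~ (#decoy hits) / (#target hits).
        Each PSM is a (score, is_decoy) pair."""
        targets = sum(1 for s, d in psms if s >= threshold and not d)
        decoys = sum(1 for s, d in psms if s >= threshold and d)
        return decoys / targets if targets else 0.0

    def threshold_for_fdr(psms, max_fdr=0.01):
        """Lowest score threshold whose estimated FDR is <= max_fdr."""
        for t in sorted({s for s, _ in psms}):
            if fdr_at_threshold(psms, t) <= max_fdr:
                return t
        return None
    ```

    A sharper separation between true and decoy score distributions, as the abstract claims, lets such a threshold admit more target PSMs at the same FDR.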

  16. Image Content Engine (ICE): A System for Fast Image Database Searches

    SciTech Connect

    Brase, J M; Paglieroni, D W; Weinert, G F; Grant, C W; Lopez, A S; Nikolaev, S

    2005-03-22

    The Image Content Engine (ICE) is being developed to provide cueing assistance to human image analysts faced with increasingly large and intractable amounts of image data. The ICE architecture includes user configurable feature extraction pipelines which produce intermediate feature vector and match surface files which can then be accessed by interactive relational queries. Application of the feature extraction algorithms to large collections of images may be extremely time consuming and is launched as a batch job on a Linux cluster. The query interface accesses only the intermediate files and returns candidate hits nearly instantaneously. Queries may be posed for individual objects or collections. The query interface prompts the user for feedback, and applies relevance feedback algorithms to revise the feature vector weighting and focus on relevant search results. Examples of feature extraction and both model-based and search-by-example queries are presented.
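
    The relevance-feedback step described above is commonly implemented with a Rocchio-style update; the sketch below is a generic illustration of that standard technique (the abstract does not specify ICE's actual algorithm):

    ```python
    def rocchio_update(weights, relevant, nonrelevant,
                       alpha=1.0, beta=0.75, gamma=0.15):
        """Rocchio-style relevance feedback: move the query feature vector
        toward the centroid of user-marked relevant examples and away from
        the centroid of non-relevant ones."""
        def centroid(vectors):
            n = len(vectors)
            return ([sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]
                    if n else [0.0] * len(weights))
        rel_c = centroid(relevant)
        non_c = centroid(nonrelevant)
        return [alpha * w + beta * r - gamma * s
                for w, r, s in zip(weights, rel_c, non_c)]
    ```

    Repeating this update over a few rounds of user feedback progressively focuses the query on the relevant region of feature space.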

  17. Crescendo: A Protein Sequence Database Search Engine for Tandem Mass Spectra.

    PubMed

    Wang, Jianqi; Zhang, Yajie; Yu, Yonghao

    2015-07-01

    A search engine that discovers more peptides reliably is essential to the progress of computational proteomics. We propose two new scoring functions (L- and P-scores), which aim to capture the same characteristics of a peptide-spectrum match (PSM) as Sequest and Comet do. Crescendo, introduced here, is a software program that implements these two scores for peptide identification. We applied Crescendo to test datasets and compared its performance with widely used search engines, including Mascot, Sequest, and Comet. The results indicate that Crescendo identifies a similar or larger number of peptides at various predefined false discovery rates (FDR). Importantly, it also provides a better separation between the true and decoy PSMs, warranting the future development of a companion post-processing filtering algorithm.

  18. Exchange, interpretation, and database-search of ion mobility spectra supported by data format JCAMP-DX

    NASA Technical Reports Server (NTRS)

    Baumback, J. I.; Davies, A. N.; Vonirmer, A.; Lampen, P. H.

    1995-01-01

    To assist peak assignment in ion mobility spectrometry it is important to have quality reference data. The reference collection should be stored in a database system which is capable of being searched using spectral or substance information. We propose to build such a database customized for ion mobility spectra. To start, it is important to quickly reach a critical mass of data in the collection, so we wish to obtain as many spectra, combined with their IMS parameters, as possible. Spectra suppliers will be rewarded for their participation with access to the database. To make the data exchange between users and system administration possible, it is important to define a file format specially made for the requirements of ion mobility spectra. The format should be computer readable and flexible enough for extensive comments to be included. In this document we propose a data exchange format, and we invite comments on it. For international data exchange it is important to have a standard data exchange format. We propose to base the definition of this format on the JCAMP-DX protocol, which was developed for the exchange of infrared spectra. This standard, made by the Joint Committee on Atomic and Molecular Physical Data, is of a flexible design. The aim of this paper is to adapt JCAMP-DX to the special requirements of ion mobility spectra.
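
    A JCAMP-DX file is built from labeled data records of the form `##LABEL=value`; the sketch below serializes a spectrum in that style using only core labels (the IMS-specific labels the paper proposes are not reproduced here, and the helper name is hypothetical):

    ```python
    def write_jcamp(title, data_type, points):
        """Serialize (x, y) points as a minimal JCAMP-DX-style record:
        a header of labeled data records followed by an XYPOINTS table."""
        lines = [
            f"##TITLE={title}",
            "##JCAMP-DX=4.24",
            f"##DATA TYPE={data_type}",
            f"##NPOINTS={len(points)}",
            "##XYPOINTS=(XY..XY)",
        ]
        lines += [f"{x}, {y}" for x, y in points]
        lines.append("##END=")
        return "\n".join(lines)
    ```

    Because every record is a self-describing `##LABEL=value` line, unknown labels (e.g., instrument parameters for IMS) can be added without breaking existing readers.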

  19. A Pseudo MS3 Approach for Identification of Disulfide-Bonded Proteins: Uncommon Product Ions and Database Search

    NASA Astrophysics Data System (ADS)

    Chen, Jianzhong; Shiyanov, Pavel; Schlager, John J.; Green, Kari B.

    2012-02-01

    It has previously been reported that disulfide and backbone bonds of native intact proteins can be concurrently cleaved using electrospray ionization (ESI) and collision-induced dissociation (CID) tandem mass spectrometry (MS/MS). However, the cleavages of disulfide bonds result in different cysteine modifications in product ions, making it difficult to identify the disulfide-bonded proteins via database search. To solve this identification problem, we have developed a pseudo MS3 approach by combining nozzle-skimmer dissociation (NSD) and CID on a quadrupole time-of-flight (Q-TOF) mass spectrometer using chicken lysozyme as a model. Although many of the product ions were similar to those typically seen in MS/MS spectra of enzymatically derived peptides, additional uncommon product ions were detected including ci-1 ions (the ith residue being aspartic acid, arginine, lysine and dehydroalanine) as well as those from a scrambled sequence. The formation of these uncommon types of product ions, likely caused by the lack of mobile protons, was proposed to involve bond rearrangements via a six-membered ring transition state and/or salt bridge(s). A search of 20 pseudo MS3 spectra against the Gallus gallus (chicken) database using Batch-Tag, a program originally designed for bottom up MS/MS analysis, identified chicken lysozyme as the only hit, with expectation values less than 0.02 for 12 of the spectra. The pseudo MS3 approach may help to identify disulfide-bonded proteins and determine the associated post-translational modifications (PTMs); the confidence in the identification may be improved by incorporating the fragmentation characteristics into currently available search programs.

  20. On-line biomedical databases-the best source for quick search of the scientific information in the biomedicine.

    PubMed

    Masic, Izet; Milinovic, Katarina

    2012-06-01

    Most medical journals now have an electronic version available over public networks. Although printed and electronic versions may exist in parallel, the two need not be published simultaneously: the electronic version can appear a few weeks before the printed form, and its content need not be identical. The electronic form of a journal may include extensions that the printed form does not, such as animation or 3D displays, and may offer the full text (mostly in PDF or XML format) or only the table of contents or summaries. Access to the full text is usually not free and can be achieved only if the institution (library or host) enters into an access agreement. Many medical journals, however, provide free access to some articles, or to the complete content after a certain time (6 months or a year). Such journals can be found through network archives such as HighWire Press and FreeMedicalJournals.com. Particularly notable are PubMed and PubMed Central, the first public digital archives comprehensively collecting the available medical literature, which operate within the National Library of Medicine in Bethesda (USA). There are also online medical journals published only in electronic form, which can be searched through online databases. In this paper the authors briefly describe about 30 databases and give short instructions on how to access and search the published papers in indexed medical journals.

  1. Recovery actions in PRA (Probabilistic Risk Assessment) for the risk methods integration and evaluation program (RMIEP): Volume 2, Application of the data-based method

    SciTech Connect

    Whitehead, D W

    1987-12-01

    In a Probabilistic Risk Assessment (PRA) for a nuclear power plant, the analyst identifies a set of potential core damage events and their estimated probabilities of occurrence. These events include both equipment failures and human errors. If operator recovery from an event within some specified time is considered, the probability of this recovery can be included in the PRA. This report provides PRA analysts with a step-by-step methodology for including recovery actions in a PRA. The recovery action is divided into two distinct phases: a Diagnosis Phase (realizing that there is a problem with a critical parameter and deciding upon the correct course of action) and an Action Phase (physically accomplishing the required action). In this methodology, time-reliability curves, which were developed from simulator data on potentially dominant accident scenarios, are used to provide estimates for the Diagnosis Phase, and other existing methodologies are used to provide estimates for the Action Phase.
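
    Time-reliability curves of the kind used for the Diagnosis Phase are often modeled with a lognormal distribution of time to diagnosis; the sketch below is a generic illustration of that common form, not the specific simulator-derived RMIEP curves:

    ```python
    import math

    def nonrecovery_probability(t, median, sigma):
        """Probability that diagnosis has NOT occurred by time t, assuming
        time-to-diagnosis is lognormal with the given median and log-space
        standard deviation sigma (an illustrative time-reliability curve)."""
        if t <= 0:
            return 1.0
        z = (math.log(t) - math.log(median)) / sigma
        diagnosed = 0.5 * (1.0 + math.erf(z / math.sqrt(2)))  # lognormal CDF
        return 1.0 - diagnosed
    ```

    The non-recovery probability at the time window available in the accident sequence is what would be multiplied into the cut set frequency.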

  2. Searching for stereoisomerism in crystallographic databases: algorithm, analysis and chiral curiosities.

    PubMed

    Grothe, E; Meekes, H; de Gelder, R

    2017-06-01

    The automated identification of chiral centres in molecular residues is a non-trivial task. Current tools that allow the user to analyze crystallographic data entries do not identify chiral centres in some of the more complex ring structures, or lack the possibility to determine and compare the chirality of multiple structures. This article presents an approach to identify asymmetric C atoms, which is based on the atomic walk count algorithm presented by Rücker & Rücker [(1993), J. Chem. Inf. Comput. Sci. 33, 683-695]. The algorithm, which we implemented in a computer program named ChiChi, is able to compare isomeric residues based on the chiral centres that were identified. This allows for discrimination between enantiomers, diastereomers and constitutional isomers that are present in crystallographic databases. ChiChi was used to process 254 354 organic entries from the Cambridge Structural Database (CSD). A thorough analysis of stereoisomerism in the CSD is presented accompanied by a collection of chiral curiosities that illustrate the strength and versatility of this approach.

  3. Chemical and biological warfare: General studies. (Latest citations from the NTIS bibliographic database). Published Search

    SciTech Connect

    1996-10-01

    The bibliography contains citations concerning federally sponsored and conducted studies into chemical and biological warfare operations and planning. These studies cover areas not addressed in other parts of this series. The topics include production and storage of agents, delivery techniques, training, military and civil defense, general planning studies, psychological reactions to chemical warfare, evaluations of materials exposed to chemical agents, and studies on banning or limiting chemical warfare. Other published searches in this series on chemical warfare cover detection and warning, defoliants, protection, and biological studies, including chemistry and toxicology. (Contains 50-250 citations and includes a subject term index and title list.) (Copyright NERAC, Inc. 1995)

  4. Chemical and biological warfare: General studies. (Latest citations from the NTIS bibliographic database). Published Search

    SciTech Connect

    1995-09-01

    The bibliography contains citations concerning federally sponsored and conducted studies into chemical and biological warfare operations and planning. These studies cover areas not addressed in other parts of this series. The topics include production and storage of agents, delivery techniques, training, military and civil defense, general planning studies, psychological reactions to chemical warfare, evaluations of materials exposed to chemical agents, and studies on banning or limiting chemical warfare. Other published searches in this series on chemical warfare cover detection and warning, defoliants, protection, and biological studies, including chemistry and toxicology. (Contains 50-250 citations and includes a subject term index and title list.) (Copyright NERAC, Inc. 1995)

  5. Chemical and biological warfare: General studies. (Latest citations from the NTIS Bibliographic database). Published Search

    SciTech Connect

    Not Available

    1993-11-01

    The bibliography contains citations concerning federally sponsored and conducted studies into chemical and biological warfare operations and planning. These studies cover areas not addressed in other parts of this series. The topics include production and storage of agents, delivery techniques, training, military and civil defense, general planning studies, psychological reactions to chemical warfare, evaluations of materials exposed to chemical agents, and studies on banning or limiting chemical warfare. Other published searches in this series on chemical warfare cover detection and warning, defoliants, protection, and biological studies, including chemistry and toxicology. (Contains 250 citations and includes a subject term index and title list.)

  6. Chemical and biological warfare: General studies. (Latest citations from the NTIS bibliographic database). Published Search

    SciTech Connect

    1997-11-01

    The bibliography contains citations concerning federally sponsored and conducted studies into chemical and biological warfare operations and planning. These studies cover areas not addressed in other parts of this series. The topics include production and storage of agents, delivery techniques, training, military and civil defense, general planning studies, psychological reactions to chemical warfare, evaluations of materials exposed to chemical agents, and studies on banning or limiting chemical warfare. Other published searches in this series on chemical warfare cover detection and warning, defoliants, protection, and biological studies, including chemistry and toxicology. (Contains 50-250 citations and includes a subject term index and title list.) (Copyright NERAC, Inc. 1995)

  7. Chemical and biological warfare: General studies. (Latest citations from the NTIS bibliographic database). NewSearch

    SciTech Connect

    Not Available

    1994-10-01

    The bibliography contains citations concerning federally sponsored and conducted studies into chemical and biological warfare operations and planning. These studies cover areas not addressed in other parts of this series. The topics include production and storage of agents, delivery techniques, training, military and civil defense, general planning studies, psychological reactions to chemical warfare, evaluations of materials exposed to chemical agents, and studies on banning or limiting chemical warfare. Other published searches in this series on chemical warfare cover detection and warning, defoliants, protection, and biological studies, including chemistry and toxicology. (Contains 250 citations and includes a subject term index and title list.)

  8. Genetic Networks of Complex Disorders: from a Novel Search Engine for PubMed Article Database.

    PubMed

    Jung, Jae-Yoon; Wall, Dennis Paul

    2013-01-01

    Finding genetic risk factors of complex disorders may involve reviewing hundreds of genes or thousands of research articles iteratively, but few tools have been available to facilitate this procedure. In this work, we built a novel publication search engine that can identify target-disorder-specific, genetics-oriented research articles and extract the genes with significant results. Preliminary test results showed that the output of this engine has better coverage, in terms of genes and publications, than other existing applications. We consider it an essential tool for understanding genetic networks of complex disorders.

  9. Heart research advances using database search engines, Human Protein Atlas and the Sydney Heart Bank.

    PubMed

    Li, Amy; Estigoy, Colleen; Raftery, Mark; Cameron, Darryl; Odeberg, Jacob; Pontén, Fredrik; Lal, Sean; Dos Remedios, Cristobal G

    2013-10-01

    This Methodological Review is intended as a guide for research students who may have just discovered a human "novel" cardiac protein, but it may also help hard-pressed reviewers of journal submissions on a "novel" protein reported in an animal model of human heart failure. Whether you are an expert or not, you may know little or nothing about this particular protein of interest. In this review we provide a strategic guide on how to proceed. We ask: How do you discover what has been published (even in an abstract or research report) about this protein? Everyone knows how to undertake literature searches using PubMed and Medline but these are usually encyclopaedic, often producing long lists of papers, most of which are either irrelevant or only vaguely relevant to your query. Relatively few will be aware of more advanced search engines such as Google Scholar and even fewer will know about Quertle. Next, we provide a strategy for discovering if your "novel" protein is expressed in the normal, healthy human heart, and if it is, we show you how to investigate its subcellular location. This can usually be achieved by visiting the website "Human Protein Atlas" without doing a single experiment. Finally, we provide a pathway to discovering if your protein of interest changes its expression level with heart failure/disease or with ageing.

  10. Searching the databases: a quick look at Amazon and two other online catalogues.

    PubMed

    Potts, Hilary

    2003-01-01

    The Amazon Online Catalogue was compared with the Library of Congress Catalogue and the British Library Catalogue, both also available online, by searching on both neutral (Gay, Lesbian, Homosexual) and pejorative (Perversion, Sex Crime) subject terms, and also by searches using Boolean logic in an attempt to identify Lesbian Fiction items and religion-based anti-gay material. Amazon was much more likely to be the first port of call for non-academic enquiries. Although excluding much material necessary for academic research, it carried more information about the individual books and less historical homophobic baggage in its terminology than the great national catalogues. Its back catalogue of second-hand books outnumbered those in print. Current attitudes may partially be gauged by the relative numbers of titles published under each heading--e.g., there may be an inverse relationship between concern about child sex abuse and homophobia, more noticeable in the U.S. because of the activities of the religious right.

  11. A more straightforward derivation of the LR for a database search.

    PubMed

    Berger, Charles E H; Vergeer, Peter; Buckleton, John S

    2015-01-01

    A match between the DNA profile of an accused person and that of a crime scene trace is one of the most common forms of forensic evidence. A number of years ago the so-called 'DNA controversy' was concerned with how to quantify the value of such evidence; given its importance, the lack of understanding of such a basic issue was quite surprising and concerning. This paper derives the equation for the likelihood ratio of a DNA database match in a much more direct and simple way. As this derivation is much easier to follow, it is hoped that it will contribute to the understanding. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
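
    Part of the 'DNA controversy' turned on two quantities that are easy to conflate. A toy calculation (an illustration of the numbers at issue, not the paper's derivation) contrasts the probability that a search of n unrelated, innocent profiles yields at least one adventitious match with the naive likelihood ratio 1/p for the one profile that did match:

    ```python
    def prob_any_innocent_match(p, n):
        """P(at least one match among n unrelated, innocent profiles),
        each with random match probability p; approximately n*p when
        n*p is small."""
        return 1.0 - (1.0 - p) ** n

    def single_profile_lr(p):
        """Naive likelihood ratio for one matching profile: 1/p."""
        return 1.0 / p
    ```

    The first quantity grows with database size n, while the second concerns only the matching individual; the debate was about which (if either) correctly states the value of the evidence after a database search.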

  12. Improving the Mapping of Smith-Waterman Sequence Database Searches onto CUDA-Enabled GPUs.

    PubMed

    Huang, Liang-Tsung; Wu, Chao-Chin; Lai, Lien-Fu; Li, Yun-Ju

    2015-01-01

    Sequence alignment lies at the heart of bioinformatics. The Smith-Waterman algorithm is one of the key sequence search algorithms and has gained popularity due to improved implementations and rapidly increasing compute power. Recently, the Smith-Waterman algorithm has been successfully mapped onto emerging general-purpose graphics processing units (GPUs). In this paper, we focus on how to improve the mapping, especially for short query sequences, through better usage of shared memory. We evaluated the proposed method on two different platforms (Tesla C1060 and Tesla K20) and compared it with two classic methods in CUDASW++. Further, the performance for different numbers of threads and blocks was analyzed. The results showed that the proposed method significantly improves the Smith-Waterman algorithm on CUDA-enabled GPUs through proper allocation of block and thread numbers.
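
    For reference, the dynamic-programming recurrence these GPU implementations accelerate can be written in a few lines of plain Python (score only, linear gap penalty):

    ```python
    def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
        """Plain CPU Smith-Waterman local alignment score.
        O(len(a) * len(b)) time and space; the max-over-zero in the
        recurrence is what makes the alignment local."""
        rows, cols = len(a) + 1, len(b) + 1
        H = [[0] * cols for _ in range(rows)]
        best = 0
        for i in range(1, rows):
            for j in range(1, cols):
                diag = H[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch)
                H[i][j] = max(0, diag, H[i-1][j] + gap, H[i][j-1] + gap)
                best = max(best, H[i][j])
        return best
    ```

    GPU versions parallelize the anti-diagonals of H, since all cells on one anti-diagonal depend only on the previous two; the shared-memory tuning in the paper concerns where the query profile and intermediate columns are staged.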

  13. Improving the Mapping of Smith-Waterman Sequence Database Searches onto CUDA-Enabled GPUs

    PubMed Central

    Huang, Liang-Tsung; Wu, Chao-Chin; Lai, Lien-Fu; Li, Yun-Ju

    2015-01-01

    Sequence alignment lies at the heart of bioinformatics. The Smith-Waterman algorithm is one of the key sequence search algorithms and has gained popularity due to improved implementations and rapidly increasing compute power. Recently, the Smith-Waterman algorithm has been successfully mapped onto emerging general-purpose graphics processing units (GPUs). In this paper, we focus on how to improve the mapping, especially for short query sequences, through better usage of shared memory. We evaluated the proposed method on two different platforms (Tesla C1060 and Tesla K20) and compared it with two classic methods in CUDASW++. Further, the performance for different numbers of threads and blocks was analyzed. The results showed that the proposed method significantly improves the Smith-Waterman algorithm on CUDA-enabled GPUs through proper allocation of block and thread numbers. PMID:26339591

  14. ANDY: A general, fault-tolerant tool for database searching on computer clusters

    SciTech Connect

    Smith, Andrew; Chandonia, John-Marc; Brenner, Steven E.

    2005-12-21

    Summary: ANDY (seArch coordination aND analYsis) is a set of Perl programs and modules for distributing large biological database searches, and in general any sequence of commands, across the nodes of a Linux computer cluster. ANDY is compatible with several commonly used Distributed Resource Management (DRM) systems, and it can be easily extended to new DRMs. A distinctive feature of ANDY is the choice of either dedicated or fair-use operation: ANDY is almost as efficient as single-purpose tools that require a dedicated cluster, but it runs on a general-purpose cluster along with any other jobs scheduled by a DRM. Other features include communication through named pipes for performance, flexible customizable routines for error-checking and summarizing results, and multiple fault-tolerance mechanisms. Availability: ANDY is freely available and may be obtained from http://compbio.berkeley.edu/proj/andy; this site also contains supplemental data and figures and a more detailed overview of the software.

  15. An ultra-tolerant database search reveals that a myriad of modified peptides contributes to unassigned spectra in shotgun proteomics

    PubMed Central

    Chick, Joel M.; Kolippakkam, Deepak; Nusinow, David P.; Zhai, Bo; Rad, Ramin; Huttlin, Edward L.; Gygi, Steven P.

    2015-01-01

    Fewer than half of all tandem mass spectrometry (MS/MS) spectra acquired in shotgun proteomics experiments are typically matched to a peptide with high confidence. Here we determine the identity of unassigned peptides using an ultra-tolerant Sequest database search that allows peptide matching even with modifications of unknown masses up to ±500 Da. In a proteome-wide dataset on HEK293 cells (9,513 proteins and 396,736 peptides), this approach matched an additional 184,000 modified peptides, which were linked to biological and chemical modifications representing 523 distinct mass bins, including phosphorylation, glycosylation, and methylation. We localized all unknown modification masses to specific regions within a peptide. Known modifications were assigned to the correct amino acids with frequencies often >90%. We conclude that at least one third of unassigned spectra arise from peptides with substoichiometric modifications. PMID:26076430
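
    The core of such an open (ultra-tolerant) search analysis is binning the precursor-minus-peptide mass differences so that recurring modification masses stand out as peaks; a toy sketch of that binning idea (bin width and count threshold are arbitrary choices, not the paper's parameters):

    ```python
    from collections import Counter

    def delta_mass_histogram(deltas, bin_width=0.01, min_count=2):
        """Bin observed (precursor - matched peptide) mass differences in Da
        and keep bins that recur, i.e. candidate modification masses."""
        bins = Counter(round(d / bin_width) * bin_width for d in deltas)
        return {round(m, 3): c for m, c in bins.items() if c >= min_count}
    ```

    In a real dataset, populated bins near known masses (e.g., about +79.97 Da for phosphorylation or +15.99 Da for oxidation) identify which modifications explain the previously unassigned spectra.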

  16. Pivotal Role of Computers and Software in Mass Spectrometry - SEQUEST and 20 Years of Tandem MS Database Searching

    NASA Astrophysics Data System (ADS)

    Yates, John R.

    2015-11-01

    Advances in computer technology and software have driven developments in mass spectrometry over the last 50 years. Computers and software have been impactful in three areas: the automation of difficult calculations to aid interpretation, the collection of data and control of instruments, and data interpretation. As the power of computers has grown, so too has the utility and impact on mass spectrometers and their capabilities. This has been particularly evident in the use of tandem mass spectrometry data to search protein and nucleotide sequence databases to identify peptide and protein sequences. This capability has driven the development of many new approaches to study biological systems, including the use of "bottom-up shotgun proteomics" to directly analyze protein mixtures.

  17. Pivotal role of computers and software in mass spectrometry - SEQUEST and 20 years of tandem MS database searching.

    PubMed

    Yates, John R

    2015-11-01

    Advances in computer technology and software have driven developments in mass spectrometry over the last 50 years. Computers and software have been impactful in three areas: the automation of difficult calculations to aid interpretation, the collection of data and control of instruments, and data interpretation. As the power of computers has grown, so too has the utility and impact on mass spectrometers and their capabilities. This has been particularly evident in the use of tandem mass spectrometry data to search protein and nucleotide sequence databases to identify peptide and protein sequences. This capability has driven the development of many new approaches to study biological systems, including the use of "bottom-up shotgun proteomics" to directly analyze protein mixtures.

  18. The wildcat toolbox: a set of perl script utilities for use in peptide mass spectral database searching and proteomics experiments.

    PubMed

    Haynes, Paul A; Miller, Susan; Radabaugh, Tim; Galligan, Michael; Breci, Linda; Rohrbough, James; Hickman, Fatimah; Merchant, Nirav

    2006-04-01

    We describe in this communication a set of functional perl script utilities for use in peptide mass spectral database searching and proteomics experiments, known as the Wildcat Toolbox. These are all freely available for download from our laboratory Web site (http://proteomics.arizona.edu/toolbox.html) as a combined zip file, and can also be accessed via the Proteome Commons Web site (www.proteomecommons.org) in the tools section. We make them available to other potential users in the spirit of open source software development; we do not have the resources to provide any significant technical support for them, but we hope users will share both bugs and improvements with the community at large.

  19. The Wildcat Toolbox: A Set of Perl Script Utilities for Use in Peptide Mass Spectral Database Searching and Proteomics Experiments

    PubMed Central

    Haynes, Paul A.; Miller, Susan; Radabaugh, Tim; Galligan, Michael; Breci, Linda; Rohrbough, James; Hickman, Fatimah; Merchant, Nirav

    2006-01-01

    We describe in this communication a set of functional perl script utilities for use in peptide mass spectral database searching and proteomics experiments, known as the Wildcat Toolbox. These are all freely available for download from our laboratory Web site (http://proteomics.arizona.edu/toolbox.html) as a combined zip file, and can also be accessed via the Proteome Commons Web site (www.proteomecommons.org) in the tools section. We make them available to other potential users in the spirit of open source software development; we do not have the resources to provide any significant technical support for them, but we hope users will share both bugs and improvements with the community at large. PMID:16741236

  20. FTP-Server for exchange, interpretation, and database-search of ion mobility spectra, literature, preprints and software

    NASA Technical Reports Server (NTRS)

    Baumbach, J. I.; Vonirmer, A.

    1995-01-01

    To assist current discussion in the field of ion mobility spectrometry, the Institut fur Spectrochemie und angewandte Spektroskopie, Dortmund, has operated an FTP server since 4 December 1994, available to all research groups at universities and institutes and to researchers in industry. We support the exchange, interpretation, and database search of ion mobility spectra through the JCAMP-DX data format (Joint Committee on Atomic and Molecular Physical Data), as well as literature retrieval, preprints, notices, and a discussion board. We describe in general terms the access conditions, local addresses, and main code words. For further details, a monthly news report will be prepared for all common users. The Internet e-mail address for subscribing is included in the document.

  1. Does PSADT after Radical Prostatectomy Correlate with Overall Survival? — A Report from the SEARCH Database Group

    PubMed Central

    Teeter, Anna E.; Presti, Joseph C.; Aronson, William J.; Terris, Martha K.; Kane, Christopher J.; Amling, Christopher L.; Freedland, Stephen J.

    2010-01-01

    Objective: Prior studies, largely performed at tertiary care centers with relatively young, racially homogeneous cohorts, found that a short PSA doubling time (PSADT) following recurrence after radical prostatectomy (RP) portends a poor prognosis. We examined the correlation between PSADT and overall survival (OS) among men in the SEARCH database, an older, racially diverse cohort treated with RP at multiple Veterans Affairs medical centers. Methods: We performed a Cox proportional hazards analysis to examine the correlation of post-recurrence PSADT with OS and prostate cancer-specific mortality (PCSM) among 345 men in the SEARCH database who underwent RP between 1988 and 2008. We examined PSADT as a categorical variable based on the clinically significant cut-points of <3, 3–8.9, 9–14.9, and ≥15 months. Results: PSADT <3 months (HR 5.48, p=0.002) was associated with poorer OS versus PSADT ≥15 months. There was a trend towards worse OS among men with a PSADT of 3–8.9 months (HR 1.70, p=0.07). PSADT <3 months (p<0.001) and 3–8.9 months (p=0.004) were associated with an increased risk of PCSM. Conclusions: In an older, racially diverse cohort, recurrence with a PSADT <9 months was associated with worse all-cause mortality. This study validates prior findings that PSADT is a useful tool for identifying men at increased risk of all-cause mortality early in the course of their disease. PMID:21145094
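    For reference, PSADT is conventionally estimated as ln(2) divided by the slope of a least-squares fit of ln(PSA) against time. A minimal sketch of that standard calculation (not the SEARCH group's own implementation, and with hypothetical PSA values):

```python
import math

def psa_doubling_time(times_months, psa_values):
    """Estimate PSA doubling time (months) as ln(2) divided by the
    slope of the least-squares fit of ln(PSA) against time.
    Assumes a rising PSA (positive slope)."""
    n = len(times_months)
    xs = times_months
    ys = [math.log(v) for v in psa_values]
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
            sum((x - xbar) ** 2 for x in xs)
    return math.log(2) / slope

# PSA doubling exactly every 3 months yields a PSADT of 3.0 months:
print(psa_doubling_time([0, 3, 6], [1.0, 2.0, 4.0]))
```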

  2. Recovery actions in PRA (probabilistic risk assessment) for the Risk Methods Integration and Evaluation Program (RMIEP): Volume 1, Development of the data-based method

    SciTech Connect

    Weston, L M; Whitehead, D W; Graves, N L

    1987-06-01

    In a probabilistic risk assessment (PRA) for a nuclear power plant, the analyst identifies a set of potential core damage events consisting of equipment failures and human errors and their estimated probabilities of occurrence. If operator recovery from an event within some specified time is considered, then the probability of this recovery can be included in the PRA. This report provides PRA analysts with an improved methodology for including recovery actions in a PRA. A recovery action can be divided into two distinct phases: a Diagnosis Phase (realizing that there is a problem with a critical parameter and deciding upon the correct course of action) and an Action Phase (physically accomplishing the required action). In this methodology, simulator data are used to estimate recovery probabilities for the diagnosis phase. Different time-reliability curves showing the probability of failure of diagnosis as a function of time from the compelling cue for the event are presented. These curves are based on simulator exercises, and the actions are grouped based upon their operational similarities. This is an improvement over existing diagnosis models that rely greatly upon subjective judgment to obtain such estimates. The action phase is modeled using estimates from available sources. The methodology also includes a recommendation on where and when to apply the recovery action in the PRA process.
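    As an illustration of the shape of a time-reliability curve, the sketch below uses a lognormal time-to-diagnose model. That parametric form is a common choice in human reliability analysis but is an assumption here; the RMIEP report derives its curves from simulator data rather than from this formula:

```python
import math

def p_failure_lognormal(t_minutes, median_minutes, sigma):
    """Probability that diagnosis has NOT occurred by time t since the
    compelling cue, under a lognormal time-to-diagnose model:
    P(fail) = 1 - Phi((ln t - ln median) / sigma).
    Illustrative only; not the report's simulator-derived curves."""
    if t_minutes <= 0:
        return 1.0
    z = (math.log(t_minutes) - math.log(median_minutes)) / sigma
    phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2)))  # standard normal CDF
    return 1.0 - phi

# Failure probability falls as time from the cue increases:
for t in (1, 5, 10, 30, 60):
    print(t, round(p_failure_lognormal(t, median_minutes=10, sigma=1.0), 3))
```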

  3. The Zebrafish Model Organism Database: new support for human disease models, mutation details, gene expression phenotypes and searching

    PubMed Central

    Howe, Douglas G.; Bradford, Yvonne M.; Eagle, Anne; Fashena, David; Frazer, Ken; Kalita, Patrick; Mani, Prita; Martin, Ryan; Moxon, Sierra Taylor; Paddock, Holly; Pich, Christian; Ramachandran, Sridhar; Ruzicka, Leyla; Schaper, Kevin; Shao, Xiang; Singer, Amy; Toro, Sabrina; Van Slyke, Ceri; Westerfield, Monte

    2017-01-01

    The Zebrafish Model Organism Database (ZFIN; http://zfin.org) is the central resource for zebrafish (Danio rerio) genetic, genomic, phenotypic and developmental data. ZFIN curators provide expert manual curation and integration of comprehensive data involving zebrafish genes, mutants, transgenic constructs and lines, phenotypes, genotypes, gene expressions, morpholinos, TALENs, CRISPRs, antibodies, anatomical structures, models of human disease and publications. We integrate curated, directly submitted, and collaboratively generated data, making these available to the zebrafish research community. Among the vertebrate model organisms, zebrafish are superbly suited for rapid generation of sequence-targeted mutant lines, characterization of phenotypes including gene expression patterns, and generation of human disease models. The recent rapid adoption of zebrafish as human disease models is making management of these data particularly important to both the research and clinical communities. Here, we describe recent enhancements to ZFIN including use of the zebrafish experimental conditions ontology, ‘Fish’ records in the ZFIN database, support for gene expression phenotypes, models of human disease, mutation details at the DNA, RNA and protein levels, and updates to the ZFIN single box search. PMID:27899582

  4. The Zebrafish Model Organism Database: new support for human disease models, mutation details, gene expression phenotypes and searching.

    PubMed

    Howe, Douglas G; Bradford, Yvonne M; Eagle, Anne; Fashena, David; Frazer, Ken; Kalita, Patrick; Mani, Prita; Martin, Ryan; Moxon, Sierra Taylor; Paddock, Holly; Pich, Christian; Ramachandran, Sridhar; Ruzicka, Leyla; Schaper, Kevin; Shao, Xiang; Singer, Amy; Toro, Sabrina; Van Slyke, Ceri; Westerfield, Monte

    2017-01-04

    The Zebrafish Model Organism Database (ZFIN; http://zfin.org) is the central resource for zebrafish (Danio rerio) genetic, genomic, phenotypic and developmental data. ZFIN curators provide expert manual curation and integration of comprehensive data involving zebrafish genes, mutants, transgenic constructs and lines, phenotypes, genotypes, gene expressions, morpholinos, TALENs, CRISPRs, antibodies, anatomical structures, models of human disease and publications. We integrate curated, directly submitted, and collaboratively generated data, making these available to the zebrafish research community. Among the vertebrate model organisms, zebrafish are superbly suited for rapid generation of sequence-targeted mutant lines, characterization of phenotypes including gene expression patterns, and generation of human disease models. The recent rapid adoption of zebrafish as human disease models is making management of these data particularly important to both the research and clinical communities. Here, we describe recent enhancements to ZFIN including use of the zebrafish experimental conditions ontology, 'Fish' records in the ZFIN database, support for gene expression phenotypes, models of human disease, mutation details at the DNA, RNA and protein levels, and updates to the ZFIN single box search.

  5. Quality Control of Biomedicinal Allergen Products - Highly Complex Isoallergen Composition Challenges Standard MS Database Search and Requires Manual Data Analyses.

    PubMed

    Spiric, Jelena; Engin, Anna M; Karas, Michael; Reuter, Andreas

    2015-01-01

    Allergy against birch pollen is among the most common causes of spring pollinosis in Europe and is diagnosed and treated using extracts from natural sources. Quality control is crucial for safe and effective diagnosis and treatment. However, current methods are very difficult to standardize and do not address individual allergen or isoallergen composition. MS provides information regarding selected proteins or the entire proteome and could overcome the aforementioned limitations. We studied the proteome of birch pollen, focusing on allergens and isoallergens, to clarify which of the 93 published sequence variants of the major allergen, Bet v 1, are expressed in parallel as proteins within a single source material. The unexpectedly complex Bet v 1 isoallergen composition required manual data interpretation and a specific design of databases, as current database search engines fail to unambiguously assign spectra to highly homologous, partially identical proteins. We identified 47 non-allergenic proteins and all 5 known birch pollen allergens, and unambiguously proved the existence of 18 Bet v 1 isoallergens and variants by manual data analysis. This highly complex isoallergen composition raises the questions of whether isoallergens can be ignored or must be included in the quality control of allergen products, and which data analysis strategies should be applied.

  6. Utility of rapid database searching for quality assurance: 'detective work' in uncovering radiology coding and billing errors

    NASA Astrophysics Data System (ADS)

    Horii, Steven C.; Kim, Woojin; Boonn, William; Iyoob, Christopher; Maston, Keith; Coleman, Beverly G.

    2011-03-01

    When the first quarter of 2010 Department of Radiology statistics were provided to the Section Chiefs, the authors (SH, BC) were alarmed to discover that Ultrasound showed a decrease of 2.5 percent in billed examinations. This seemed to be in direct contradistinction to the experience of the ultrasound faculty members and sonographers, whose experience was that they were far busier than during the same quarter of 2009. The one exception that all acknowledged was the month of February 2010, when several major winter storms resulted in much decreased hospital admission and Emergency Department visit rates. Since these statistics in part help establish priorities for capital budget items, professional and technical staffing levels, and levels of incentive salary, they are taken very seriously. The availability of a desktop, Web-based RIS database search tool developed by two of the authors (WK, WB), together with the built-in database functions of the ultrasound miniPACS, made it possible for us to rapidly develop and test hypotheses for why the number of billable examinations was declining in the face of what experience told the authors was an increasing number of examinations being performed. Within a short time, we identified the major cause as errors on the part of the company retained to verify billable Current Procedural Terminology (CPT) codes against ultrasound reports. This information is being used going forward to recover unbilled examinations and to take measures to reduce or eliminate the types of coding errors that resulted in the problem.

  7. Similarity landscapes: An improved method for scientific visualization of information from protein and DNA database searches

    SciTech Connect

    Dogget, N.; Myers, G.; Wills, C.J.

    1998-12-01

    This is the final report of a three-year, Laboratory Directed Research and Development (LDRD) project at the Los Alamos National Laboratory (LANL). The authors have used computer simulations and examination of a variety of databases to answer a wide range of evolutionary questions. The authors have found that there is a clear distinction in the evolution of HIV-1 and HIV-2, with the former and more virulent virus evolving more rapidly at a functional level. The authors have discovered highly non-random patterns in the evolution of HIV-1 that can be attributed to a variety of selective pressures. In the course of examination of microsatellite DNA (short repeat regions) in microorganisms, the authors have found clear differences between prokaryotes and eukaryotes in their distribution, differences that can be tied to different selective pressures. They have developed a new method (topiary pruning) for enhancing the phylogenetic information contained in DNA sequences. Most recently, the authors have discovered effects in complex rainforest ecosystems that indicate strong frequency-dependent interactions between host species and their parasites, leading to the maintenance of ecosystem variability.

  8. Accelerating Smith-Waterman Alignment for Protein Database Search Using Frequency Distance Filtration Scheme Based on CPU-GPU Collaborative System.

    PubMed

    Liu, Yu; Hong, Yang; Lin, Chun-Yuan; Hung, Che-Lun

    2015-01-01

    The Smith-Waterman (SW) algorithm has been widely utilized for searching biological sequence databases in bioinformatics. Recently, several works have adopted graphics cards with Graphics Processing Units (GPUs) and the associated CUDA model to enhance the performance of SW computations. However, these works mainly focused on protein database search using the intertask parallelization technique, using the GPU only to perform the SW computations one at a time. Hence, in this paper, we propose an efficient SW alignment method, called CUDA-SWfr, for protein database search using the intratask parallelization technique based on a CPU-GPU collaborative system. Before the SW computations are performed on the GPU, a procedure is applied on the CPU using the frequency distance filtration scheme (FDFS) to eliminate unnecessary alignments. The experimental results indicate that CUDA-SWfr runs 9.6 times and 96 times faster than the CPU-based SW method without and with FDFS, respectively.
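    The dynamic-programming recurrence that such GPU methods parallelize can be shown with a plain CPU reference implementation of the Smith-Waterman local-alignment score (the scoring parameters below are illustrative, not those used by CUDA-SWfr):

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score via dynamic programming.
    H[i][j] holds the best local alignment score ending at a[i-1], b[j-1];
    the 0 in the max() lets alignments restart anywhere (local, not global).
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best

print(smith_waterman("HEAGAWGHEE", "PAWHEAE"))
```

Intertask parallelization runs many such matrices at once (one per database sequence), while the intratask approach used by CUDA-SWfr parallelizes the cells within a single matrix.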

  9. A Cautionary Tale on the Inclusion of Variable Posttranslational Modifications in Database-Dependent Searches of Mass Spectrometry Data.

    PubMed

    Svozil, J; Baerenfaller, K

    2017-01-01

    Mass spectrometry-based proteomics allows in principle the identification of unknown target proteins of posttranslational modifications and the sites of attachment. Including a variety of posttranslational modifications in database-dependent searches of high-throughput mass spectrometry data holds the promise to gain spectrum assignments to modified peptides, thereby increasing the number of assigned spectra, and to identify potentially interesting modification events. However, these potential benefits come at the price of an increased search space, which can lead to reduced scores, increased score thresholds, and erroneous peptide spectrum matches. We have assessed here the advantages and disadvantages of including the variable posttranslational modifications methionine oxidation, protein N-terminal acetylation, cysteine carbamidomethylation, transformation of N-terminal glutamine to pyroglutamic acid (Gln→pyro-Glu), and deamidation of asparagine and glutamine. Based on calculations of local false discovery rates and comparisons to known features of the respective modifications, we recommend for searches of samples that were not enriched for specific posttranslational modifications to only include methionine oxidation, protein N-terminal acetylation, and peptide N-terminal Gln→pyro-Glu as variable modifications. The principle of the validation strategy adopted here can also be applied for assessing the inclusion of posttranslational modifications for differently prepared samples, or for additional modifications. In addition, we have reassessed the special properties of the ubiquitin footprint, which is the remainder of ubiquitin moieties attached to lysines after tryptic digest. We show here that the ubiquitin footprint often breaks off as neutral loss and that it can be distinguished from dicarbamidomethylation events. © 2017 Elsevier Inc. All rights reserved.
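    The search-space growth that motivates this caution is multiplicative: if each modifiable residue may independently carry its modification or not, the number of peptide forms a search engine must score is a product over residues. A minimal sketch (the peptide and modification counts are hypothetical, and real engines usually cap the number of simultaneous modifications):

```python
def modified_forms(peptide, variable_mods):
    """Count the peptide variants to score when each modifiable residue
    may independently be modified or not.
    variable_mods maps residue -> number of alternative modified states.
    """
    total = 1
    for aa in peptide:
        total *= 1 + variable_mods.get(aa, 0)
    return total

# e.g. oxidation on M, deamidation on N and Q (hypothetical peptide):
print(modified_forms("MNQSTM", {"M": 1, "N": 1, "Q": 1}))
```

Each additional variable modification multiplies this count again, which is why the recommendation above limits the default set to a few well-justified modifications.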

  10. Uploading, Searching and Visualizing of Paleomagnetic and Rock Magnetic Data in the Online MagIC Database

    NASA Astrophysics Data System (ADS)

    Minnett, R.; Koppers, A.; Tauxe, L.; Constable, C.; Donadini, F.

    2007-12-01

    The Magnetics Information Consortium (MagIC) is commissioned to implement and maintain an online portal to a relational database populated by both rock and paleomagnetic data. The goal of MagIC is to archive all available measurements and derived properties from paleomagnetic studies of directions and intensities, and for rock magnetic experiments (hysteresis, remanence, susceptibility, anisotropy). MagIC is hosted under EarthRef.org at http://earthref.org/MAGIC/ and will soon implement two search nodes, one for paleomagnetism and one for rock magnetism. Currently the PMAG node is operational. Both nodes provide query building based on location, reference, methods applied, material type and geological age, as well as a visual map interface to browse and select locations. Users can also browse the database by data type or by data compilation to view all contributions associated with well known earlier collections like PINT, GMPDB or PSVRL. The query result set is displayed in a digestible tabular format allowing the user to descend from locations to sites, samples, specimens and measurements. At each stage, the result set can be saved and, where appropriate, can be visualized by plotting global location maps, equal area, XY, age, and depth plots, or typical Zijderveld, hysteresis, magnetization and remanence diagrams. User contributions to the MagIC database are critical to achieving a useful research tool. We have developed a standard data and metadata template (version 2.3) that can be used to format and upload all data at the time of publication in Earth Science journals. Software tools are provided to facilitate population of these templates within Microsoft Excel. These tools allow for the import/export of text files and provide advanced functionality to manage and edit the data, and to perform various internal checks to maintain data integrity and prepare for uploading. The MagIC Contribution Wizard at http://earthref.org/MAGIC/upload.htm executes the upload.

  11. Asthma research performance in Asia-Pacific: a bibliometric analysis by searching PubMed database.

    PubMed

    Klaewsongkram, Jettanong; Reantragoon, Rangsima

    2009-12-01

    Countries in the Asia-Pacific region have experienced an increase in the prevalence of asthma and have recently been actively involved in asthma research. This study aimed to analyze asthma research from the Asia-Pacific in the last decade using bibliometric methods. Asthma articles from Asia-Pacific countries published between 1998 and 2007 were retrieved from PubMed by searching MeSH for "asthma." Most published asthma articles in the Asia-Pacific are from affluent countries in northeast Asia and Oceania. Australia and Japan have been the regional powerhouses, contributing more than half of the regional articles on asthma. Asthma publications from emerging economies in Asia, such as South Korea, Taiwan, Hong Kong, and Singapore, have dramatically increased in the last decade in both quantity and quality and were considerable sources of basic and translational research in the region. Mainland China and India have significantly increased their research capacity as well, but quality needs to be improved. Asthma publications from New Zealand and Australia, the countries with the highest asthma prevalence rates in the world, yielded the highest citation counts per article and were published in journals with high impact factors. Asthma research parameters per million population correlate well with gross domestic product per capita. Almost half (41%) of the total articles were produced by only 25 institutions in the region, and almost half of these (47%) were published in 20 journals. Asthma research in the Asia-Pacific was mainly conducted in countries in Oceania and Northeast Asia, and research performance strongly correlated with national wealth. Interesting asthma research projects in the region were recommended.

  12. Is the basic conditional probabilistic?

    PubMed

    Goodwin, Geoffrey P

    2014-06-01

    Nine experiments examined whether individuals treat the meaning of basic conditional assertions as deterministic or probabilistic. In Experiments 1-4, participants were presented with either probabilistic or deterministic relations, which they had to describe with a conditional. These experiments consistently showed that people tend only to use the basic if p then q construction to describe deterministic relations between antecedent and consequent, whereas they use a probabilistically qualified construction, if p then probably q, to describe probabilistic relations-suggesting that the default interpretation of the conditional is deterministic. Experiments 5 and 6 showed that when directly asked, individuals typically report that conditional assertions admit no exceptions (i.e., they are seen as deterministic). Experiments 7-9 showed that individuals judge the truth of conditional assertions in accordance with this deterministic interpretation. Together, these results pose a challenge to probabilistic accounts of the meaning of conditionals and support mental models, formal rules, and suppositional accounts. PsycINFO Database Record (c) 2014 APA, all rights reserved.

  13. Reprint of "pFind-Alioth: A novel unrestricted database search algorithm to improve the interpretation of high-resolution MS/MS data".

    PubMed

    Chi, Hao; He, Kun; Yang, Bing; Chen, Zhen; Sun, Rui-Xiang; Fan, Sheng-Bo; Zhang, Kun; Liu, Chao; Yuan, Zuo-Fei; Wang, Quan-Hui; Liu, Si-Qi; Dong, Meng-Qiu; He, Si-Min

    2015-11-03

    Database search is the dominant approach in high-throughput proteomic analysis. However, the interpretation rate of MS/MS spectra is very low in such a restricted mode, which is mainly due to unexpected modifications and irregular digestion types. In this study, we developed a new algorithm called Alioth, to be integrated into the search engine of pFind, for fast and accurate unrestricted database search on high-resolution MS/MS data. An ion index is constructed for both peptide precursors and fragment ions, by which arbitrary digestions and a single site of any modifications and mutations can be searched efficiently. A new re-ranking algorithm is used to distinguish the correct peptide-spectrum matches from random ones. The algorithm is tested on several HCD datasets and the interpretation rate of MS/MS spectra using Alioth is as high as 60%-80%. Peptides from semi- and non-specific digestions, as well as those with unexpected modifications or mutations, can be effectively identified using Alioth and confidently validated using other search engines. The average processing speed of Alioth is 5-10 times faster than some other unrestricted search engines and is comparable to or even faster than the restricted search algorithms tested. This article is part of a Special Issue entitled: Computational Proteomics. Copyright © 2015 Elsevier B.V. All rights reserved.

  14. pFind-Alioth: A novel unrestricted database search algorithm to improve the interpretation of high-resolution MS/MS data.

    PubMed

    Chi, Hao; He, Kun; Yang, Bing; Chen, Zhen; Sun, Rui-Xiang; Fan, Sheng-Bo; Zhang, Kun; Liu, Chao; Yuan, Zuo-Fei; Wang, Quan-Hui; Liu, Si-Qi; Dong, Meng-Qiu; He, Si-Min

    2015-07-01

    Database search is the dominant approach in high-throughput proteomic analysis. However, the interpretation rate of MS/MS spectra is very low in such a restricted mode, which is mainly due to unexpected modifications and irregular digestion types. In this study, we developed a new algorithm called Alioth, to be integrated into the search engine of pFind, for fast and accurate unrestricted database search on high-resolution MS/MS data. An ion index is constructed for both peptide precursors and fragment ions, by which arbitrary digestions and a single site of any modifications and mutations can be searched efficiently. A new re-ranking algorithm is used to distinguish the correct peptide-spectrum matches from random ones. The algorithm is tested on several HCD datasets and the interpretation rate of MS/MS spectra using Alioth is as high as 60%-80%. Peptides from semi- and non-specific digestions, as well as those with unexpected modifications or mutations, can be effectively identified using Alioth and confidently validated using other search engines. The average processing speed of Alioth is 5-10 times faster than some other unrestricted search engines and is comparable to or even faster than the restricted search algorithms tested. Copyright © 2015 Elsevier B.V. All rights reserved.
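    The ion-index idea can be sketched as a dictionary that buckets fragment masses into tolerance-wide bins for constant-time candidate lookup. This illustrates the general technique only, not pFind-Alioth's actual data structure, and the peptides and fragment lists are hypothetical:

```python
from collections import defaultdict

def build_fragment_index(peptide_fragments, tol=0.02):
    """Index fragment m/z values into tolerance-wide buckets so that a
    query peak retrieves candidate peptides in O(1), instead of scoring
    every peptide against every spectrum."""
    index = defaultdict(set)
    for pep, frags in peptide_fragments.items():
        for mz in frags:
            index[int(mz / tol)].add(pep)
    return index

def lookup(index, mz, tol=0.02):
    """Return peptides with a fragment within ~tol of the query m/z."""
    key = int(mz / tol)
    hits = set()
    for k in (key - 1, key, key + 1):  # cover bucket boundaries
        hits |= index.get(k, set())
    return hits

# Hypothetical fragment lists for two peptides:
idx = build_fragment_index({"PEPTIDE": [97.05, 226.12], "PROTEIN": [97.06, 350.2]})
print(lookup(idx, 97.055))
```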

  15. Automatic sorting of toxicological information into the IUCLID (International Uniform Chemical Information Database) endpoint-categories making use of the semantic search engine Go3R.

    PubMed

    Sauer, Ursula G; Wächter, Thomas; Hareng, Lars; Wareing, Britta; Langsch, Angelika; Zschunke, Matthias; Alvers, Michael R; Landsiedel, Robert

    2014-06-01

    The knowledge-based search engine Go3R, www.Go3R.org, has been developed to assist scientists from industry and regulatory authorities in collecting comprehensive toxicological information with a special focus on identifying available alternatives to animal testing. The semantic search paradigm of Go3R makes use of expert knowledge on 3Rs methods and regulatory toxicology, laid down in the ontology, a network of concepts, terms, and synonyms, to recognize the contents of documents. Search results are automatically sorted into a dynamic table of contents presented alongside the list of documents retrieved. This table of contents allows the user to quickly filter the set of documents by topics of interest. Documents containing hazard information are automatically assigned to a user interface following the endpoint-specific IUCLID5 categorization scheme required, e.g. for REACH registration dossiers. For this purpose, complex endpoint-specific search queries were compiled and integrated into the search engine (based upon a gold standard of 310 references that had been assigned manually to the different endpoint categories). Go3R sorts 87% of the references concordantly into the respective IUCLID5 categories. Currently, Go3R searches in the 22 million documents available in the PubMed and TOXNET databases. However, it can be customized to search in other databases including in-house databanks. Copyright © 2013 Elsevier Ltd. All rights reserved.

  16. Exploring Site-Specific N-Glycosylation Microheterogeneity of Haptoglobin using Glycopeptide CID Tandem Mass Spectra and Glycan Database Search

    PubMed Central

    Chandler, Kevin Brown; Pompach, Petr; Goldman, Radoslav

    2013-01-01

    Glycosylation is a common protein modification with a significant role in many vital cellular processes and human diseases, making the characterization of protein-attached glycan structures important for understanding cell biology and disease processes. Direct analysis of protein N-glycosylation by tandem mass spectrometry of glycopeptides promises site-specific elucidation of N-glycan microheterogeneity, something which detached N-glycan and de-glycosylated peptide analyses cannot provide. However, successful implementation of direct N-glycopeptide analysis by tandem mass spectrometry remains a challenge. In this work, we consider algorithmic techniques for the analysis of LC-MS/MS data acquired from glycopeptide-enriched fractions of enzymatic digests of purified proteins. We implement a computational strategy which takes advantage of the properties of CID fragmentation spectra of N-glycopeptides, matching the MS/MS spectra to peptide-glycan pairs from protein sequences and glycan structure databases. Significantly, we also propose a novel false-discovery-rate estimation technique to estimate and manage the number of false identifications. We use a human glycoprotein standard, haptoglobin, digested with trypsin and GluC, enriched for glycopeptides using HILIC chromatography, and analyzed by LC-MS/MS to demonstrate our algorithmic strategy and evaluate its performance. Our software, GlycoPeptideSearch (GPS), assigned glycopeptide identifications to 246 of the spectra at a false discovery rate of 5.58%, identifying 42 distinct haptoglobin peptide-glycan pairs at each of the four haptoglobin N-linked glycosylation sites. We further demonstrate the effectiveness of this approach by analyzing plasma-derived haptoglobin, identifying 136 N-linked glycopeptide spectra at a false discovery rate of 0.4%, representing 15 distinct glycopeptides on at least three of the four N-linked glycosylation sites. The software, GlycoPeptideSearch, is available for download from http
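    For comparison, the standard target-decoy estimate that such FDR techniques build on can be sketched as follows. This generic sketch is not GlycoPeptideSearch's glycopeptide-specific estimator, and the scores and labels are made up for illustration:

```python
def target_decoy_fdr(scores, labels, threshold):
    """Standard target-decoy FDR estimate:
    FDR ~= (# decoy hits above threshold) / (# target hits above threshold).
    labels: True for target matches, False for decoy matches.
    """
    targets = sum(1 for s, t in zip(scores, labels) if t and s >= threshold)
    decoys = sum(1 for s, t in zip(scores, labels) if not t and s >= threshold)
    return decoys / targets if targets else 0.0

# Made-up scores: 5 targets and 2 decoys pass a threshold of 5.0
scores = [9.1, 8.7, 7.9, 7.5, 6.2, 5.8, 5.1, 4.9]
labels = [True, True, True, False, True, True, False, False]
print(target_decoy_fdr(scores, labels, threshold=5.0))
```

In practice the threshold is swept until the estimated FDR falls below the desired level (e.g. the 5.58% and 0.4% reported above).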

  17. Novel lead generation through hypothetical pharmacophore three-dimensional database searching: discovery of isoflavonoids as nonsteroidal inhibitors of rat 5 alpha-reductase.

    PubMed

    Chen, G S; Chang, C S; Kan, W M; Chang, C L; Wang, K C; Chern, J W

    2001-11-08

    A hypothetical pharmacophore of 5 alpha-reductase inhibitors was generated and served as a template in virtual screening. Using this pharmacophore, eight isoflavone derivatives were characterized as novel potential nonsteroidal inhibitors of rat 5 alpha-reductase. This investigation demonstrates a practical approach to the development of lead compounds from a hypothetical pharmacophore via three-dimensional database searching.

  18. PhenoMeter: A Metabolome Database Search Tool Using Statistical Similarity Matching of Metabolic Phenotypes for High-Confidence Detection of Functional Links.

    PubMed

    Carroll, Adam J; Zhang, Peng; Whitehead, Lynne; Kaines, Sarah; Tcherkez, Guillaume; Badger, Murray R

    2015-01-01

    This article describes PhenoMeter (PM), a new type of metabolomics database search that accepts metabolite response patterns as queries and searches the MetaPhen database of reference patterns for responses that are statistically significantly similar or inverse for the purposes of detecting functional links. To identify a similarity measure that would detect functional links as reliably as possible, we compared the performance of four statistics in correctly top-matching metabolic phenotypes of Arabidopsis thaliana metabolism mutants affected in different steps of the photorespiration metabolic pathway to reference phenotypes of mutants affected in the same enzymes by independent mutations. The best performing statistic, the PM score, was a function of both Pearson correlation and Fisher's Exact Test of directional overlap. This statistic outperformed Pearson correlation, biweight midcorrelation and Fisher's Exact Test used alone. To demonstrate general applicability, we show that the PM reliably retrieved the most closely functionally linked response in the database when queried with responses to a wide variety of environmental and genetic perturbations. Attempts to match metabolic phenotypes between independent studies were met with varying success and possible reasons for this are discussed. Overall, our results suggest that integration of pattern-based search tools into metabolomics databases will aid functional annotation of newly recorded metabolic phenotypes analogously to the way sequence similarity search algorithms have aided the functional annotation of genes and proteins. PM is freely available at MetabolomeExpress (https://www.metabolome-express.org/phenometer.php).
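
The abstract names the two ingredients of the PM score (Pearson correlation and Fisher's Exact Test of directional overlap) without giving the published formula. Below is a minimal, illustrative sketch of one way such a combined score could be computed; `pm_like_score`, the toy response vectors, and the particular combination (correlation times -log10 of a one-sided Fisher p-value) are assumptions for illustration, not the published PM definition.

```python
import math

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / den

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test on the 2x2 table [[a, b], [c, d]]:
    P(count >= a) under the hypergeometric null."""
    n, row1, col1 = a + b + c + d, a + b, a + c
    p = 0.0
    for k in range(a, min(row1, col1) + 1):
        if col1 - k <= n - row1:
            p += (math.comb(row1, k) * math.comb(n - row1, col1 - k)
                  / math.comb(n, col1))
    return p

def pm_like_score(query, reference):
    """Hypothetical combination of the two ingredients named in the abstract."""
    r = pearson(query, reference)
    uq = [x > 0 for x in query]        # direction of each metabolite response
    ur = [x > 0 for x in reference]
    a = sum(q and s for q, s in zip(uq, ur))            # up in both
    b = sum(q and not s for q, s in zip(uq, ur))        # up in query only
    c = sum(s and not q for q, s in zip(uq, ur))        # up in reference only
    d = sum(not q and not s for q, s in zip(uq, ur))    # down in both
    p = fisher_one_sided(a, b, c, d)
    return r * -math.log10(max(p, 1e-300))
```

An identical query and reference give r = 1 and a small directional-overlap p-value, hence a strongly positive score, which is the behavior a similarity search would rank on.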

  19. Conceptual changes arising from the use of a search interface developed for an elementary science curriculum database

    NASA Astrophysics Data System (ADS)

    Dwyer, William Michael

    1998-12-01

    The purpose of this study was to look for evidence of change in preservice elementary teachers' notions of science teaching after practice using a search interface for a database of elementary science curriculum materials. The Science Helper K--8 CD-ROM uses search criteria that include science content and process theme to provide appropriate science lessons for elementary educators. Training that took place when Science Helper was first disseminated revealed the possibility that notions about teaching science change with use of the resource. This study looked for evidence of conceptual change compatible with notions in recent reform materials, such as the National Science Education Standards. The study design consisted of a pretest-treatment-posttest model. The treatment included a brief training session in the use of Science Helper, followed by practical application, which consisted of finding appropriate lessons to form a science mini-unit. An analysis of covariance (ANCOVA), however, did not find significant differences between pretest and posttest scores for the treatment group. Study participants also wrote brief narratives about their experiences using Science Helper. A pattern analysis of the narratives found that most of the preservice teachers had positive experiences, saying the resource was easy to use and contained many interesting science activities. A closer examination of the comments revealed a subset of participants who expressed an understanding of the importance of criteria searches and the relatedness of the lessons produced. An ANCOVA of the treatment group controlling for pretest did not find significant differences between pretest and posttest scores for the group who expressed such understanding. Science Helper, with its affordances as a teacher resource, can be regarded as a "knowledge system" in a distributed environment. The interactions among people and material resources in a distributed environment result in a distributed

  20. Astrobiological complexity with probabilistic cellular automata.

    PubMed

    Vukotić, Branislav; Ćirković, Milan M

    2012-08-01

    The search for extraterrestrial life and intelligence constitutes one of the major endeavors in science, but has so far been quantitatively modeled only rarely, and then in a cursory and superficial fashion. We argue that probabilistic cellular automata (PCA) represent the best quantitative framework for modeling the astrobiological history of the Milky Way and its Galactic Habitable Zone. The relevant astrobiological parameters are to be modeled as the elements of the input probability matrix for the PCA kernel. With the underlying simplicity of the cellular automata constructs, this approach enables quick analysis of the large and ambiguous space of input parameters. We perform a simple clustering analysis of typical astrobiological histories with a "Copernican" choice of input parameters and discuss the relevant boundary conditions of practical importance for planning and guiding empirical astrobiological and SETI projects. In addition to showing how the present framework is adaptable to more complex situations and updated observational databases from current and near-future space missions, we demonstrate how numerical results could offer a cautious rationale for continuation of practical SETI searches.

  1. Astrobiological Complexity with Probabilistic Cellular Automata

    NASA Astrophysics Data System (ADS)

    Vukotić, Branislav; Ćirković, Milan M.

    2012-08-01

    The search for extraterrestrial life and intelligence constitutes one of the major endeavors in science, but has so far been quantitatively modeled only rarely, and then in a cursory and superficial fashion. We argue that probabilistic cellular automata (PCA) represent the best quantitative framework for modeling the astrobiological history of the Milky Way and its Galactic Habitable Zone. The relevant astrobiological parameters are to be modeled as the elements of the input probability matrix for the PCA kernel. With the underlying simplicity of the cellular automata constructs, this approach enables quick analysis of the large and ambiguous space of input parameters. We perform a simple clustering analysis of typical astrobiological histories with a "Copernican" choice of input parameters and discuss the relevant boundary conditions of practical importance for planning and guiding empirical astrobiological and SETI projects. In addition to showing how the present framework is adaptable to more complex situations and updated observational databases from current and near-future space missions, we demonstrate how numerical results could offer a cautious rationale for continuation of practical SETI searches.
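
As a concrete illustration of the modeling framework described in the records above, here is a minimal probabilistic cellular automaton in which each site of a toy Galactic grid carries an astrobiological state and evolves according to an input probability matrix. The four states and every entry of the matrix are invented placeholders, not parameters from the paper, and for simplicity sites update independently; a full PCA would also condition transitions on neighboring sites.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy states: 0 = no life, 1 = simple life, 2 = complex life, 3 = technological.
N_STATES = 4
# Hypothetical input probability matrix P[i, j]: chance that a site in state i
# is in state j after one time step (each row sums to 1).
P = np.array([
    [0.98, 0.02, 0.00, 0.00],
    [0.01, 0.94, 0.05, 0.00],
    [0.00, 0.02, 0.95, 0.03],
    [0.00, 0.00, 0.01, 0.99],
])

def step(grid):
    """Advance every site one step by sampling its next state from P."""
    out = np.empty_like(grid)
    for s in range(N_STATES):
        mask = grid == s
        out[mask] = rng.choice(N_STATES, size=int(mask.sum()), p=P[s])
    return out

grid = np.zeros((50, 50), dtype=int)   # a lifeless 50x50 toy Galactic grid
for _ in range(200):                   # one simulated astrobiological history
    grid = step(grid)
history_counts = np.bincount(grid.ravel(), minlength=N_STATES)
```

Running many such histories under different input matrices and clustering the resulting state-count trajectories is the kind of analysis the abstract describes.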

  2. Evidential significance of automotive paint trace evidence using a pattern recognition based infrared library search engine for the Paint Data Query Forensic Database.

    PubMed

    Lavine, Barry K; White, Collin G; Allen, Matthew D; Fasasi, Ayuba; Weakley, Andrew

    2016-10-01

    A prototype library search engine has been further developed to search the infrared spectral libraries of the paint data query database to identify the line and model of a vehicle from the clear coat, surfacer-primer, and e-coat layers of an intact paint chip. For this study, search prefilters were developed from 1181 automotive paint systems spanning three manufacturers: General Motors, Chrysler, and Ford. The best match between each unknown and the spectra in the hit list generated by the search prefilters was identified using a cross-correlation library search algorithm that performed both a forward and backward search. In the forward search, spectra were divided into intervals and further subdivided into windows (which correspond to the time lag for the comparison) within those intervals. The top five hits identified in each search window were compiled; a histogram was computed that summarized the frequency of occurrence for each library sample, with the IR spectra most similar to the unknown flagged. The backward search computed the frequency and occurrence of each line and model without regard to the identity of the individual spectra. Only those lines and models with a frequency of occurrence greater than or equal to 20% were included in the final hit list. If there was agreement between the forward and backward search results, the specific line and model common to both hit lists was always the correct assignment. Samples assigned to the same line and model by both searches are always well represented in the library and correlate well on an individual basis to specific library samples. For these samples, one can have confidence in the accuracy of the match. This was not the case for the results obtained using commercial library search algorithms, as the hit quality index scores for the top twenty hits were always greater than 99%. Copyright © 2016 Elsevier B.V. All rights reserved.
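
A simplified sketch of the forward/backward search logic described above, assuming spectra are stored as NumPy vectors: the forward pass tallies top-k correlation hits per window, and the backward pass aggregates hits by line/model label with a 20% frequency threshold. The function names, window size, and synthetic library are illustrative; this is not the PDQ search engine's actual code.

```python
import numpy as np
from collections import Counter

def windowed_top_hits(unknown, library, win=64, top_k=5):
    """Forward search sketch: divide the spectrum into windows, rank library
    spectra by correlation inside each window, and tally how often each
    library entry lands in a window's top-k hit list."""
    counts = Counter()
    for start in range(0, len(unknown) - win + 1, win):
        seg = unknown[start:start + win]
        scores = {name: float(np.corrcoef(seg, spec[start:start + win])[0, 1])
                  for name, spec in library.items()}
        for name, _ in sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]:
            counts[name] += 1
    return counts

def backward_filter(counts, labels, threshold=0.20):
    """Backward search sketch: pool the window tallies by line/model label and
    keep only labels holding at least `threshold` of all hits."""
    by_label = Counter()
    for name, c in counts.items():
        by_label[labels[name]] += c
    total = sum(by_label.values()) or 1
    return {lab for lab, c in by_label.items() if c / total >= threshold}
```

Agreement between the sample-level histogram and the label-level filter is what the paper treats as a high-confidence assignment.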

  3. On the applicability of probabilistics

    SciTech Connect

    Roth, P.G.

    1996-12-31

    GEAE's traditional lifing approach, based on Low Cycle Fatigue (LCF) curves, is evolving for fracture critical powder metal components by incorporating probabilistic fracture mechanics analysis. Supporting this move is a growing validation database which convincingly demonstrates that probabilistics work given the right inputs. Significant efforts are being made to ensure the right inputs. For example, Heavy Liquid Separation (HLS) analysis has been developed to quantify and control inclusion content (1). Also, an intensive seeded fatigue program providing a model for crack initiation at inclusions is ongoing (2). Despite the optimism and energy, probabilistics are only tools and have limitations. Designing to low failure probabilities helps provide protection, but other strategies are needed to protect against surprises. A low risk design limit derived from a predicted failure distribution can lead to a high risk deployment if there are unaccounted-for deviations from analysis assumptions. Recognized deviations which are statistically quantifiable can be integrated into the probabilistic analysis (an advantage of the approach). When deviations are known to be possible but are not properly describable statistically, it may be more appropriate to maintain the traditional position of conservatively bounding relevant input parameters. Finally, safety factors on analysis results may be called for in cases where there is little experience supporting new design concepts or material applications (where unrecognized deviations might be expected).

  4. RFRCDB-siRNA: improved design of siRNAs by random forest regression model coupled with database searching.

    PubMed

    Jiang, Peng; Wu, Haonan; Da, Yao; Sang, Fei; Wei, Jiawei; Sun, Xiao; Lu, Zuhong

    2007-09-01

    Although the observations concerning the factors which influence the siRNA efficacy give clues to the mechanism of RNAi, the quantitative prediction of siRNA efficacy is still a challenging task. In this paper, we introduced a novel non-linear regression method: random forest regression (RFR), to quantitatively estimate siRNA efficacy values. Compared with an alternative machine learning regression algorithm, support vector machine regression (SVR) and four other score-based algorithms [A. Reynolds, D. Leake, Q. Boese, S. Scaringe, W.S. Marshall, A. Khvorova, Rational siRNA design for RNA interference, Nat. Biotechnol. 22 (2004) 326-330; K. Ui-Tei, Y. Naito, F. Takahashi, T. Haraguchi, H. Ohki-Hamazaki, A. Juni, R. Ueda, K. Saigo, Guidelines for the selection of highly effective siRNA sequences for mammalian and chick RNA interference, Nucleic Acids Res. 32 (2004) 936-948; A.C. Hsieh, R. Bo, J. Manola, F. Vazquez, O. Bare, A. Khvorova, S. Scaringe, W.R. Sellers, A library of siRNA duplexes targeting the phosphoinositide 3-kinase pathway: determinants of gene silencing for use in cell-based screens, Nucleic Acids Res. 32 (2004) 893-901; M. Amarzguioui, H. Prydz, An algorithm for selection of functional siRNA sequences, Biochem. Biophys. Res. Commun. 316 (2004) 1050-1058], our RFR model achieved the best performance of all. A web-server, RFRCDB-siRNA (http://www.bioinf.seu.edu.cn/siRNA/index.htm), has been developed. RFRCDB-siRNA consists of two modules: a siRNA-centric database and a RFR prediction system. RFRCDB-siRNA works as follows: (1) Instead of directly predicting the gene silencing activity of siRNAs, the service takes these siRNAs as queries to search against the siRNA-centric database. Matched sequences exceeding the user-defined functionality value threshold are kept. (2) The mismatched sequences are then processed by the RFR prediction system for further analysis.
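
The two-stage workflow described in the abstract (database lookup first, regression prediction for the remainder) can be sketched as follows. The database entries, threshold, and the GC-content fallback predictor are placeholders standing in for the siRNA-centric database and the trained RFR model.

```python
# Hypothetical efficacy database: siRNA guide sequence -> measured efficacy.
SIRNA_DB = {
    "AUGGCUACGUAGCUAGCUA": 0.91,
    "GGCAUCGUAGCUAGGCAUA": 0.42,
}

def gc_fallback(seq):
    """Placeholder stand-in for the trained random forest regression stage:
    a crude GC-content heuristic, NOT a real siRNA design rule."""
    return 1.0 - (seq.count("G") + seq.count("C")) / len(seq)

def predict_efficacy(seq, db=SIRNA_DB, threshold=0.7, model=gc_fallback):
    """Stage 1: keep database matches above the functionality threshold.
    Stage 2: route everything else to the regression model."""
    hit = db.get(seq)
    if hit is not None and hit >= threshold:
        return hit, "database"
    return model(seq), "model"
```

Queries with a strong database match bypass the model entirely; low-scoring or unseen sequences fall through to the regression stage, mirroring the module split described above.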

  5. Pattern Recognition-Assisted Infrared Library Searching of the Paint Data Query Database to Enhance Lead Information from Automotive Paint Trace Evidence.

    PubMed

    Lavine, Barry K; White, Collin G; Allen, Matthew D; Weakley, Andrew

    2017-03-01

    Multilayered automotive paint fragments, which are one of the most complex materials encountered in the forensic science laboratory, provide crucial links in criminal investigations and prosecutions. To determine the origin of these paint fragments, forensic automotive paint examiners have turned to the paint data query (PDQ) database, which allows the forensic examiner to compare the layer sequence and color, texture, and composition of the sample to paint systems of the original equipment manufacturer (OEM). However, modern automotive paints have a thin color coat and this layer on a microscopic fragment is often too thin to obtain accurate chemical and topcoat color information. A search engine has been developed for the infrared (IR) spectral libraries of the PDQ database in an effort to improve discrimination capability and permit quantification of discrimination power for OEM automotive paint comparisons. The similarity of IR spectra of the corresponding layers of various records for original finishes in the PDQ database often results in poor discrimination using commercial library search algorithms. A pattern recognition approach employing pre-filters and a cross-correlation library search algorithm that performs both a forward and backward search has been used to significantly improve the discrimination of IR spectra in the PDQ database and thus improve the accuracy of the search. This improvement permits inter-comparison of OEM automotive paint layer systems using the IR spectra alone. Such information can serve to quantify the discrimination power of the original automotive paint encountered in casework and further efforts to succinctly communicate trace evidence to the courts.

  6. Do all men with pathological Gleason score 8-10 prostate cancer have poor outcomes? Results from the SEARCH database.

    PubMed

    Fischer, Sean; Lin, Daniel; Simon, Ross M; Howard, Lauren E; Aronson, William J; Terris, Martha K; Kane, Christopher J; Amling, Christopher L; Cooperberg, Matt R; Freedland, Stephen J; Vidal, Adriana C

    2016-08-01

    To determine whether there are subsets of men with pathological high grade prostate cancer (Gleason score 8-10) with particularly high or low 2-year biochemical recurrence (BCR) risk after radical prostatectomy (RP) when stratified into groups based on combinations of pathological features, such as surgical margin status, extracapsular extension (ECE) and seminal vesicle invasion (SVI). We identified 459 men treated with RP with pathological Gleason score 8-10 prostate cancer in the SEARCH database. The men were stratified into five groups based on pathological characteristics: group 1, men with negative surgical margins (NSMs) and no ECE; group 2, men with positive surgical margins (PSMs) and no ECE; group 3, men with NSMs and ECE; group 4, men with PSMs and ECE; and group 5, men with SVI. Cox proportional hazards models and the log-rank test were used to compare BCR among the groups. At 2 years after RP, pathological group was significantly correlated with BCR (log-rank, P < 0.001), with patients in group 5 (+SVI) having the highest BCR risk (66%) and those in group 1 (NSMs and no ECE) having the lowest risk (14%). When we compared groups 2, 3 and 4 with each other, there was no significant difference in BCR among the groups (~50% 2-year BCR risk; log-rank P = 0.28). Results were similar when adjusting for prostate-specific antigen, age, pathological Gleason sum and clinical stage, or after excluding men who received adjuvant therapy. In patients with high grade (Gleason score 8-10) prostate cancer after RP, the presence of either PSMs, ECE or SVI was associated with an increased risk of early BCR, with a 2-year BCR risk of ≥50%. Conversely, men with organ-confined margin-negative disease had a very low risk of early BCR despite Gleason score 8-10 disease. © 2015 The Authors BJU International © 2015 BJU International Published by John Wiley & Sons Ltd.

  7. Use of DNA profiles for investigation using a simulated national DNA database: Part II. Statistical and ethical considerations on familial searching.

    PubMed

    Hicks, T; Taroni, F; Curran, J; Buckleton, J; Castella, V; Ribaux, O

    2010-10-01

    Familial searching consists of searching for a full profile left at a crime scene in a National DNA Database (NDNAD). In this paper we are interested in the circumstance where no full match is returned, but a partial match is found between a database member's profile and the crime stain. Because close relatives share more of their DNA than unrelated persons, this partial match may indicate that the crime stain was left by a close relative of the person with whom the partial match was found. This approach has successfully solved important crimes in the UK and the USA. In a previous paper, a model, which takes into account substructure and siblings, was used to simulate a NDNAD. In this paper, we have used this model to test the usefulness of familial searching and offer guidelines for pre-assessment of the cases based on the likelihood ratio. Siblings of "persons" present in the simulated Swiss NDNAD were created. These profiles (N=10,000) were used as traces and were then compared to the whole database (N=100,000). The statistical results obtained show that the technique has great potential confirming the findings of previous studies. However, effectiveness of the technique is only one part of the story. Familial searching has juridical and ethical aspects that should not be ignored. In Switzerland for example, there are no specific guidelines to the legality or otherwise of familial searching. This article both presents statistical results, and addresses criminological and civil liberties aspects to take into account risks and benefits of familial searching.
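
The statistical premise of familial searching, that full siblings share more alleles than unrelated individuals, can be checked with a toy single-locus simulation. The allele set and frequencies below are invented for illustration; real casework uses multi-locus STR profiles and formal likelihood ratios.

```python
import random
from collections import Counter

random.seed(1)
ALLELES = ["a", "b", "c", "d"]
FREQS = [0.4, 0.3, 0.2, 0.1]   # toy single-locus allele frequencies

def draw():
    return random.choices(ALLELES, weights=FREQS)[0]

def genotype():
    """Unrelated individual: two independent draws from the population."""
    return (draw(), draw())

def sib_pair():
    """Two full siblings: each inherits one random allele from each parent."""
    p1, p2 = genotype(), genotype()
    def child():
        return (random.choice(p1), random.choice(p2))
    return child(), child()

def shared(g1, g2):
    """Number of alleles shared between two genotypes (multiset overlap)."""
    return sum((Counter(g1) & Counter(g2)).values())

TRIALS = 20000
sib_mean = sum(shared(*sib_pair()) for _ in range(TRIALS)) / TRIALS
unrel_mean = sum(shared(genotype(), genotype()) for _ in range(TRIALS)) / TRIALS
```

With these settings, simulated sibling pairs share noticeably more alleles per locus on average than unrelated pairs, which is exactly the excess a partial-match search exploits.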

  8. Image Databases.

    ERIC Educational Resources Information Center

    Pettersson, Rune

    Different kinds of pictorial databases are described with respect to aims, user groups, search possibilities, storage, and distribution. Some specific examples are given for databases used for the following purposes: (1) labor markets for artists; (2) document management; (3) telling a story; (4) preservation (archives and museums); (5) research;…

  9. Searching for coexpressed genes in three-color cDNA microarray data using a probabilistic model-based Hough Transform.

    PubMed

    Tino, Peter; Zhao, Hongya; Yan, Hong

    2011-01-01

    The effects of a drug on the genomic scale can be assessed in a three-color cDNA microarray with the three color intensities represented through the so-called hexaMplot. In our recent study, we have shown that the Hough Transform (HT) applied to the hexaMplot can be used to detect groups of coexpressed genes in the normal-disease-drug samples. However, the standard HT is not well suited for the purpose because 1) the assayed genes need first to be hard-partitioned into equally and differentially expressed genes, with HT ignoring possible information in the former group; 2) the hexaMplot coordinates are negatively correlated and there is no direct way of expressing this in the standard HT and 3) it is not clear how to quantify the association of coexpressed genes with the line along which they cluster. We address these deficiencies by formulating a dedicated probabilistic model-based HT. The approach is demonstrated by assessing effects of the drug Rg1 on homocysteine-treated human umbilical vein endothelial cells. Compared with our previous study, we robustly detect stronger natural groupings of coexpressed genes. Moreover, the gene groups show coherent biological functions with high significance, as detected by the Gene Ontology analysis.
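
For readers unfamiliar with the underlying machinery, a minimal standard Hough Transform over 2-D points (the non-probabilistic baseline the authors extend) looks like this; the grid resolutions and the synthetic points are arbitrary choices for the sketch.

```python
import numpy as np

def hough_lines(points, n_theta=180, n_rho=100):
    """Minimal standard Hough Transform over 2-D points: each point votes for
    every (theta, rho) line parameterisation x*cos(t) + y*sin(t) = rho."""
    pts = np.asarray(points, float)
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    max_rho = float(np.abs(pts).sum(axis=1).max()) + 1e-9
    acc = np.zeros((n_theta, n_rho), dtype=int)
    for x, y in pts:
        rho = x * np.cos(thetas) + y * np.sin(thetas)
        idx = np.round((rho + max_rho) / (2 * max_rho) * (n_rho - 1)).astype(int)
        acc[np.arange(n_theta), idx] += 1          # one vote per (theta, rho) bin
    ti, ri = np.unravel_index(acc.argmax(), acc.shape)
    best_rho = ri / (n_rho - 1) * 2 * max_rho - max_rho
    return thetas[ti], best_rho, acc

# Synthetic points jittered around the line y = x (theta = 3*pi/4, rho = 0):
pts = [(t, t + 0.01 * (-1) ** i) for i, t in enumerate(np.linspace(1.0, 5.0, 40))]
theta, rho, acc = hough_lines(pts)
```

Points scattered near y = x vote coherently for theta near 3π/4 and rho near 0, and the accumulator maximum recovers that line; the probabilistic model-based variant replaces this hard voting with a soft, noise-aware assignment of genes to lines.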

  10. How to prepare a systematic review of economic evaluations for clinical practice guidelines: database selection and search strategy development (part 2/3).

    PubMed

    Thielen, F W; Van Mastrigt, Gapg; Burgers, L T; Bramer, W M; Majoie, Hjm; Evers, Smaa; Kleijnen, J

    2016-12-01

    This article is part of the series "How to prepare a systematic review of economic evaluations (EEs) for informing evidence-based healthcare decisions", in which a five-step approach is proposed. Areas covered: This paper focuses on the selection of relevant databases and developing a search strategy for detecting EEs, as well as on how to perform the search and how to extract relevant data from retrieved records. Expert commentary: Thus far, little has been published on how to conduct systematic reviews of EEs. Moreover, reliable sources of information, such as the Health Economic Evaluation Database, have ceased to publish updates. Researchers are thus left without authoritative guidance on how to conduct SR-EEs. Together with van Mastrigt et al. we seek to fill this gap.

  11. Matching unknown empirical formulas to chemical structure using LC/MS TOF accurate mass and database searching: example of unknown pesticides on tomato skins.

    PubMed

    Thurman, E Michael; Ferrer, Imma; Fernández-Alba, Amadeo Rodriguez

    2005-03-04

    Traditionally, the screening of unknown pesticides in food has been accomplished by GC/MS methods using conventional library searching routines. However, many of the new polar and thermally labile pesticides and their degradates are more readily and easily analyzed by LC/MS methods and no searchable libraries currently exist (with the exception of some user libraries, which are limited). Therefore, there is a need for LC/MS approaches to detect unknown non-target pesticides in food. This report develops an identification scheme using a combination of LC/MS time-of-flight (accurate mass) and LC/MS ion trap MS (MS/MS) with searching of empirical formulas generated through accurate mass and a ChemIndex database or Merck Index database. The approach differs from conventional library searching of fragment ions. The concept here consists of four parts. First is the initial detection of a possible unknown pesticide in actual market-place vegetable extracts (tomato skins) using accurate mass and generating empirical formulas. Second is searching either the Merck Index database on CD (10,000 compounds) or the ChemIndex (77,000 compounds) for possible structures. Third is MS/MS of the unknown pesticide in the tomato-skin extract followed by fragment ion identification using chemical drawing software and comparison with accurate-mass ion fragments. Fourth is the verification with authentic standards, if available. Three examples of unknown, non-target pesticides are shown using a tomato-skin extract from an actual market place sample. Limitations of the approach are discussed including the use of A + 2 isotope signatures, extended databases, lack of authentic standards, and natural product unknowns in food extracts.
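
The first two steps of the scheme, deriving an empirical formula from an accurate mass and screening a compound database, reduce to a ppm-window comparison of monoisotopic masses. The candidate entries below are hypothetical; the elemental masses are standard monoisotopic values.

```python
# Monoisotopic masses of common elements in pesticide formulas (Da).
MONO = {"C": 12.0, "H": 1.0078250319, "N": 14.0030740052,
        "O": 15.9949146221, "S": 31.97207069, "Cl": 34.96885271}

def formula_mass(formula):
    """Monoisotopic mass of a formula given as a dict, e.g.
    {"C": 9, "H": 10, "N": 4, "O": 2}."""
    return sum(MONO[el] * n for el, n in formula.items())

def match_mass(measured, candidates, ppm=10.0):
    """Return candidates whose monoisotopic mass lies within `ppm` parts per
    million of the measured accurate mass."""
    hits = []
    for name, formula in candidates.items():
        m = formula_mass(formula)
        if abs(measured - m) / m * 1e6 <= ppm:
            hits.append((name, round(m, 4)))
    return hits
```

Tightening the ppm window shrinks the list of plausible formulas that the MS/MS fragment-identification step then has to discriminate among.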

  12. RNA FRABASE 2.0: an advanced web-accessible database with the capacity to search the three-dimensional fragments within RNA structures

    PubMed Central

    2010-01-01

    Background Recent discoveries concerning novel functions of RNA, such as RNA interference, have contributed towards the growing importance of the field. In this respect, a deeper knowledge of complex three-dimensional RNA structures is essential to understand their new biological functions. A number of bioinformatic tools have been proposed to explore two major structural databases (PDB, NDB) in order to analyze various aspects of RNA tertiary structures. One of these tools is RNA FRABASE 1.0, the first web-accessible database with an engine for automatic search of 3D fragments within PDB-derived RNA structures. This search is based upon the user-defined RNA secondary structure pattern. In this paper, we present and discuss RNA FRABASE 2.0. This second version of the system represents a major extension of this tool in terms of providing new data and a wide spectrum of novel functionalities. An intuitionally operated web server platform enables very fast user-tailored search of three-dimensional RNA fragments, their multi-parameter conformational analysis and visualization. Description RNA FRABASE 2.0 has stored information on 1565 PDB-deposited RNA structures, including all NMR models. The RNA FRABASE 2.0 search engine algorithms operate on the database of the RNA sequences and the new library of RNA secondary structures, coded in the dot-bracket format extended to hold multi-stranded structures and to cover residues whose coordinates are missing in the PDB files. The library of RNA secondary structures (and their graphics) is made available. A high level of efficiency of the 3D search has been achieved by introducing novel tools to formulate advanced searching patterns and to screen highly populated tertiary structure elements. RNA FRABASE 2.0 also stores data and conformational parameters in order to provide "on the spot" structural filters to explore the three-dimensional RNA structures. An instant visualization of the 3D RNA structures is provided. 

  13. Accelerating Smith-Waterman Alignment for Protein Database Search Using Frequency Distance Filtration Scheme Based on CPU-GPU Collaborative System

    PubMed Central

    Liu, Yu; Hong, Yang; Lin, Chun-Yuan; Hung, Che-Lun

    2015-01-01

    The Smith-Waterman (SW) algorithm has been widely utilized for searching biological sequence databases in bioinformatics. Recently, several works have adopted graphics cards with Graphics Processing Units (GPUs) and their associated CUDA model to enhance the performance of SW computations. However, these works mainly focused on protein database search using the intertask parallelization technique, using the GPU only to perform the SW computations one at a time. Hence, in this paper, we propose an efficient SW alignment method, called CUDA-SWfr, for the protein database search by using the intratask parallelization technique based on a CPU-GPU collaborative system. Before doing the SW computations on GPU, a procedure is applied on CPU by using the frequency distance filtration scheme (FDFS) to eliminate the unnecessary alignments. The experimental results indicate that CUDA-SWfr runs 9.6 times and 96 times faster than the CPU-based SW method without and with FDFS, respectively. PMID:26568953
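
A CPU-only sketch of the filter-then-align idea: a cheap frequency-distance screen in the spirit of FDFS discards clearly dissimilar database entries before the quadratic Smith-Waterman scoring runs. The scoring parameters and distance bound are arbitrary, and the CUDA intratask parallelization is beyond this sketch.

```python
from collections import Counter

def freq_distance(a, b):
    """Frequency distance: L1 difference of residue composition, a cheap
    screen for skipping clearly dissimilar database entries."""
    ca, cb = Counter(a), Counter(b)
    return sum(abs(ca[r] - cb[r]) for r in set(ca) | set(cb))

def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Textbook Smith-Waterman local alignment score (no traceback)."""
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best

def search(query, database, max_freq_dist=6):
    """Filter-then-align: skip entries whose frequency distance exceeds the
    bound, then score the survivors with Smith-Waterman."""
    return {name: smith_waterman(query, seq)
            for name, seq in database.items()
            if freq_distance(query, seq) <= max_freq_dist}
```

Because the filter is linear in sequence length while the alignment is quadratic, pruning even a modest fraction of the database pays for itself, which is the effect behind the speedups reported above.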

  14. Comparing the Precision of Information Retrieval of MeSH-Controlled Vocabulary Search Method and a Visual Method in the Medline Medical Database.

    PubMed

    Hariri, Nadjla; Ravandi, Somayyeh Nadi

    2014-01-01

    Medline is one of the most important databases in the biomedical field. One of the most important hosts for Medline is Elton B. Stephens Co. (EBSCO), which has presented different search methods that can be used based on the needs of the users. Visual search and MeSH-controlled search methods are among the most common methods. The goal of this research was to compare the precision of the retrieved sources in the EBSCO Medline base using MeSH-controlled and visual search methods. This research was a semi-empirical study. By holding training workshops, 70 students of higher education in different educational departments of Kashan University of Medical Sciences were taught MeSH-controlled and visual search methods in 2012. Then, the precision of 300 searches made by these students was calculated based on Best Precision, Useful Precision, and Objective Precision formulas and analyzed in SPSS software using the independent sample t-test, and the three precision values obtained with these formulas were compared for the two search methods. The mean precision of the visual method was greater than that of the MeSH-controlled search for all three types of precision, i.e. Best Precision, Useful Precision, and Objective Precision, and their mean precisions were significantly different (P < 0.001). Sixty-five percent of the researchers indicated that, although the visual method was better than the controlled method, the control of keywords in the controlled method resulted in finding more appropriate keywords for the searches. Fifty-three percent of the participants in the research also mentioned that the use of a combination of the two methods produced better results. For users, it is more appropriate to use a natural-language-based method, such as the visual method, in the EBSCO Medline host than to use the controlled method, which requires users to use special keywords. The potential reason for their preference was that the visual method allowed them more freedom of action.

  15. Integrating Boolean Queries in Conjunctive Normal Form with Probabilistic Retrieval Models.

    ERIC Educational Resources Information Center

    Losee, Robert M.; Bookstein, Abraham

    1988-01-01

    Presents a model that places Boolean database queries into conjunctive normal form, thereby allowing probabilistic ranking of documents and the incorporation of relevance feedback. Experimental results compare the performance of a sequential learning probabilistic retrieval model with the proposed integrated Boolean probabilistic model and a fuzzy…

  16. Exploiting the potential of large databases of electronic health records for research using rapid search algorithms and an intuitive query interface

    PubMed Central

    Tate, A Rosemary; Beloff, Natalia; Al-Radwan, Balques; Wickson, Joss; Puri, Shivani; Williams, Timothy; Van Staa, Tjeerd; Bleach, Adrian

    2014-01-01

    Objective UK primary care databases, which contain diagnostic, demographic and prescribing information for millions of patients geographically representative of the UK, represent a significant resource for health services and clinical research. They can be used to identify patients with a specified disease or condition (phenotyping) and to investigate patterns of diagnosis and symptoms. Currently, extracting such information manually is time-consuming and requires considerable expertise. In order to exploit more fully the potential of these large and complex databases, our interdisciplinary team developed generic methods allowing access to different types of user. Materials and methods Using the Clinical Practice Research Datalink database, we have developed an online user-focused system (TrialViz), which enables users interactively to select suitable medical general practices based on two criteria: suitability of the patient base for the intended study (phenotyping) and measures of data quality. Results An end-to-end system, underpinned by an innovative search algorithm, allows the user to extract information in near real-time via an intuitive query interface and to explore this information using interactive visualization tools. A usability evaluation of this system produced positive results. Discussion We present the challenges and results in the development of TrialViz and our plans for its extension for wider applications of clinical research. Conclusions Our fast search algorithms and simple query algorithms represent a significant advance for users of clinical research databases. PMID:24272162

  17. Exploiting the potential of large databases of electronic health records for research using rapid search algorithms and an intuitive query interface.

    PubMed

    Tate, A Rosemary; Beloff, Natalia; Al-Radwan, Balques; Wickson, Joss; Puri, Shivani; Williams, Timothy; Van Staa, Tjeerd; Bleach, Adrian

    2014-01-01

    UK primary care databases, which contain diagnostic, demographic and prescribing information for millions of patients geographically representative of the UK, represent a significant resource for health services and clinical research. They can be used to identify patients with a specified disease or condition (phenotyping) and to investigate patterns of diagnosis and symptoms. Currently, extracting such information manually is time-consuming and requires considerable expertise. In order to exploit more fully the potential of these large and complex databases, our interdisciplinary team developed generic methods allowing access to different types of user. Using the Clinical Practice Research Datalink database, we have developed an online user-focused system (TrialViz), which enables users interactively to select suitable medical general practices based on two criteria: suitability of the patient base for the intended study (phenotyping) and measures of data quality. An end-to-end system, underpinned by an innovative search algorithm, allows the user to extract information in near real-time via an intuitive query interface and to explore this information using interactive visualization tools. A usability evaluation of this system produced positive results. We present the challenges and results in the development of TrialViz and our plans for its extension for wider applications of clinical research. Our fast search algorithms and simple query algorithms represent a significant advance for users of clinical research databases.

  18. DESIGNING ENVIRONMENTAL MONITORING DATABASES FOR STATISTIC ASSESSMENT

    EPA Science Inventory

    Databases designed for statistical analyses have characteristics that distinguish them from databases intended for general use. EMAP uses a probabilistic sampling design to collect data to produce statistical assessments of environmental conditions. In addition to supporting the ...

  20. Savvy Searching.

    ERIC Educational Resources Information Center

    Jacso, Peter

    2002-01-01

    Explains desktop metasearch engines, which search the databases of several search engines simultaneously. Reviews two particular versions, the Copernic 2001 Pro and the BullsEye Pro 3, comparing costs, subject categories, display capabilities, and layout for presenting results. (LRW)

  1. Racial differences in the association between preoperative serum cholesterol and prostate cancer recurrence: results from the SEARCH database

    PubMed Central

    Allott, Emma H.; Howard, Lauren E.; Aronson, William J.; Terris, Martha K.; Kane, Christopher J.; Amling, Christopher L.; Cooperberg, Matthew R.; Freedland, Stephen J.

    2016-01-01

    Background: Black men are disproportionately affected by both cardiovascular disease and prostate cancer. Epidemiologic evidence linking dyslipidemia, an established cardiovascular risk factor, and prostate cancer progression is mixed. As existing studies were conducted in predominantly non-black populations, research in black men is lacking. Methods: We identified 628 black and 1,020 non-black men who underwent radical prostatectomy and never used statins before surgery in the Shared Equal Access Regional Cancer Hospital (SEARCH) database. Median follow up was 2.9 years. The impact of preoperative hypercholesterolemia on risk of biochemical recurrence was examined using multivariable, race-stratified proportional hazards. In secondary analysis, we examined associations with low-density lipoprotein (LDL), high-density lipoprotein (HDL) and triglycerides, overall and among men with dyslipidemia. Results: High cholesterol was associated with increased risk of recurrence in black (HRper10mg/dl 1.06; 95%CI 1.02–1.11) but not non-black men (HRper10mg/dl 0.99; 95%CI 0.95–1.03; p-interaction=0.011). Elevated triglycerides were associated with increased risk in both black and non-black men (HRper10mg/dl 1.02; 95%CI 1.00–1.03 and 1.02; 95%CI 1.00–1.02, respectively; p-interaction=0.458). There were no significant associations between LDL or HDL and recurrence risk in either race. Associations with cholesterol, LDL and triglycerides were similar among men with dyslipidemia, but low HDL was associated with increased risk of recurrence in black, but not non-black men with dyslipidemia (p-interaction=0.047). Conclusion: Elevated cholesterol was a risk factor for recurrence in black but not non-black men, whereas high triglycerides were associated with increased risk regardless of race. Impact: Significantly contrasting associations by race may provide insight into prostate cancer racial disparities. PMID:26809276

  2. Asking Better Questions: How Presentation Formats Influence Information Search.

    PubMed

    Wu, Charley M; Meder, Björn; Filimon, Flavia; Nelson, Jonathan D

    2017-03-20

    While the influence of presentation formats has been widely studied in Bayesian reasoning tasks, we present the first systematic investigation of how presentation formats influence information search decisions. Four experiments were conducted across different probabilistic environments, where subjects (N = 2,858) chose between 2 possible search queries, each with binary probabilistic outcomes, with the goal of maximizing classification accuracy. We studied 14 different numerical and visual formats for presenting information about the search environment, constructed across 6 design features that have been prominently related to improvements in Bayesian reasoning accuracy (natural frequencies, posteriors, complement, spatial extent, countability, and part-to-whole information). The posterior variants of the icon array and bar graph formats led to the highest proportion of correct responses, and were substantially better than the standard probability format. Results suggest that presenting information in terms of posterior probabilities and visualizing natural frequencies using spatial extent (a perceptual feature) were especially helpful in guiding search decisions, although environments with a mixture of probabilistic and certain outcomes were challenging across all formats. Subjects who made more accurate probability judgments did not perform better on the search task, suggesting that simple decision heuristics may be used to make search decisions without explicitly applying Bayesian inference to compute probabilities. We propose a new take-the-difference (TTD) heuristic that identifies the accuracy-maximizing query without explicit computation of posterior probabilities.
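    The search task described in this abstract can be grounded in a small calculation: the value of a binary query is the expected classification accuracy it yields, assuming the classifier picks the more probable category after observing each outcome. The sketch below is illustrative only (the function name and the two-category setup are assumptions, not the authors' materials); it scores a query directly from its outcome likelihoods, which is the quantity the TTD heuristic is said to approximate without computing posteriors.

```python
def expected_accuracy(prior, like_pos, like_neg):
    """Expected classification accuracy of a binary query.

    prior: P(category A); like_pos: P(positive outcome | A);
    like_neg: P(positive outcome | B). After each outcome the
    classifier chooses the more probable category, so each outcome
    contributes the larger of the two joint probabilities.
    """
    p_pos = prior * like_pos + (1 - prior) * like_neg  # marginal P(positive)
    p_neg = 1 - p_pos
    acc = 0.0
    for p_outcome, joint_a in ((p_pos, prior * like_pos),
                               (p_neg, prior * (1 - like_pos))):
        if p_outcome > 0:
            acc += max(joint_a, p_outcome - joint_a)
    return acc
```

Comparing `expected_accuracy` across candidate queries identifies the accuracy-maximizing one; a completely uninformative query (equal likelihoods) scores exactly the prior.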

  3. Similarity searching in databases of flexible 3D structures using autocorrelation vectors derived from smoothed bounded distance matrices.

    PubMed

    Rhodes, Nicholas; Clark, David E; Willett, Peter

    2006-01-01

    This paper presents an exploratory study of a novel method for flexible 3-D similarity searching based on autocorrelation vectors and smoothed bounded distance matrices. Although the new approach is unable to outperform existing 2-D similarity searching in terms of enrichment factors, it is able to retrieve different compounds at a given percentage of the hit-list and so may be a useful adjunct to other similarity searching methods.
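    As a rough illustration of the descriptor family involved (not the authors' implementation, and omitting the smoothing and bounding of the distance matrix), an autocorrelation vector can be built from a pairwise distance matrix by binning interatomic distances and accumulating products of atomic property values:

```python
def autocorrelation_vector(dist, props, bins):
    """Autocorrelation descriptor: for each distance bin (lo, hi),
    sum the products of atom-property values over all atom pairs
    whose pairwise distance falls in that bin.

    dist: symmetric n x n distance matrix; props: n property values;
    bins: list of (lo, hi) half-open intervals.
    """
    n = len(props)
    vec = [0.0] * len(bins)
    for i in range(n):
        for j in range(i + 1, n):
            for k, (lo, hi) in enumerate(bins):
                if lo <= dist[i][j] < hi:
                    vec[k] += props[i] * props[j]
                    break
    return vec
```

Because the vector has fixed length regardless of molecule size, two molecules can be compared by any standard vector similarity measure.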

  4. Learning Probabilistic Logic Models from Probabilistic Examples

    PubMed Central

    Chen, Jianzhong; Muggleton, Stephen; Santos, José

    2009-01-01

    We revisit an application developed originally using abductive Inductive Logic Programming (ILP) for modeling inhibition in metabolic networks. The example data was derived from studies of the effects of toxins on rats using Nuclear Magnetic Resonance (NMR) time-trace analysis of their biofluids together with background knowledge representing a subset of the Kyoto Encyclopedia of Genes and Genomes (KEGG). We now apply two Probabilistic ILP (PILP) approaches, abductive Stochastic Logic Programs (SLPs) and PRogramming In Statistical modeling (PRISM), to the application. Both approaches support abductive learning and probability predictions. Abductive SLPs are a PILP framework that provides possible worlds semantics to SLPs through abduction. Instead of learning logic models from non-probabilistic examples as done in ILP, the PILP approach applied in this paper is based on a general technique for introducing probability labels within a standard scientific experimental setting involving control and treated data. Our results demonstrate that the PILP approach provides a way of learning probabilistic logic models from probabilistic examples, and the PILP models learned from probabilistic examples lead to a significant decrease in error accompanied by improved insight from the learned results compared with the PILP models learned from non-probabilistic examples. PMID:19888348

  5. CAZymes Analysis Toolkit (CAT): web service for searching and analyzing carbohydrate-active enzymes in a newly sequenced organism using CAZy database.

    PubMed

    Park, Byung H; Karpinets, Tatiana V; Syed, Mustafa H; Leuze, Michael R; Uberbacher, Edward C

    2010-12-01

    The Carbohydrate-Active Enzyme (CAZy) database provides a rich set of manually annotated enzymes that degrade, modify, or create glycosidic bonds. Despite rich and invaluable information stored in the database, software tools utilizing this information for annotation of newly sequenced genomes by CAZy families are limited. We have employed two annotation approaches to fill the gap between manually curated high-quality protein sequences collected in the CAZy database and the growing number of other protein sequences produced by genome or metagenome sequencing projects. The first approach is based on a similarity search against the entire nonredundant sequences of the CAZy database. The second approach performs annotation using links or correspondences between the CAZy families and protein family domains. The links were discovered using the association rule learning algorithm applied to sequences from the CAZy database. The approaches complement each other and in combination achieved high specificity and sensitivity when cross-evaluated with the manually curated genomes of Clostridium thermocellum ATCC 27405 and Saccharophagus degradans 2-40. The capability of the proposed framework to predict the function of unknown protein domains and of hypothetical proteins in the genome of Neurospora crassa is demonstrated. The framework is implemented as a Web service, the CAZymes Analysis Toolkit, and is available at http://cricket.ornl.gov/cgi-bin/cat.cgi.
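    The second annotation approach above rests on association rules linking protein family domains to CAZy families. A minimal sketch of how such "domain ⇒ family" rules could be mined with support and confidence thresholds (the identifiers, thresholds, and function name are illustrative, not taken from the CAT implementation):

```python
from collections import Counter

def domain_family_rules(annotations, min_support=2, min_confidence=0.75):
    """Mine 'domain => CAZy family' association rules.

    annotations: list of (set_of_domains, set_of_families), one pair per
    annotated sequence. A rule d => f is kept when the pair co-occurs at
    least min_support times and the confidence count(d, f) / count(d)
    reaches min_confidence.
    """
    domain_count = Counter()
    pair_count = Counter()
    for domains, families in annotations:
        for d in domains:
            domain_count[d] += 1
            for f in families:
                pair_count[(d, f)] += 1
    rules = {}
    for (d, f), n in pair_count.items():
        confidence = n / domain_count[d]
        if n >= min_support and confidence >= min_confidence:
            rules[(d, f)] = confidence
    return rules
```

A new sequence can then be annotated by looking up the rules fired by its detected domains, complementing the direct similarity search against CAZy sequences.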

  6. CUDASW++2.0: enhanced Smith-Waterman protein database search on CUDA-enabled GPUs based on SIMT and virtualized SIMD abstractions.

    PubMed

    Liu, Yongchao; Schmidt, Bertil; Maskell, Douglas L

    2010-04-06

    Due to its high sensitivity, the Smith-Waterman algorithm is widely used for biological database searches. Unfortunately, the quadratic time complexity of this algorithm makes it highly time-consuming. The exponential growth of biological databases further deteriorates the situation. To accelerate this algorithm, many efforts have been made to develop techniques in high performance architectures, especially the recently emerging many-core architectures and their associated programming models. This paper describes the latest release of the CUDASW++ software, CUDASW++ 2.0, which makes new contributions to Smith-Waterman protein database searches using compute unified device architecture (CUDA). A parallel Smith-Waterman algorithm is proposed to further optimize the performance of CUDASW++ 1.0 based on the single instruction, multiple thread (SIMT) abstraction. For the first time, we have investigated a partitioned vectorized Smith-Waterman algorithm using CUDA based on the virtualized single instruction, multiple data (SIMD) abstraction. The optimized SIMT and the partitioned vectorized algorithms were benchmarked and, remarkably, have similar performance characteristics. CUDASW++ 2.0 achieves performance improvement over CUDASW++ 1.0 as much as 1.74 (1.72) times using the optimized SIMT algorithm and up to 1.77 (1.66) times using the partitioned vectorized algorithm, with a performance of up to 17 (30) billion cell updates per second (GCUPS) on a single-GPU GeForce GTX 280 (dual-GPU GeForce GTX 295) graphics card. CUDASW++ 2.0 is publicly available open-source software, written in CUDA and C++ programming languages. It obtains significant performance improvement over CUDASW++ 1.0 using either the optimized SIMT algorithm or the partitioned vectorized algorithm for Smith-Waterman protein database searches by fully exploiting the compute capability of commonly used CUDA-enabled low-cost GPUs.
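    For reference, the quadratic-time dynamic programme that CUDASW++ parallelizes can be sketched in a few lines. This pure-Python version (illustrative scoring parameters, no GPU code) computes only the optimal local alignment score, which makes the O(len(a) × len(b)) cost that motivates GPU acceleration explicit:

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score between sequences a and b.

    Fills an (len(a)+1) x (len(b)+1) scoring matrix; cells are clamped
    at zero so an alignment can start anywhere, and the best cell
    anywhere in the matrix is the local alignment score.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best
```

Real protein searches replace the flat match/mismatch scores with a substitution matrix such as BLOSUM62 and use affine gap penalties, but the data-parallel structure being exploited on the GPU is the same.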

  7. HMMER web server: interactive sequence similarity searching.

    PubMed

    Finn, Robert D; Clements, Jody; Eddy, Sean R

    2011-07-01

    HMMER is a software suite for protein sequence similarity searches using probabilistic methods. Previously, HMMER has mainly been available only as a computationally intensive UNIX command-line tool, restricting its use. Recent advances in the software, HMMER3, have resulted in a 100-fold speed gain relative to previous versions. It is now feasible to make efficient profile hidden Markov model (profile HMM) searches via the web. A HMMER web server (http://hmmer.janelia.org) has been designed and implemented such that most protein database searches return within a few seconds. Methods are available for searching either a single protein sequence, multiple protein sequence alignment or profile HMM against a target sequence database, and for searching a protein sequence against Pfam. The web server is designed to cater to a range of different user expertise and accepts batch uploading of multiple queries at once. All search methods are also available as RESTful web services, thereby allowing them to be readily integrated as remotely executed tasks in locally scripted workflows. We have focused on minimizing search times and the ability to rapidly display tabular results, regardless of the number of matches found, developing graphical summaries of the search results to provide quick, intuitive appraisement of them.

  8. HMMER web server: interactive sequence similarity searching

    PubMed Central

    Finn, Robert D.; Clements, Jody; Eddy, Sean R.

    2011-01-01

    HMMER is a software suite for protein sequence similarity searches using probabilistic methods. Previously, HMMER has mainly been available only as a computationally intensive UNIX command-line tool, restricting its use. Recent advances in the software, HMMER3, have resulted in a 100-fold speed gain relative to previous versions. It is now feasible to make efficient profile hidden Markov model (profile HMM) searches via the web. A HMMER web server (http://hmmer.janelia.org) has been designed and implemented such that most protein database searches return within a few seconds. Methods are available for searching either a single protein sequence, multiple protein sequence alignment or profile HMM against a target sequence database, and for searching a protein sequence against Pfam. The web server is designed to cater to a range of different user expertise and accepts batch uploading of multiple queries at once. All search methods are also available as RESTful web services, thereby allowing them to be readily integrated as remotely executed tasks in locally scripted workflows. We have focused on minimizing search times and the ability to rapidly display tabular results, regardless of the number of matches found, developing graphical summaries of the search results to provide quick, intuitive appraisement of them. PMID:21593126

  9. Familial searching: a specialist forensic DNA profiling service utilising the National DNA Database to identify unknown offenders via their relatives--the UK experience.

    PubMed

    Maguire, C N; McCallum, L A; Storey, C; Whitaker, J P

    2014-01-01

    The National DNA Database (NDNAD) of England and Wales was established on April 10th 1995. The NDNAD is governed by a variety of legislative instruments that mean that DNA samples can be taken if an individual is arrested and detained in a police station. The biological samples and the DNA profiles derived from them can be used for purposes related to the prevention and detection of crime, the investigation of an offence and for the conduct of a prosecution. Following the South East Asian Tsunami of December 2004, the legislation was amended to allow the use of the NDNAD to assist in the identification of a deceased person or of a body part where death has occurred from natural causes or from a natural disaster. The UK NDNAD now contains the DNA profiles of approximately 6 million individuals representing 9.6% of the UK population. As the science of DNA profiling advanced, the National DNA Database provided a potential resource for increased intelligence beyond the direct matching for which it was originally created. The familial searching service offered to the police by several UK forensic science providers exploits the size and geographic coverage of the NDNAD and the fact that close relatives of an offender may share a significant proportion of that offender's DNA profile and will often reside in close geographic proximity to him or her. Between 2002 and 2011 Forensic Science Service Ltd. (FSS) provided familial search services to support 188 police investigations, 70 of which are still active cases. This technique, which may be used in serious crime cases or in 'cold case' reviews when there are few or no investigative leads, has led to the identification of 41 perpetrators or suspects. In this paper we discuss the processes, utility, and governance of the familial search service in which the NDNAD is searched for close genetic relatives of an offender who has left DNA evidence at a crime scene, but whose DNA profile is not represented within the NDNAD…

  10. Searching the Literatura Latino Americana e do Caribe em Ciências da Saúde (LILACS) database improves systematic reviews.

    PubMed

    Clark, Otavio Augusto Camara; Castro, Aldemar Araujo

    2002-02-01

    An unbiased systematic review (SR) should analyse as many articles as possible in order to provide the best evidence available. However, many SR use only databases with high English-language content as sources for articles. Literatura Latino Americana e do Caribe em Ciências da Saúde (LILACS) indexes 670 journals from the Latin American and Caribbean health literature but is seldom used in these SR. Our objective is to evaluate if LILACS should be used as a routine source of articles for SR. First we identified SR published in 1997 in five medical journals with a high impact factor. Then we searched LILACS for articles that could match the inclusion criteria of these SR. We also checked if the authors had already identified these articles located in LILACS. In all, 64 SR were identified. Two had already searched LILACS and were excluded. In 39 of 62 (63%) SR a LILACS search identified articles that matched the inclusion criteria. In 5 (8%) our search was inconclusive and in 18 (29%) no articles were found in LILACS. Therefore, in 71% (44/62) of cases, a LILACS search could have been useful to the authors. This proportion remains the same if we consider only the 37 SR that performed a meta-analysis. In only one case had the article identified in LILACS already been located elsewhere by the authors' strategy. LILACS is an under-explored and unique source of articles whose use can improve the quality of systematic reviews. This database should be used as a routine source to identify studies for systematic reviews.

  11. The Object-analogue approach for probabilistic forecasting

    NASA Astrophysics Data System (ADS)

    Frediani, M. E.; Hopson, T. M.; Anagnostou, E. N.; Hacker, J.

    2015-12-01

    The object-analogue is a new method to estimate forecast uncertainty and to derive probabilistic predictions of gridded forecast fields over larger regions rather than point locations. The method has been developed for improving the forecast of 10-meter wind speed over the northeast US, and it can be extended to other forecast variables, vertical levels, and other regions. The object-analogue approach combines the analog post-processing technique (Hopson 2005; Hamill 2006; Delle Monache 2011) with the Method for Object-based Diagnostic Evaluation (MODE) for forecast verification (Davis et al 2006a, b). MODE was originally used mainly to verify precipitation forecasts, using features of a forecast region represented by an object. The analog technique is used to reduce the NWP systematic and random errors of a gridded forecast field. In this study we use MODE-derived objects to characterize the wind field forecasts by attributes such as object area, centroid location, and intensity percentiles, and apply the analogue concept to these objects. The object-analogue method uses a database of objects derived from reforecasts and their respective reanalysis. Given a real-time forecast field, it searches the database and selects the top-ranked objects with the most similar set of attributes using the MODE fuzzy logic algorithm for object matching. The attribute probabilities obtained with the set of selected object-analogues are used to derive a multi-layer probabilistic prediction. The attribute probabilities are combined into three uncertainty layers that address the main concerns of most applications: location, area, and magnitude. The multi-layer uncertainty can be weighted and combined or used independently such that it provides a more accurate prediction, adjusted according to the application interest. In this study we present preliminary results of the object-analogue method. Using a database with one hundred storms we perform a leave-one-out cross-validation to…
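    A minimal sketch of the analogue-selection step described above, assuming each object is reduced to a dictionary of numeric attributes and similarity is a weighted attribute distance (a simple stand-in for MODE's fuzzy-logic "interest" matching; the attribute names, weights, and function name are illustrative):

```python
def rank_analogues(target, database, weights, k=3):
    """Return the k stored objects closest to the target object.

    target: dict of attribute -> value for the real-time forecast object.
    database: list of attribute dicts from reforecast/reanalysis objects.
    weights: attribute -> weight used in the distance; attributes absent
    from weights are ignored. Lower distance means a better analogue.
    """
    def distance(obj):
        return sum(w * abs(obj[a] - target[a]) for a, w in weights.items())
    return sorted(database, key=distance)[:k]
```

The empirical distribution of any attribute over the k selected analogues then yields the attribute probabilities that feed the location, area, and magnitude uncertainty layers.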

  12. Mixed deterministic and probabilistic networks

    PubMed Central

    Dechter, Rina

    2010-01-01

    The paper introduces mixed networks, a new graphical model framework for expressing and reasoning with probabilistic and deterministic information. The motivation to develop mixed networks stems from the desire to fully exploit the deterministic information (constraints) that is often present in graphical models. Several concepts and algorithms specific to belief networks and constraint networks are combined, achieving computational efficiency, semantic coherence and user-interface convenience. We define the semantics and graphical representation of mixed networks, and discuss the two main types of algorithms for processing them: inference-based and search-based. A preliminary experimental evaluation shows the benefits of the new model. PMID:20981243

  13. Reduction in database search space by utilization of amino acid composition information from electron transfer dissociation and higher-energy collisional dissociation mass spectra.

    PubMed

    Hansen, Thomas A; Kryuchkov, Fedor; Kjeldsen, Frank

    2012-08-07

    With high-mass accuracy and consecutively obtained electron transfer dissociation (ETD) and higher-energy collisional dissociation (HCD) tandem mass spectrometry (MS/MS), reliable (≥97%) and sensitive fragment ions have been extracted for identification of specific amino acid residues in peptide sequences. The analytical benefit of these specific amino acid composition (AAC) ions is to restrict the database search space and provide identification of peptides with higher confidence and reduced false negative rates. The 6706 uniquely identified peptide sequences determined with a conservative Mascot score of >30 were used to characterize the AAC ions. The loss of amino acid side chains (small neutral losses, SNLs) from the charge reduced peptide radical cations was studied using ETD. Complementary AAC information from HCD spectra was provided by immonium ions. From the ETD/HCD mass spectra, 5162 and 6720 reliable SNLs and immonium ions were successfully extracted, respectively. Automated application of the AAC information during database searching resulted in an average 3.5-fold higher confidence level of peptide identification. In addition, 4% and 28% more peptides were identified above the significance level in a standard and extended search space, respectively.
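    The restriction of the search space described above can be pictured as a composition filter: any candidate peptide lacking a residue whose immonium ion or side-chain loss was observed can be discarded before scoring. A hedged sketch (single-letter residue codes; the function and parameter names are hypothetical, not from the authors' software):

```python
def filter_candidates(peptides, required_residues, excluded_residues=frozenset()):
    """Restrict a peptide search space using amino acid composition evidence.

    peptides: candidate sequences as single-letter-code strings.
    required_residues: residues whose AAC ions were reliably observed,
    so every surviving candidate must contain them.
    excluded_residues: optional residues ruled out by the spectra.
    """
    kept = []
    for pep in peptides:
        residues = set(pep)
        if required_residues <= residues and not (excluded_residues & residues):
            kept.append(pep)
    return kept
```

Shrinking the candidate list this way is what raises the confidence of the surviving peptide-spectrum matches: the same score is less likely by chance in a smaller search space.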

  14. Validity and Reliability of a Systematic Database Search Strategy to Identify Publications Resulting From Pharmacy Residency Research Projects.

    PubMed

    Kwak, Namhee; Swan, Joshua T; Thompson-Moore, Nathaniel; Liebl, Michael G

    2016-08-01

    This study aims to develop a systematic search strategy and test its validity and reliability in terms of identifying projects published in peer-reviewed journals as reported by residency graduates through an online survey. This study was a prospective blind comparison to a reference standard. Pharmacy residency projects conducted at the study institution between 2001 and 2012 were included. A step-wise, systematic procedure containing up to 8 search strategies in PubMed and EMBASE for each project was created using the names of authors and abstract keywords. In order to further maximize sensitivity, complex phrases with multiple variations were truncated to the root word. Validity was assessed by obtaining information on publications from an online survey deployed to residency graduates. The search strategy identified 13 publications (93% sensitivity, 100% specificity, and 99% accuracy). Both methods identified a similar proportion achieving publication (19.7% search strategy vs 21.2% survey, P = 1.00). Reliability of the search strategy was affirmed by the perfect agreement between 2 investigators (k = 1.00). This systematic search strategy demonstrated a high sensitivity, specificity, and accuracy for identifying publications resulting from pharmacy residency projects using information available in residency conference abstracts. © The Author(s) 2015.
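    The validity figures quoted above (sensitivity, specificity, accuracy) follow from the standard confusion-matrix definitions. A small helper, assuming the search strategy's results and the survey reference standard are encoded as parallel boolean lists (one entry per residency project; the function name is illustrative):

```python
def diagnostic_metrics(predicted, reference):
    """Sensitivity, specificity and accuracy of a binary test
    against a reference standard (parallel lists of booleans)."""
    tp = sum(p and r for p, r in zip(predicted, reference))
    tn = sum(not p and not r for p, r in zip(predicted, reference))
    fp = sum(p and not r for p, r in zip(predicted, reference))
    fn = sum(not p and r for p, r in zip(predicted, reference))
    sensitivity = tp / (tp + fn)   # found among truly published
    specificity = tn / (tn + fp)   # correctly negative among unpublished
    accuracy = (tp + tn) / len(reference)
    return sensitivity, specificity, accuracy
```

The study's 93% sensitivity with 100% specificity corresponds to one published project missed by the search strategy and no false positives.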

  15. Comparing the Precision of Information Retrieval of MeSH-Controlled Vocabulary Search Method and a Visual Method in the Medline Medical Database

    PubMed Central

    Hariri, Nadjla; Ravandi, Somayyeh Nadi

    2014-01-01

    Background: Medline is one of the most important databases in the biomedical field. One of the most important hosts for Medline is Elton B. Stephens CO. (EBSCO), which has presented different search methods that can be used based on the needs of the users. Visual search and MeSH-controlled search methods are among the most common methods. The goal of this research was to compare the precision of the retrieved sources in the EBSCO Medline database using MeSH-controlled and visual search methods. Methods: This research was a semi-empirical study. By holding training workshops, 70 students of higher education in different educational departments of Kashan University of Medical Sciences were taught MeSH-controlled and visual search methods in 2012. Then, the precision of 300 searches made by these students was calculated based on Best Precision, Useful Precision, and Objective Precision formulas and analyzed in SPSS software using the independent-samples t-test, and the three precisions obtained with the three precision formulas were studied for the two search methods. Results: The mean precision of the visual method was greater than that of the MeSH-controlled search for all three types of precision, i.e. Best Precision, Useful Precision, and Objective Precision, and their mean precisions were significantly different (P <0.001). Sixty-five percent of the researchers indicated that, although the visual method was better than the controlled method, the control of keywords in the controlled method resulted in finding more proper keywords for the searches. Fifty-three percent of the participants in the research also mentioned that the use of the combination of the two methods produced better results. Conclusion: For users, it is more appropriate to use a natural, language-based method, such as the visual method, in the EBSCO Medline host than to use the controlled method, which requires users to use special keywords. The potential reason for their preference was that the visual…

  16. Visualization Tools and Techniques for Search and Validation of Large Earth Science Spatial-Temporal Metadata Databases

    NASA Astrophysics Data System (ADS)

    Baskin, W. E.; Herbert, A.; Kusterer, J.

    2014-12-01

    Spatial-temporal metadata databases are critical components of interactive data discovery services for ordering Earth Science datasets. The development staff at the Atmospheric Science Data Center (ASDC) works closely with satellite Earth Science mission teams such as CERES, CALIPSO, TES, MOPITT, and CATS to create and maintain metadata databases that are tailored to the data discovery needs of the Earth Science community. This presentation focuses on the visualization tools and techniques used by the ASDC software development team for data discovery and validation/optimization of spatial-temporal objects in large multi-mission spatial-temporal metadata databases. The following topics will be addressed: optimizing the level of detail of spatial-temporal metadata to provide interactive spatial query performance over a multi-year Earth Science mission; generating appropriately scaled sensor footprint gridded (raster) metadata from Level 1 and Level 2 satellite and aircraft time-series data granules; and comparing the performance of raster vs. vector spatial granule footprint mask queries in a large metadata database, with a description of the visualization tools used to assist with this analysis.

  17. Identifying Gel-Separated Proteins Using In-Gel Digestion, Mass Spectrometry, and Database Searching: Consider the Chemistry

    ERIC Educational Resources Information Center

    Albright, Jessica C.; Dassenko, David J.; Mohamed, Essa A.; Beussman, Douglas J.

    2009-01-01

    Matrix-assisted laser desorption/ionization (MALDI) mass spectrometry is an important bioanalytical technique in drug discovery, proteomics, and research at the biology-chemistry interface. This is an especially powerful tool when combined with gel separation of proteins and database mining using the mass spectral data. Currently, few hands-on…

  19. Effect of cleavage enzyme, search algorithm and decoy database on mass spectrometric identification of wheat gluten proteins

    USDA-ARS?s Scientific Manuscript database

    Tandem mass spectrometry (MS/MS) is routinely used to identify proteins by comparing peptide spectra to those generated in silico from protein sequence databases. Wheat storage proteins (gliadins and glutenins) are difficult to distinguish by MS/MS as they have few cleavable tryptic sites, often res...

  20. The Opera del Vocabolario Italiano Database: Full-Text Searching Early Italian Vernacular Sources on the Web.

    ERIC Educational Resources Information Center

    DuPont, Christian

    2001-01-01

    Introduces and describes the functions of the Opera del Vocabolario Italiano (OVI) database, a powerful Web-based, full-text, searchable electronic archive that contains early Italian vernacular texts whose composition may be dated prior to 1375. Examples are drawn from scholars in various disciplines who have employed the OVI in support of their…

  2. Ten Most Searched Databases by a Business Generalist--Part 1 or A Day in the Life of....

    ERIC Educational Resources Information Center

    Meredith, Meri

    1986-01-01

    Describes databases frequently used in Business Information Center, Cummins Engine Company (Columbus, Indiana): Dun and Bradstreet Business Information Report System, Newsearch, Dun and Bradstreet Market Identifiers, Trade and Industry Index, PTS PROMT, Bureau of Labor Statistics files, ABI/INFORM, Magazine Index, NEXIS, Dow Jones News/Retrieval.…

  3. Statistical validation of peptide identifications in large-scale proteomics using the target-decoy database search strategy and flexible mixture modeling.

    PubMed

    Choi, Hyungwon; Ghosh, Debashis; Nesvizhskii, Alexey I

    2008-01-01

Reliable statistical validation of peptide and protein identifications is a top priority in large-scale mass spectrometry based proteomics. PeptideProphet is one of the computational tools commonly used for assessing the statistical confidence in peptide assignments to tandem mass spectra obtained using database search programs such as SEQUEST, MASCOT, or X! TANDEM. We present two flexible methods, the variable component mixture model and the semiparametric mixture model, that remove the restrictive parametric assumptions in the mixture modeling approach of PeptideProphet. Using a control protein mixture data set generated on a linear ion trap Fourier transform (LTQ-FT) mass spectrometer, we demonstrate that both methods improve on parametric models in terms of the accuracy of probability estimates and the power to detect correct identifications while controlling the false discovery rate to the same degree. The statistical approaches presented here require that the data set contain a sufficient number of decoy (known to be incorrect) peptide identifications, which can be obtained using the target-decoy database search strategy.
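The target-decoy strategy the abstract relies on reduces to a simple counting estimate: in a concatenated target-decoy search, each decoy hit above a score threshold implies roughly one incorrect target hit. A minimal sketch (function and variable names are illustrative, not from the paper):

```python
# Minimal sketch of target-decoy FDR estimation for a concatenated
# target-decoy database search. Names and example scores are invented.

def decoy_fdr(psms, threshold):
    """Estimate the FDR among PSMs scoring at or above `threshold`.

    psms: iterable of (score, is_decoy) pairs. Each decoy hit passing
    the threshold is taken to imply roughly one incorrect target hit,
    so FDR ~= decoys / targets.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0

psms = [(10.2, False), (9.1, False), (8.4, True), (7.0, False), (5.5, True)]
print(decoy_fdr(psms, 7.0))  # 1 decoy vs 3 targets -> ~0.33
```

In practice the threshold is swept until the estimated FDR reaches the desired level (e.g. 1%); the mixture models in the abstract refine this raw estimate into per-PSM probabilities.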

  4. Direct identification of human cellular microRNAs by nanoflow liquid chromatography-high-resolution tandem mass spectrometry and database searching.

    PubMed

    Nakayama, Hiroshi; Yamauchi, Yoshio; Taoka, Masato; Isobe, Toshiaki

    2015-03-03

    MicroRNAs (miRNAs) are small noncoding RNAs that regulate gene networks and participate in many physiological and pathological pathways. To date, miRNAs have been characterized mostly by genetic technologies, which have the advantages of being very sensitive and using high-throughput instrumentation; however, these techniques cannot identify most post-transcriptional modifications of miRNAs that would affect their functions. Herein, we report an analytical system for the direct identification of miRNAs that incorporates nanoflow liquid chromatography-high-resolution tandem mass spectrometry and RNA-sequence database searching. By introducing a spray-assisting device that stabilizes negative nanoelectrospray ionization of RNAs and by searching an miRNA sequence database using the obtained tandem mass spectrometry data for the RNA mixture, we successfully identified femtomole quantities of human cellular miRNAs and their 3'-terminal variants. This is the first report of a fully automated, and thus objective, tandem mass spectrometry-based analytical system that can be used to identify miRNAs.

  5. Proteome analysis of Sorangium cellulosum employing 2D-HPLC-MS/MS and improved database searching strategies for CID and ETD fragment spectra.

    PubMed

    Leinenbach, Andreas; Hartmer, Ralf; Lubeck, Markus; Kneissl, Benny; Elnakady, Yasser A; Baessmann, Carsten; Müller, Rolf; Huber, Christian G

    2009-09-01

Shotgun proteome analysis of Sorangium cellulosum, the myxobacterial model strain for secondary metabolite biosynthesis, was performed employing off-line two-dimensional high-pH reversed-phase HPLC x low-pH ion-pair reversed-phase HPLC and dual tandem mass spectrometry with collision-induced dissociation (CID) and electron transfer dissociation (ETD) as complementary fragmentation techniques. Peptide identification using database searching was optimized for ETD fragment spectra to obtain the maximum number of identifications at equivalent false discovery rates (1.0%) in the evaluation of both fragmentation techniques. In the database search of the CID MS/MS data, the mass tolerance was set to the well-established 0.3 Da window, whereas for ETD data, it was widened to 1.1 Da to account for hydrogen rearrangement in the radical intermediate of the peptide precursor ion. To achieve a false discovery rate comparable to the CID results, we increased the significance threshold for peptide identification to 0.001 for the ETD data. The ETD-based analysis yielded about 74% of all peptides and about 78% of all proteins compared to the CID method. In the combined data set, 952 proteins of S. cellulosum were confidently identified by at least two peptides per protein, facilitating the study of the function of regulatory proteins in the social myxobacteria and their role in secondary metabolism.

  6. International patent applications for non-injectable naloxone for opioid overdose reversal: Exploratory search and retrieve analysis of the PatentScope database.

    PubMed

    McDonald, Rebecca; Danielsson Glende, Øyvind; Dale, Ola; Strang, John

    2017-06-08

Non-injectable naloxone formulations are being developed for opioid overdose reversal, but only limited data have been published in the peer-reviewed domain. Through examination of a hitherto-unsearched database, we expand public knowledge of non-injectable formulations, tracing their development and novelty, with the aim to describe and compare their pharmacokinetic properties. (i) The PatentScope database of the World Intellectual Property Organization was searched for relevant English-language patent applications; (ii) Pharmacokinetic data were extracted, collated and analysed; (iii) PubMed was searched using Boolean search query '(nasal OR intranasal OR nose OR buccal OR sublingual) AND naloxone AND pharmacokinetics'. Five hundred and twenty-two PatentScope and 56 PubMed records were identified: three published international patent applications and five peer-reviewed papers were eligible. Pharmacokinetic data were available for intranasal, sublingual, and reference routes. Highly concentrated formulations (10-40 mg/mL) had been developed and tested. Sublingual bioavailability was very low (1%; relative to intravenous). Non-concentrated intranasal spray (1 mg/mL; 1 mL per nostril) had low bioavailability (11%). Concentrated intranasal formulations (≥10 mg/mL) had bioavailability of 21-42% (relative to intravenous) and 26-57% (relative to intramuscular), with peak concentrations (dose-adjusted Cmax = 0.8-1.7 ng/mL) reached in 19-30 min (tmax). Exploratory analysis identified intranasal bioavailability as associated positively with dose and negatively with volume. We find consistent direction of development of intranasal sprays to high-concentration, low-volume formulations with bioavailability in the 20-60% range. These have potential to deliver a therapeutic dose in 0.1 mL volume. [McDonald R, Danielsson Glende Ø, Dale O, Strang J. International patent applications for non-injectable naloxone for opioid overdose reversal
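The bioavailability percentages quoted above are dose-normalized AUC ratios against a reference route. A one-line sketch of that standard calculation (the AUC and dose numbers below are made-up, not values from the paper):

```python
# Dose-normalized relative bioavailability, F = (AUC_t/D_t) / (AUC_r/D_r).
# Example values are invented for illustration only.

def rel_bioavailability(auc_test, dose_test, auc_ref, dose_ref):
    """Fraction of the test route's exposure relative to the reference route."""
    return (auc_test / dose_test) / (auc_ref / dose_ref)

# e.g. an 8 mg intranasal dose giving AUC 24 vs a 0.8 mg IM dose giving AUC 10
print(round(rel_bioavailability(24.0, 8.0, 10.0, 0.8), 2))  # 0.24 -> 24%
```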

  7. Probabilistic Structural Analysis Program

    NASA Technical Reports Server (NTRS)

    Pai, Shantaram S.; Chamis, Christos C.; Murthy, Pappu L. N.; Stefko, George L.; Riha, David S.; Thacker, Ben H.; Nagpal, Vinod K.; Mital, Subodh K.

    2010-01-01

NASA/NESSUS 6.2c is a general-purpose, probabilistic analysis program that computes probability of failure and probabilistic sensitivity measures of engineered systems. Because NASA/NESSUS uses highly computationally efficient and accurate analysis techniques, probabilistic solutions can be obtained even for extremely large and complex models. Once the probabilistic response is quantified, the results can be used to support risk-informed decisions regarding reliability for safety-critical and one-of-a-kind systems, as well as for maintaining a level of quality while reducing manufacturing costs for larger-quantity products. NASA/NESSUS has been successfully applied to a diverse range of problems in aerospace, gas turbine engines, biomechanics, pipelines, defense, weaponry, and infrastructure. This program combines state-of-the-art probabilistic algorithms with general-purpose structural analysis and lifing methods to compute the probabilistic response and reliability of engineered structures. Uncertainties in load, material properties, geometry, boundary conditions, and initial conditions can be simulated. The structural analysis methods include non-linear finite-element methods, heat-transfer analysis, polymer/ceramic matrix composite analysis, monolithic (conventional metallic) materials life-prediction methodologies, boundary element methods, and user-written subroutines. Several probabilistic algorithms are available, such as the advanced mean value method and the adaptive importance sampling method. NASA/NESSUS 6.2c is structured in a modular format with 15 elements.
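The basic quantity such a code computes, a probability of failure under uncertain load and strength, can be illustrated with brute-force Monte Carlo (NESSUS itself uses far more efficient methods such as the advanced mean value method; the distributions and parameters below are invented):

```python
import random

# Crude Monte Carlo estimate of P(load > strength) for a component with
# normally distributed load and strength. Illustrative only; parameters
# are invented and this is not NESSUS's algorithm.

def failure_probability(n=200_000, seed=42):
    rng = random.Random(seed)
    failures = 0
    for _ in range(n):
        load = rng.gauss(100.0, 15.0)      # applied stress, MPa
        strength = rng.gauss(160.0, 20.0)  # material strength, MPa
        if load > strength:
            failures += 1
    return failures / n

print(failure_probability())  # analytically Phi(-60/25) = Phi(-2.4) ~ 0.008
```

For small failure probabilities, plain sampling like this needs very many samples per significant digit, which is exactly why methods like adaptive importance sampling exist.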

  8. More Publications about Databases.

    ERIC Educational Resources Information Center

    Tenopir, Carol

    1983-01-01

    Reviews recent publications in online database literature including three newsletters ("Database Update,""Database Alert," and "Information Hotline"), a directory ("Guide to Online Databases"), and a textbook ("Online Reference and Information Retrieval" by Roger C. Palmer). The new "Guide to Searching ONTAP ABI/INFORM" is noted. (EJS)

  9. Construction of an Indonesian herbal constituents database and its use in Random Forest modelling in a search for inhibitors of aldose reductase.

    PubMed

    Naeem, Sadaf; Hylands, Peter; Barlow, David

    2012-02-01

Data on phytochemical constituents of plants commonly used in traditional Indonesian medicine have been compiled as a computer database. This database (the Indonesian Herbal constituents database, IHD) currently contains details on ∼1,000 compounds found in 33 different plants. For each entry, the IHD gives details of chemical structure, trivial and systematic name, CAS registry number, pharmacology (where known), toxicology (LD50), botanical species, the part(s) of the plant(s) where the compounds are found, typical dosage(s) and reference(s). A second database has also been compiled for plant-derived compounds with known activity against the enzyme, aldose reductase (AR). This database (the aldose reductase inhibitors database, ARID) contains the same details as the IHD, and currently comprises information on 120 different AR inhibitors. Virtual screening of all compounds in the IHD has been performed using Random Forest (RF) modelling, in a search for novel leads active against AR, to provide for new forms of symptomatic relief in diabetic patients. For the RF modelling, a set of simple 2D chemical descriptors were employed to classify all compounds in the combined ARID and IHD databases as either active or inactive as AR inhibitors. The resulting RF models (which gave misclassification rates of 21%) were used to identify putative new AR inhibitors in the IHD, with such compounds being identified as those giving RF scores >0.5 (in each of the three different RF models developed). In vitro assays were subsequently performed for four of the compounds obtained as hits in this in silico screening, to determine their inhibitory activity against human recombinant AR. The two compounds having the highest RF scores (prunetin and ononin) were shown to have the highest activities experimentally (giving ∼58% and ∼52% inhibition at a concentration of 15 μM, respectively), while the compounds with lowest RF scores (vanillic acid and cinnamic acid) showed the
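The scoring rule used above, a compound counts as a hit when the fraction of trees voting "active" exceeds 0.5, can be shown with a toy ensemble of one-feature decision stumps (a deliberately minimal stand-in for a real Random Forest; the descriptors, data, and names are invented):

```python
import random

# Toy ensemble of one-feature decision stumps illustrating the RF-score
# cutoff (fraction of trees voting "active" > 0.5). The authors used
# proper Random Forest models on 2D chemical descriptors; everything
# below is invented for illustration.

def fit_stump(X, y, rng):
    feat = rng.randrange(len(X[0]))          # random feature per stump
    best = None
    for t in sorted({row[feat] for row in X}):
        pred = [1 if row[feat] > t else 0 for row in X]
        acc = sum(p == label for p, label in zip(pred, y)) / len(y)
        if best is None or acc > best[2]:
            best = (feat, t, acc)
    return best[:2]

def rf_score(stumps, x):
    """Fraction of stumps voting 'active' for descriptor vector x."""
    return sum(1 if x[f] > t else 0 for f, t in stumps) / len(stumps)

rng = random.Random(0)
X = [[0.1, 1.0], [0.2, 2.0], [0.9, 5.0], [0.8, 6.0]]  # toy 2D descriptors
y = [0, 0, 1, 1]                                      # 1 = AR inhibitor
stumps = [fit_stump(X, y, rng) for _ in range(25)]
print(rf_score(stumps, [0.85, 5.5]))  # 1.0 -> predicted active (> 0.5)
```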

  10. Accelerated Profile HMM Searches.

    PubMed

    Eddy, Sean R

    2011-10-01

    Profile hidden Markov models (profile HMMs) and probabilistic inference methods have made important contributions to the theory of sequence database homology search. However, practical use of profile HMM methods has been hindered by the computational expense of existing software implementations. Here I describe an acceleration heuristic for profile HMMs, the "multiple segment Viterbi" (MSV) algorithm. The MSV algorithm computes an optimal sum of multiple ungapped local alignment segments using a striped vector-parallel approach previously described for fast Smith/Waterman alignment. MSV scores follow the same statistical distribution as gapped optimal local alignment scores, allowing rapid evaluation of significance of an MSV score and thus facilitating its use as a heuristic filter. I also describe a 20-fold acceleration of the standard profile HMM Forward/Backward algorithms using a method I call "sparse rescaling". These methods are assembled in a pipeline in which high-scoring MSV hits are passed on for reanalysis with the full HMM Forward/Backward algorithm. This accelerated pipeline is implemented in the freely available HMMER3 software package. Performance benchmarks show that the use of the heuristic MSV filter sacrifices negligible sensitivity compared to unaccelerated profile HMM searches. HMMER3 is substantially more sensitive and 100- to 1000-fold faster than HMMER2. HMMER3 is now about as fast as BLAST for protein searches.
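At the core of an ungapped local alignment segment score is a maximum-sum run of per-position match scores, which a Kadane-style scan finds in one pass. This is a much-simplified stand-in for a single MSV segment, not HMMER's vectorized algorithm, and the score vector is invented:

```python
# Best single ungapped segment = maximum positive-sum run of
# per-position match scores (Kadane's scan). A simplified analogue of
# one of the segments MSV sums; not HMMER's actual implementation.

def best_segment_score(scores):
    best = cur = 0
    for s in scores:
        cur = max(0, cur + s)   # extend the current run, or restart after a drop
        best = max(best, cur)
    return best

print(best_segment_score([2, -1, 3, -6, 4, 1]))  # 5 (the run 4, 1)
```

MSV extends this idea to an optimal sum over multiple such segments, implemented with striped vector parallelism so it can serve as a fast filter ahead of the full Forward/Backward pass.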

  11. Local image descriptor-based searching framework of usable similar cases in a radiation treatment planning database for stereotactic body radiotherapy

    NASA Astrophysics Data System (ADS)

    Nonaka, Ayumi; Arimura, Hidetaka; Nakamura, Katsumasa; Shioyama, Yoshiyuki; Soufi, Mazen; Magome, Taiki; Honda, Hiroshi; Hirata, Hideki

    2014-03-01

Radiation treatment planning (RTP) for stereotactic body radiotherapy (SBRT) is more complex than for conventional radiotherapy because a larger number of beam directions is used. We previously reported that similar planning cases could help treatment planners with less SBRT experience determine beam directions. The aim of this study was to develop a framework for searching an RTP database for usable cases similar to an unplanned case, based on a local image descriptor. The proposed framework consists of three steps: searching, selection, and rearrangement. In the first step, the RTP database was searched for the 10 cases most similar to object cases based on the shape similarity of the two-dimensional lung region at the isocenter plane. In the second step, the 5 most similar cases were selected by using geometric features related to the location, size and shape of the planning target volume, lung and spinal cord. In the third step, the selected 5 cases were rearranged by use of the Euclidean distance of a local image descriptor, which is a similarity index based on the magnitudes and orientations of image gradients within a region of interest around an isocenter. It was assumed that the local image descriptor represents the information around lung tumors related to treatment planning. The cases selected as most similar to test cases by the proposed method were more similar in terms of tumor location than those selected by a conventional method. For evaluation of the proposed method, we applied a similar-cases-based beam arrangement method developed in the previous study to the similar cases selected by the proposed method based on a linear registration. The proposed method has the potential to suggest superior beam arrangements from the treatment point of view.
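The final rearrangement step ranks candidate cases by the Euclidean distance between local image descriptors around the isocenter. A minimal sketch of that ranking (the descriptor vectors here are invented 4-vectors, not real gradient histograms):

```python
import math

# Rank candidate cases by Euclidean distance between local image
# descriptors (nearest first). Descriptor values are invented; real
# descriptors encode gradient magnitudes/orientations around the isocenter.

def descriptor_distance(d1, d2):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(d1, d2)))

def rank_similar_cases(query, cases):
    """cases: dict mapping case_id -> descriptor vector."""
    return sorted(cases, key=lambda cid: descriptor_distance(query, cases[cid]))

query = [0.2, 0.8, 0.5, 0.1]
cases = {"A": [0.9, 0.1, 0.2, 0.7],
         "B": [0.25, 0.75, 0.5, 0.15],
         "C": [0.5, 0.5, 0.5, 0.5]}
print(rank_similar_cases(query, cases))  # ['B', 'C', 'A']
```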

  12. JICST Factual Database JICST DNA Database

    NASA Astrophysics Data System (ADS)

    Shirokizawa, Yoshiko; Abe, Atsushi

    Japan Information Center of Science and Technology (JICST) has started the on-line service of DNA database in October 1988. This database is composed of EMBL Nucleotide Sequence Library and Genetic Sequence Data Bank. The authors outline the database system, data items and search commands. Examples of retrieval session are presented.

  13. Data manipulation in heterogeneous databases

    SciTech Connect

    Chatterjee, A.; Segev, A.

    1991-10-01

Many important information systems applications require access to data stored in multiple heterogeneous databases. This paper examines a problem in inter-database data manipulation within a heterogeneous environment, where conventional techniques are no longer useful. To solve the problem, a broader definition of the join operator is proposed. Also, a method to probabilistically estimate the accuracy of the join is discussed.
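The "broader join" idea, matching tuples whose key values may not agree exactly, can be sketched by pairing tuples whenever an estimated probability that two values denote the same entity clears a threshold. The string-similarity estimator below is a stand-in, not the paper's method, and all names are illustrative:

```python
from difflib import SequenceMatcher

# Sketch of a probabilistic join across heterogeneous relations: tuples
# match when an estimated same-entity probability clears a threshold.
# The similarity-based estimator is a stand-in, not the paper's method.

def match_prob(a, b):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def probabilistic_join(rel_a, rel_b, key, threshold=0.6):
    return [(a, b, round(match_prob(a[key], b[key]), 2))
            for a in rel_a for b in rel_b
            if match_prob(a[key], b[key]) >= threshold]

suppliers = [{"name": "IBM Corp", "city": "Armonk"}]
customers = [{"name": "IBM Corporation"}, {"name": "Oracle"}]
print(probabilistic_join(suppliers, customers, "name"))
# one match: "IBM Corp" <-> "IBM Corporation", with its match probability
```

Carrying the match probability along with each joined pair is what lets the accuracy of the overall join be estimated afterwards.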

  14. Systematic review of health literacy in Cochrane database studies on paediatric asthma educational interventions: searching beyond rigorous design.

    PubMed

    Zeni, Mary Beth

    2012-03-01

    The purpose of this study was to evaluate if paediatric asthma educational intervention studies included in the Cochrane Collaboration database incorporated concepts of health literacy. Inclusion criteria were established to identify review categories in the Cochrane Collaboration database specific to paediatric asthma educational interventions. Articles that met the inclusion criteria were selected from the Cochrane Collaboration database in 2010. The health literacy definition from Healthy People 2010 was used to develop a 4-point a priori rating scale to determine the extent a study reported aspects of health literacy in the development of an educational intervention for parents and/or children. Five Cochrane review categories met the inclusion criteria; 75 studies were rated for health literacy content regarding educational interventions with families and children living with asthma. A priori criteria were used for the rating process. While 52 (69%) studies had no information pertaining to health literacy, 23 (31%) reported an aspect of health literacy. Although all studies maintained the rigorous standards of randomized clinical trials, a model of health literacy was not reported regarding the design and implementation of interventions. While a more comprehensive health literacy model for the development of educational interventions with families and children may have been available after the reviewed studies were conducted, general literacy levels still could have been addressed. The findings indicate a need to incorporate health literacy in the design of client-centred educational interventions and in the selection criteria of relevant Cochrane reviews. Inclusion assures that health literacy is as important as randomization and statistical analyses in the research design of educational interventions and may even assure participation of people with literacy challenges. © 2012 The Author. International Journal of Evidence-Based Healthcare © 2012 The Joanna

  15. MapReduce Implementation of a Hybrid Spectral Library-Database Search Method for Large-Scale Peptide Identification

    SciTech Connect

    Kalyanaraman, Anantharaman; Cannon, William R.; Latt, Benjamin K.; Baxter, Douglas J.

    2011-11-01

A MapReduce-based implementation called MR-MSPolygraph for parallelizing peptide identification from mass spectrometry data is presented. The underlying serial method, MSPolygraph, uses a novel hybrid approach to match an experimental spectrum against a combination of a protein sequence database and a spectral library. Our MapReduce implementation can run on any Hadoop cluster environment. Experimental results demonstrate that, relative to the serial version, MR-MSPolygraph reduces the time to solution from weeks to hours, for processing tens of thousands of experimental spectra. Speedup and other related performance studies are also reported on a 400-core Hadoop cluster using spectral datasets from environmental microbial communities as inputs.

  16. Search for 5'-leader regulatory RNA structures based on gene annotation aided by the RiboGap database.

    PubMed

    Naghdi, Mohammad Reza; Smail, Katia; Wang, Joy X; Wade, Fallou; Breaker, Ronald R; Perreault, Jonathan

    2017-03-15

    The discovery of noncoding RNAs (ncRNAs) and their importance for gene regulation led us to develop bioinformatics tools to pursue the discovery of novel ncRNAs. Finding ncRNAs de novo is challenging, first due to the difficulty of retrieving large numbers of sequences for given gene activities, and second due to exponential demands on calculation needed for comparative genomics on a large scale. Recently, several tools for the prediction of conserved RNA secondary structure were developed, but many of them are not designed to uncover new ncRNAs, or are too slow for conducting analyses on a large scale. Here we present various approaches using the database RiboGap as a primary tool for finding known ncRNAs and for uncovering simple sequence motifs with regulatory roles. This database also can be used to easily extract intergenic sequences of eubacteria and archaea to find conserved RNA structures upstream of given genes. We also show how to extend analysis further to choose the best candidate ncRNAs for experimental validation.

  17. MapReduce implementation of a hybrid spectral library-database search method for large-scale peptide identification.

    PubMed

    Kalyanaraman, Ananth; Cannon, William R; Latt, Benjamin; Baxter, Douglas J

    2011-11-01

A MapReduce-based implementation called MR-MSPolygraph for parallelizing peptide identification from mass spectrometry data is presented. The underlying serial method, MSPolygraph, uses a novel hybrid approach to match an experimental spectrum against a combination of a protein sequence database and a spectral library. Our MapReduce implementation can run on any Hadoop cluster environment. Experimental results demonstrate that, relative to the serial version, MR-MSPolygraph reduces the time to solution from weeks to hours, for processing tens of thousands of experimental spectra. Speedup and other related performance studies are also reported on a 400-core Hadoop cluster using spectral datasets from environmental microbial communities as inputs. The source code, along with user documentation, is available at http://compbio.eecs.wsu.edu/MR-MSPolygraph. Contact: ananth@eecs.wsu.edu; william.cannon@pnnl.gov. Supplementary data are available at Bioinformatics online.
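The MapReduce decomposition is natural here because each (spectrum, candidate) score is independent: the map phase scores every pair, and the reduce phase keeps the best match per spectrum. The toy "score" below just compares summed character codes to a fake precursor mass; MSPolygraph's actual hybrid scoring is far more involved:

```python
# Shape of the map/reduce decomposition for spectrum-vs-database scoring.
# The scoring function, spectra, and peptide "database" are toy inventions.

def map_phase(spectra, peptides, score):
    """Emit (spectrum_id, (peptide, score)) for every independent pair."""
    for spec_id, spec in spectra.items():
        for pep in peptides:
            yield spec_id, (pep, score(spec, pep))

def reduce_phase(pairs):
    """Keep the best-scoring candidate peptide per spectrum."""
    best = {}
    for spec_id, (pep, s) in pairs:
        if spec_id not in best or s > best[spec_id][1]:
            best[spec_id] = (pep, s)
    return best

score = lambda mass, pep: -abs(mass - sum(map(ord, pep)))  # toy mass match
spectra = {"s1": 200, "s2": 201}            # fake precursor "masses"
peptides = ["ACD", "ACE"]                   # tiny stand-in database
print(reduce_phase(map_phase(spectra, peptides, score)))
```

In a real Hadoop run the map tasks fan out over the cluster and the shuffle groups scores by spectrum id, which is what turns weeks of serial scoring into hours.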

  18. Comparing the Hematopoetic Syndrome Time Course in the NHP Animal Model to Radiation Accident Cases From the Database Search.

    PubMed

    Graessle, Dieter H; Dörr, Harald; Bennett, Alexander; Shapiro, Alla; Farese, Ann M; MacVittie, Thomas J; Meineke, Viktor

    2015-11-01

    Since controlled clinical studies on drug administration for the acute radiation syndrome are lacking, clinical data of human radiation accident victims as well as experimental animal models are the main sources of information. This leads to the question of how to compare and link clinical observations collected after human radiation accidents with experimental observations in non-human primate (NHP) models. Using the example of granulocyte counts in the peripheral blood following radiation exposure, approaches for adaptation between NHP and patient databases on data comparison and transformation are introduced. As a substitute for studying the effects of administration of granulocyte-colony stimulating factor (G-CSF) in human clinical trials, the method of mathematical modeling is suggested using the example of G-CSF administration to NHP after total body irradiation.

  19. Use of Probabilistic Topic Models for Search

    DTIC Science & Technology

    2009-09-01

people, whose advice was crucial for my research, would be too long to fit in here. A few people, however, shall be named here. I thank my advisors Prof...probability of a word occurring in a document is not well explained by a single parametric distribution. A mixture model attempts to fit to the document a...cupied tables over all restaurants as the sum of M normal random variables. A better fit that also works for smaller number of words and/or concentration

  20. Probabilistic Search on Optimized Graph Topologies

    DTIC Science & Technology

    2011-09-01

where I am today if it were not for your love and support. I am truly grateful and blessed by you each and every day. In memory of Major Thomas Tholi...adding two edges with optimal value for λ2. This neither includes the edge e1,10 nor e2,9 which are the optimal solutions for adding one edge. Figure

  1. Chemical and biological warfare: Protection, decontamination, and disposal. (Latest citations from the NTIS bibliographic database). Published Search

    SciTech Connect

    1997-11-01

    The bibliography contains citations concerning the means to defend against chemical and biological agents used in military operations, and to eliminate the effects of such agents on personnel, equipment, and grounds. Protection is accomplished through protective clothing and masks, and in buildings and shelters through filtration. Elimination of effects includes decontamination and removal of the agents from clothing, equipment, buildings, grounds, and water, using chemical deactivation, incineration, and controlled disposal of material in injection wells and ocean dumping. Other Published Searches in this series cover chemical warfare detection; defoliants; general studies; biochemistry and therapy; and biology, chemistry, and toxicology associated with chemical warfare agents.(Contains 50-250 citations and includes a subject term index and title list.) (Copyright NERAC, Inc. 1995)

  2. Robots for hazardous duties: Military, space, and nuclear facility applications. (Latest citations from the NTIS bibliographic database). Published Search

    SciTech Connect

    1993-09-01

    The bibliography contains citations concerning the design and application of robots used in place of humans where the environment could be hazardous. Military applications include autonomous land vehicles, robotic howitzers, and battlefield support operations. Space operations include docking, maintenance, mission support, and intra-vehicular and extra-vehicular activities. Nuclear applications include operations within the containment vessel, radioactive waste operations, fueling operations, and plant security. Many of the articles reference control techniques and the use of expert systems in robotic operations. Applications involving industrial manufacturing, walking robots, and robot welding are cited in other published searches in this series. (Contains a minimum of 183 citations and includes a subject term index and title list.)

  3. Evaluation of the purity of recombinant proteins and detection of residual protein contaminants via N-terminal microsequencing and database searching.

    PubMed

    Jin, D Y; Zhang, Z Q; Zhou, Y A; Hou, Y D

    1991-01-01

    The N-terminal amino acid sequences of purified recombinant human gamma-interferon, alpha 2a-interferon and interleukin-2 expressed in E. coli were determined on an Applied Biosystems 477A Protein/Peptide Sequencer and 120A PTH Amino Acid Analyzer. From the raw chromatographic data of these samples, the identity, heterogeneity, amount of methionine-plus species remaining in the final products, and the probable process contaminants were evaluated with the help of computer methods including database searching. General methods to characterize trace contaminants in protein samples were also discussed. Among the sequenced samples, only gamma-interferon was shown to be N-terminal homogeneous. Methionine-containing species were found in interleukin-2 and alpha 2a-interferon. Chicken eggwhite lysozyme was detected in very small amounts in one batch of samples. These results provide valuable information for the development and improvement of preparation methods as well as regulatory responses to recombinant products.

  4. Detection and Identification of Heme c-Modified Peptides by Histidine Affinity Chromatography, High-Performance Liquid Chromatography-Mass Spectrometry, and Database Searching

    SciTech Connect

    Merkley, Eric D.; Anderson, Brian J.; Park, Jea H.; Belchik, Sara M.; Shi, Liang; Monroe, Matthew E.; Smith, Richard D.; Lipton, Mary S.

    2012-12-07

Multiheme c-type cytochromes (proteins with covalently attached heme c moieties) play important roles in extracellular metal respiration in dissimilatory metal-reducing bacteria. Liquid chromatography-tandem mass spectrometry (LC-MS/MS) characterization of c-type cytochromes is hindered by the presence of multiple heme groups, since the heme c modified peptides are typically not observed, or if observed, not identified. Using a recently reported histidine affinity chromatography (HAC) procedure, we enriched heme c tryptic peptides from purified bovine heart cytochrome c and a bacterial decaheme cytochrome, and subjected these samples to LC-MS/MS analysis. Enriched bovine cytochrome c samples yielded three- to six-fold more confident peptide-spectrum matches to heme c containing peptides than unenriched digests. In unenriched digests of the decaheme cytochrome MtoA from Sideroxydans lithotrophicus ES-1, heme c peptides for four of the ten expected sites were observed by LC-MS/MS; following HAC fractionation, peptides covering nine out of ten sites were obtained. Heme c peptide spiked into E. coli lysates at mass ratios as low as 10^-4 was detected with good signal-to-noise after HAC and LC-MS/MS analysis. In addition to HAC, we have developed a proteomics database search strategy that takes into account the unique physicochemical properties of heme c peptides. The results suggest that accounting for the double thioether link between heme c and peptide, and the use of the labile heme fragment as a reporter ion, can improve database searching results. The combination of affinity chromatography and heme-specific informatics yielded increases in the number of peptide-spectrum matches of 20-100-fold for bovine cytochrome c.

  5. Detection and identification of heme c-modified peptides by histidine affinity chromatography, high-performance liquid chromatography-mass spectrometry, and database searching.

    PubMed

    Merkley, Eric D; Anderson, Brian J; Park, Jea; Belchik, Sara M; Shi, Liang; Monroe, Matthew E; Smith, Richard D; Lipton, Mary S

    2012-12-07

Multiheme c-type cytochromes (proteins with covalently attached heme c moieties) play important roles in extracellular metal respiration in dissimilatory metal-reducing bacteria. Liquid chromatography-tandem mass spectrometry (LC-MS/MS) characterization of c-type cytochromes is hindered by the presence of multiple heme groups, since the heme c modified peptides are typically not observed or, if observed, not identified. Using a recently reported histidine affinity chromatography (HAC) procedure, we enriched heme c tryptic peptides from purified bovine heart cytochrome c and two bacterial decaheme cytochromes, and subjected these samples to LC-MS/MS analysis. Enriched bovine cytochrome c samples yielded 3- to 6-fold more confident peptide-spectrum matches to heme c containing peptides than unenriched digests. In unenriched digests of the decaheme cytochrome MtoA from Sideroxydans lithotrophicus ES-1, heme c peptides for 4 of the 10 expected sites were observed by LC-MS/MS; following HAC fractionation, peptides covering 9 out of 10 sites were obtained. Heme c peptide spiked into E. coli lysates at mass ratios as low as 1×10^-4 was detected with good signal-to-noise after HAC and LC-MS/MS analysis. In addition to HAC, we have developed a proteomics database search strategy that takes into account the unique physicochemical properties of heme c peptides. The results suggest that accounting for the double thioether link between heme c and peptide, and the use of the labile heme fragment as a reporter ion, can improve database searching results. The combination of affinity chromatography and heme-specific informatics yielded increases in the number of peptide-spectrum matches of 20-100-fold for bovine cytochrome c.
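The reporter-ion idea mentioned above amounts to a simple filter: flag MS/MS spectra that contain a fragment peak near the heme mass before assigning them in the database search. A minimal sketch; the reference m/z and tolerance below are placeholders, not values from the paper:

```python
# Sketch of a heme reporter-ion filter for MS/MS spectra. The reference
# m/z and tolerance are assumed placeholders, not the paper's values.

def has_heme_reporter(peaks, mz_ref=616.18, tol=0.02):
    """peaks: list of (m/z, intensity) pairs from one MS/MS spectrum."""
    return any(abs(mz - mz_ref) <= tol for mz, _ in peaks)

spectrum = [(175.12, 300.0), (616.17, 8200.0), (702.35, 150.0)]
print(has_heme_reporter(spectrum))  # True
```

A search engine can use such a flag to restrict heme c modified candidate peptides to the spectra that actually carry the labile heme fragment.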

  6. Genome databases

    SciTech Connect

    Courteau, J.

    1991-10-11

    Since the Genome Project began several years ago, a plethora of databases have been developed or are in the works. They range from the massive Genome Data Base at Johns Hopkins University, the central repository of all gene mapping information, to small databases focusing on single chromosomes or organisms. Some are publicly available, others are essentially private electronic lab notebooks. Still others limit access to a consortium of researchers working on, say, a single human chromosome. An increasing number incorporate sophisticated search and analytical software, while others operate as little more than data lists. In consultation with numerous experts in the field, a list has been compiled of some key genome-related databases. The list was not limited to map and sequence databases but also included the tools investigators use to interpret and elucidate genetic data, such as protein sequence and protein structure databases. Because a major goal of the Genome Project is to map and sequence the genomes of several experimental animals, including E. coli, yeast, fruit fly, nematode, and mouse, the available databases for those organisms are listed as well. The author also includes several databases that are still under development - including some ambitious efforts that go beyond data compilation to create what are being called electronic research communities, enabling many users, rather than just one or a few curators, to add or edit the data and tag it as raw or confirmed.

  7. Quality Control of Biomedicinal Allergen Products – Highly Complex Isoallergen Composition Challenges Standard MS Database Search and Requires Manual Data Analyses

    PubMed Central

    Spiric, Jelena; Engin, Anna M.; Karas, Michael; Reuter, Andreas

    2015-01-01

    Allergy against birch pollen is among the most common causes of spring pollinosis in Europe and is diagnosed and treated using extracts from natural sources. Quality control is crucial for safe and effective diagnosis and treatment. However, current methods are very difficult to standardize and do not address individual allergen or isoallergen composition. MS provides information regarding selected proteins or the entire proteome and could overcome the aforementioned limitations. We studied the proteome of birch pollen, focusing on allergens and isoallergens, to clarify which of the 93 published sequence variants of the major allergen, Bet v 1, are expressed in parallel as proteins within one source material. The unexpectedly complex Bet v 1 isoallergen composition required manual data interpretation and a specific design of databases, as current database search engines fail to unambiguously assign spectra to highly homologous, partially identical proteins. We identified 47 non-allergenic proteins and all 5 known birch pollen allergens, and unambiguously proved the existence of 18 Bet v 1 isoallergens and variants by manual data analysis. This highly complex isoallergen composition raises the question of whether isoallergens can be ignored or must be included in the quality control of allergen products, and which data-analysis strategies should be applied. PMID:26561299

  8. Integration of an Evidence Base into a Probabilistic Risk Assessment Model. The Integrated Medical Model Database: An Organized Evidence Base for Assessing In-Flight Crew Health Risk and System Design

    NASA Technical Reports Server (NTRS)

    Saile, Lynn; Lopez, Vilma; Bickham, Grandin; FreiredeCarvalho, Mary; Kerstman, Eric; Byrne, Vicky; Butler, Douglas; Myers, Jerry; Walton, Marlei

    2011-01-01

    This slide presentation reviews the Integrated Medical Model (IMM) database, an organized evidence base for assessing in-flight crew health risk. The relational database is accessible to many users. It quantifies model inputs by ranking each on the Level of Evidence (LOE) of its best supporting data and by a Quality of Evidence (QOE) score, which together assess the evidence base for each medical condition. The IMM evidence base has already provided invaluable information for designers and for other uses.

  9. Discovery of novel aldose reductase inhibitors using a protein structure-based approach: 3D-database search followed by design and synthesis.

    PubMed

    Iwata, Y; Arisawa, M; Hamada, R; Kita, Y; Mizutani, M Y; Tomioka, N; Itai, A; Miyamoto, S

    2001-05-24

    Aldose reductase (AR) has been implicated in the etiology of diabetic complications. Due to the limited number of currently available drugs for the treatment of diabetic complications, we have carried out structure-based drug design and synthesis in an attempt to find new types of AR inhibitors. With the ADAM&EVE program, a three-dimensional database (ACD3D) was searched using the ligand binding site of the AR crystal structure. Of the 179 compounds selected through this search followed by visual inspection, 36 were purchased and subjected to a biological assay. Ten compounds showed more than 40% inhibition of AR at a 15 microg/mL concentration. In a subsequent lead optimization, a series of analogues of the most active compound was synthesized based on the docking mode derived by ADAM&EVE. Many of these congeners exhibited higher activities than the mother compound. Indeed, the most potent synthesized compound showed an approximately 20-fold increase in inhibitory activity (IC(50) = 0.21 vs 4.3 microM). Furthermore, a hydrophobic subsite was newly inferred, which would be useful for the design of inhibitors with improved affinity for AR.

  10. High-throughput database search and large-scale negative polarity liquid chromatography-tandem mass spectrometry with ultraviolet photodissociation for complex proteomic samples.

    PubMed

    Madsen, James A; Xu, Hua; Robinson, Michelle R; Horton, Andrew P; Shaw, Jared B; Giles, David K; Kaoud, Tamer S; Dalby, Kevin N; Trent, M Stephen; Brodbelt, Jennifer S

    2013-09-01

    The use of ultraviolet photodissociation (UVPD) for the activation and dissociation of peptide anions is evaluated for broader coverage of the proteome. To facilitate interpretation and assignment of the resulting UVPD mass spectra of peptide anions, the MassMatrix database search algorithm was modified to allow automated analysis of negative polarity MS/MS spectra. The new UVPD algorithms were developed based on the MassMatrix database search engine by adding specific fragmentation pathways for UVPD. The new UVPD fragmentation pathways in MassMatrix were rigorously and statistically optimized using two large data sets with high mass accuracy and high mass resolution for both MS(1) and MS(2) data acquired on an Orbitrap mass spectrometer for complex Halobacterium and HeLa proteome samples. Negative mode UVPD led to the identification of 3663 and 2350 peptides for the Halo and HeLa tryptic digests, respectively, corresponding to 655 and 645 peptides that were unique when compared with electron transfer dissociation (ETD), higher energy collision-induced dissociation, and collision-induced dissociation results for the same digests analyzed in the positive mode. In sum, 805 and 619 proteins were identified via UVPD for the Halobacterium and HeLa samples, respectively, with 49 and 50 unique proteins identified in contrast to the more conventional MS/MS methods. The algorithm also features automated charge determination for low mass accuracy data, precursor filtering (including intact charge-reduced peaks), and the ability to combine both positive and negative MS/MS spectra into a single search, and it is freely open to the public. The accuracy and specificity of the MassMatrix UVPD search algorithm was also assessed for low resolution, low mass accuracy data on a linear ion trap. Analysis of a known mixture of three mitogen-activated kinases yielded similar sequence coverage percentages for UVPD of peptide anions versus conventional collision-induced dissociation of

  11. High-throughput Database Search and Large-scale Negative Polarity Liquid Chromatography–Tandem Mass Spectrometry with Ultraviolet Photodissociation for Complex Proteomic Samples*

    PubMed Central

    Madsen, James A.; Xu, Hua; Robinson, Michelle R.; Horton, Andrew P.; Shaw, Jared B.; Giles, David K.; Kaoud, Tamer S.; Dalby, Kevin N.; Trent, M. Stephen; Brodbelt, Jennifer S.

    2013-01-01

    The use of ultraviolet photodissociation (UVPD) for the activation and dissociation of peptide anions is evaluated for broader coverage of the proteome. To facilitate interpretation and assignment of the resulting UVPD mass spectra of peptide anions, the MassMatrix database search algorithm was modified to allow automated analysis of negative polarity MS/MS spectra. The new UVPD algorithms were developed based on the MassMatrix database search engine by adding specific fragmentation pathways for UVPD. The new UVPD fragmentation pathways in MassMatrix were rigorously and statistically optimized using two large data sets with high mass accuracy and high mass resolution for both MS1 and MS2 data acquired on an Orbitrap mass spectrometer for complex Halobacterium and HeLa proteome samples. Negative mode UVPD led to the identification of 3663 and 2350 peptides for the Halo and HeLa tryptic digests, respectively, corresponding to 655 and 645 peptides that were unique when compared with electron transfer dissociation (ETD), higher energy collision-induced dissociation, and collision-induced dissociation results for the same digests analyzed in the positive mode. In sum, 805 and 619 proteins were identified via UVPD for the Halobacterium and HeLa samples, respectively, with 49 and 50 unique proteins identified in contrast to the more conventional MS/MS methods. The algorithm also features automated charge determination for low mass accuracy data, precursor filtering (including intact charge-reduced peaks), and the ability to combine both positive and negative MS/MS spectra into a single search, and it is freely open to the public. The accuracy and specificity of the MassMatrix UVPD search algorithm was also assessed for low resolution, low mass accuracy data on a linear ion trap. Analysis of a known mixture of three mitogen-activated kinases yielded similar sequence coverage percentages for UVPD of peptide anions versus conventional collision-induced dissociation of

  12. VIEWCACHE: An incremental database access method for autonomous interoperable databases

    NASA Technical Reports Server (NTRS)

    Roussopoulos, Nick; Sellis, Timoleon

    1991-01-01

    The objective is to illustrate the concept of incremental access to distributed databases. An experimental database management system, ADMS, developed at the University of Maryland, College Park, uses VIEWCACHE, a database access method based on incremental search. VIEWCACHE is a pointer-based access method that provides a uniform interface for accessing distributed databases and catalogues. The compactness of the pointer structures formed during database browsing and the incremental access method allow the user to search and perform inter-database cross-referencing with no actual data movement between database sites. Once the search is complete, the set of collected pointers to the desired data is dereferenced.
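    The pointer-based access idea can be illustrated with a toy sketch. The class and method names below are invented for illustration and are not the ADMS/VIEWCACHE API: browsing accumulates compact (site, key) pointers without moving any data, and only the final result set is dereferenced.

    ```python
    # Toy illustration of pointer-based incremental access: cross-referencing
    # collects (site, key) pointers only; actual records move only at the
    # final dereference step.
    class PointerCache:
        def __init__(self, sites):
            self.sites = sites      # site name -> {key: record}
            self.pointers = []      # compact pointers collected while browsing

        def search(self, site, predicate):
            """Record pointers to matching rows without copying the data."""
            for key, record in self.sites[site].items():
                if predicate(record):
                    self.pointers.append((site, key))

        def dereference(self):
            """Fetch the actual records only once the search is complete."""
            return [self.sites[site][key] for site, key in self.pointers]

    # Hypothetical two-record remote "site".
    stars = {"A": {1: {"name": "Vega"}, 2: {"name": "Sirius"}}}
    cache = PointerCache(stars)
    cache.search("A", lambda r: r["name"].startswith("S"))
    print(cache.dereference())  # [{'name': 'Sirius'}]
    ```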

  14. Estimated blood loss as a predictor of PSA recurrence after radical prostatectomy: results from the SEARCH database.

    PubMed

    Lloyd, Jessica C; Bañez, Lionel L; Aronson, William J; Terris, Martha K; Presti, Joseph C; Amling, Christopher L; Kane, Christopher J; Freedland, Stephen J

    2010-02-01

    Study type: diagnosis (exploratory cohort); level of evidence: 2b. To clarify the relationship between estimated blood loss (EBL) and biochemical recurrence, assessed by prostate-specific antigen (PSA) level, as blood loss is a long-standing concern associated with radical prostatectomy (RP), and no studies to date have examined the association between blood loss and cancer control. In all, 1077 patients were identified in the Shared Equal-Access Regional Cancer Hospital database who underwent retropubic RP (between 1998 and 2008) and had EBL and follow-up data available. We examined the relationship between EBL and recurrence using multivariate Cox regression analyses. Increased EBL was correlated with PSA recurrence in a multivariate-adjusted model (P = 0.01). When analysed by 500-mL EBL categories, those with an EBL of <1500 mL had a similar risk of recurrence. However, the risk of PSA recurrence tended to increase for an EBL of 1500-3499 mL, before decreasing again for patients with an EBL of > or =3500 mL. Men with an EBL of 2500-3499 mL had more than twice the risk of recurrence than men with an EBL of <1500 mL (P = 0.02). EBL was not associated with adverse tumour stage, grade or margin status. There was a significant correlation between EBL at the time of RP and biochemical recurrence. We hypothesized that this association might be due to transfusion-related immunosuppression, excessive blood obscuring the operative field, EBL being a marker of aggressive disease, or EBL being a marker of poor surgical technique. However, our data did not completely fit any one of these hypotheses, and thus the ultimate cause for the increased risk of recurrence remains unclear and requires further study.

  15. Obesity, prostate-specific antigen nadir, and biochemical recurrence after radical prostatectomy: biology or technique? Results from the SEARCH database.

    PubMed

    Ho, Tammy; Gerber, Leah; Aronson, William J; Terris, Martha K; Presti, Joseph C; Kane, Christopher J; Amling, Christopher L; Freedland, Stephen J

    2012-11-01

    Obesity is associated with an increased risk of biochemical recurrence (BCR) after radical prostatectomy (RP). It is unclear whether this is due to technical challenges related to operating on obese men or other biologic factors. To examine whether obesity predicts higher prostate-specific antigen (PSA) nadir (as a measure of residual PSA-producing tissue) after RP and if this accounts for the greater BCR risk in obese men. A retrospective analysis of 1038 RP patients from 2001 to 2010 in the multicenter US Veterans Administration-based Shared Equal Access Regional Cancer Hospital database with median follow-up of 41 mo. All patients underwent RP. We evaluated the relationship between body mass index (BMI) and ultrasensitive PSA nadir within 6 mo after RP. Adjusted proportional hazards models were used to examine the association between BMI and BCR with and without PSA nadir. Mean BMI was 28.5 kg/m2. Higher BMI was associated with higher PSA nadir on both univariable (p=0.001) and multivariable analyses (p<0.001). Increased BMI was associated with increased BCR risk (hazard ratio [HR]: 1.06; p=0.007). Adjusting for PSA nadir slightly attenuated, but did not eliminate, this association (HR: 1.04, p=0.043). When stratified by PSA nadir, obesity only significantly predicted BCR in men with an undetectable nadir (p=0.006). Unfortunately, other clinically relevant end points such as metastasis or mortality were not available. Obese men are more likely to have a higher PSA nadir, suggesting that either more advanced disease or technical issues confound an ideal operation. However, even after adjusting for the increased PSA nadir, obesity remained predictive of BCR, suggesting that tumors in obese men are growing faster. This provides further support for the idea that obesity is biologically associated with prostate cancer progression. Published by Elsevier B.V.

  16. Obesity, Prostate-Specific Antigen Nadir, and Biochemical Recurrence After Radical Prostatectomy: Biology or Technique? Results from the SEARCH Database

    PubMed Central

    Ho, Tammy; Gerber, Leah; Aronson, William J.; Terris, Martha K.; Presti, Joseph C.; Kane, Christopher J.; Amling, Christopher L.; Freedland, Stephen J.

    2012-01-01

    Background Obesity is associated with an increased risk of biochemical recurrence (BCR) after radical prostatectomy (RP). It is unclear whether this is due to technical challenges related to operating on obese men or other biologic factors. Objective To examine whether obesity predicts higher prostate-specific antigen (PSA) nadir (as a measure of residual PSA-producing tissue) after RP and if this accounts for the greater BCR risk in obese men. Design, setting, and participants A retrospective analysis of 1038 RP patients from 2001 to 2010 in the multicenter US Veterans Administration–based Shared Equal Access Regional Cancer Hospital database with median follow-up of 41 mo. Intervention All patients underwent RP. Outcome measurements and statistical analysis We evaluated the relationship between body mass index (BMI) and ultrasensitive PSA nadir within 6 mo after RP. Adjusted proportional hazards models were used to examine the association between BMI and BCR with and without PSA nadir. Results and limitations Mean BMI was 28.5 kg/m2. Higher BMI was associated with higher PSA nadir on both univariable (p = 0.001) and multivariable analyses (p < 0.001). Increased BMI was associated with increased BCR risk (hazard ratio [HR]: 1.06; p = 0.007). Adjusting for PSA nadir slightly attenuated, but did not eliminate, this association (HR: 1.04, p = 0.043). When stratified by PSA nadir, obesity only significantly predicted BCR in men with an undetectable nadir (p = 0.006). Unfortunately, other clinically relevant end points such as metastasis or mortality were not available. Conclusions Obese men are more likely to have a higher PSA nadir, suggesting that either more advanced disease or technical issues confound an ideal operation. However, even after adjusting for the increased PSA nadir, obesity remained predictive of BCR, suggesting that tumors in obese men are growing faster. This provides further support for the idea that obesity is biologically associated with

  17. [Cognitive Development in Children with Benign Rolandic Epilepsy of Childhood with Centrotemporal Spikes - Results of a Current Systematic Database Search].

    PubMed

    Neumann, H; Helmke, F; Thiels, C; Polster, T; Selzer, L M; Daseking, M; Petermann, F; Lücke, T

    2016-10-01

    Benign Rolandic Epilepsy (BRE) is one of the most common epilepsy syndromes in childhood. Although global intellectual performance is typically normal in BRE patients, problems have been found in specific cognitive domains. To summarize recent empirical findings concerning cognitive development in children with BRE, a systematic literature search of clinical studies published between 2009 and 2015 was performed; 19 relevant studies were found. In most recent studies, children with BRE consistently showed general intellectual performance within the normal range. However, in two of the studies patients showed a significantly poorer (but still normal) performance in comparison to controls. The studies provide clear indications of a high prevalence of impairments in language (10 out of 12 studies) and academic performance (6 out of 8 studies) in children with BRE. Regarding deficits in other cognitive domains (attention, memory, visual/auditory perception, executive functions), current findings are inconsistent. In addition, no clear results were found in studies examining cognitive development after remission of BRE. Studies on the relationship between selected clinical/electroencephalographic characteristics (e.g., EEG patterns, focus lateralization) and cognitive performance, and studies on potential benefits of anti-epileptic therapy for cognitive functions, also have not yielded consistent results. Studies using fMRI and evoked potentials provide evidence for functional reorganization of neural networks in BRE. Due to the developmental risks in children with BRE, early cognitive assessment, early treatment and follow-up assessments are important. © Georg Thieme Verlag KG Stuttgart · New York.

  18. Probabilistic boundary element method

    NASA Technical Reports Server (NTRS)

    Cruse, T. A.; Raveendra, S. T.

    1989-01-01

    The purpose of the Probabilistic Structural Analysis Method (PSAM) project is to develop structural analysis capabilities for the design analysis of advanced space propulsion system hardware. The boundary element method (BEM) is used as the basis of the Probabilistic Advanced Analysis Methods (PADAM) discussed here. The probabilistic BEM code (PBEM) is used to obtain the structural response and sensitivity results for a set of random variables. As such, PBEM performs analogously to other structural analysis codes, such as finite element codes, in the PSAM system. For linear problems, unlike the finite element method (FEM), the BEM governing equations are written only at the boundary of the body; the method thus eliminates the need to model the volume of the body. However, for general body-force problems, a direct condensation of the governing equations to the boundary of the body is not possible, and volume modeling is therefore generally required.

  19. Formalizing Probabilistic Safety Claims

    NASA Technical Reports Server (NTRS)

    Herencia-Zapana, Heber; Hagen, George E.; Narkawicz, Anthony J.

    2011-01-01

    A safety claim for a system is a statement that the system, which is subject to hazardous conditions, satisfies a given set of properties. Following work by John Rushby and Bev Littlewood, this paper presents a mathematical framework that can be used to state and formally prove probabilistic safety claims. It also enables hazardous conditions, their uncertainties, and their interactions to be integrated into the safety claim. This framework provides a formal description of the probabilistic composition of an arbitrary number of hazardous conditions and their effects on system behavior. An example is given of a probabilistic safety claim for a conflict detection algorithm for aircraft in a 2D airspace. The motivation for developing this mathematical framework is that it can be used in an automated theorem prover to formally verify safety claims.
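    The probabilistic composition of hazardous conditions can be illustrated with a toy calculation. This is an assumption-laden sketch, not the paper's formalism (which is developed for an automated theorem prover and also handles interacting hazards): if independent hazardous conditions each violate the safety property with probability p_i, the property survives all of them with probability the product of (1 - p_i).

    ```python
    import math

    # Illustrative composition of independent hazardous conditions: hazard i
    # violates the safety property with probability p_i, so the property holds
    # with probability prod(1 - p_i). (Independence is an assumption here; the
    # paper's framework also models interactions between hazards.)
    def safety_claim_probability(hazard_violation_probs):
        return math.prod(1.0 - p for p in hazard_violation_probs)

    # Three hypothetical hazards with small violation probabilities.
    p_safe = safety_claim_probability([1e-3, 5e-4, 2e-4])
    print(round(p_safe, 6))  # 0.998301
    ```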

  20. Probabilistic Composite Design

    NASA Technical Reports Server (NTRS)

    Chamis, Christos C.

    1997-01-01

    Probabilistic composite design is described in terms of a computational simulation. This simulation tracks probabilistically the composite design evolution from constituent materials, fabrication process, through composite mechanics and structural components. Comparisons with experimental data are provided to illustrate selection of probabilistic design allowables, test methods/specimen guidelines, and identification of in situ versus pristine strength, For example, results show that: in situ fiber tensile strength is 90% of its pristine strength; flat-wise long-tapered specimens are most suitable for setting ply tensile strength allowables: a composite radome can be designed with a reliability of 0.999999; and laminate fatigue exhibits wide-spread scatter at 90% cyclic-stress to static-strength ratios.

  1. Probabilistic liquefaction triggering based on the cone penetration test

    USGS Publications Warehouse

    Moss, R.E.S.; Seed, R.B.; Kayen, R.E.; Stewart, J.P.; Tokimatsu, K.

    2005-01-01

    Performance-based earthquake engineering requires a probabilistic treatment of potential failure modes in order to accurately quantify the overall stability of the system. This paper is a summary of the application portions of the probabilistic liquefaction triggering correlations recently proposed by Moss and co-workers. To enable probabilistic treatment of liquefaction triggering, the variables comprising the seismic load and the liquefaction resistance were treated as inherently uncertain. Supporting data from an extensive Cone Penetration Test (CPT)-based liquefaction case history database were used to develop a probabilistic correlation. The methods used to measure the uncertainty of the load and resistance variables, how the interactions of these variables were treated using Bayesian updating, and how reliability analysis was applied to produce curves of equal probability of liquefaction are presented. The normalization for effective overburden stress, the magnitude-correlated duration weighting factor, and the non-linear shear mass participation factor used are also discussed.
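    The reliability-analysis step can be sketched in miniature. The limit state below is a stand-in, not the published Moss et al. correlation (which has many more terms and CPT-specific normalizations): with lognormal load and resistance, the probability of liquefaction is the standard normal CDF of the negative reliability index.

    ```python
    import math

    def standard_normal_cdf(x):
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    # Stand-in limit state: resistance R and load L are lognormal with the given
    # medians and log-standard deviations; liquefaction triggers when the load
    # exceeds the resistance. (Illustrative only, not the published model.)
    def probability_of_liquefaction(median_R, median_L, beta_R, beta_L):
        mean_margin = math.log(median_R) - math.log(median_L)
        sd_margin = math.sqrt(beta_R**2 + beta_L**2)
        reliability_index = mean_margin / sd_margin
        return standard_normal_cdf(-reliability_index)  # P(L > R)

    # Equal medians -> 50% probability of triggering, regardless of uncertainty.
    print(probability_of_liquefaction(0.25, 0.25, 0.3, 0.4))  # 0.5
    ```

    Sweeping the load median while holding the probability fixed is one way to trace out the curves of equal probability of liquefaction mentioned above.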

  2. Probabilistic Causation without Probability.

    ERIC Educational Resources Information Center

    Holland, Paul W.

    The failure of Hume's "constant conjunction" to describe apparently causal relations in science and everyday life has led to various "probabilistic" theories of causation of which the study by P. C. Suppes (1970) is an important example. A formal model that was developed for the analysis of comparative agricultural experiments…

  3. Probabilistic composite micromechanics

    NASA Technical Reports Server (NTRS)

    Stock, T. A.; Bellini, P. X.; Murthy, P. L. N.; Chamis, C. C.

    1988-01-01

    Probabilistic composite micromechanics methods are developed that simulate expected uncertainties in unidirectional fiber composite properties. These methods are in the form of computational procedures using Monte Carlo simulation. A graphite/epoxy unidirectional composite (ply) is studied to demonstrate fiber composite material properties at the micro level. Regression results are presented to show the relative correlation between predicted and response variables in the study.

  4. Probabilistic simple sticker systems

    NASA Astrophysics Data System (ADS)

    Selvarajoo, Mathuri; Heng, Fong Wan; Sarmin, Nor Haniza; Turaev, Sherzod

    2017-04-01

    A model for DNA computing using the recombination behavior of DNA molecules, known as a sticker system, was introduced by L. Kari, G. Paun, G. Rozenberg, A. Salomaa, and S. Yu in the paper "DNA computing, sticker systems and universality" (Acta Informatica, vol. 35, pp. 401-420, 1998). A sticker system uses the Watson-Crick complementarity of DNA molecules: starting from incomplete double-stranded sequences, sticking operations are applied iteratively until a complete double-stranded sequence is obtained. It is known that sticker systems with finite sets of axioms and sticker rules generate only regular languages. Hence, different types of restrictions have been considered to increase the computational power of sticker systems. Recently, a variant of restricted sticker systems, called probabilistic sticker systems, has been introduced [4]. In this variant, probabilities are initially associated with the axioms, and the probability of a generated string is computed by multiplying the probabilities of all occurrences of the initial strings in the computation of the string. Strings are selected for the language according to probabilistic requirements. In this paper, we study fundamental properties of probabilistic simple sticker systems. We prove that the probabilistic enhancement increases the computational power of simple sticker systems.
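    The probabilistic selection rule described above (a derived string's probability is the product of the probabilities of the axioms used in its derivation, and only strings meeting a threshold enter the language) can be sketched as follows. The axiom names, derivations, and threshold are made up for illustration:

    ```python
    import math

    # Sketch of the probabilistic selection rule for sticker systems: each
    # derivation uses a sequence of initial strings (axioms), each carrying a
    # probability; the derived string's probability is their product, and only
    # strings at or above a cut-off probability enter the language.
    def string_probability(axiom_probs, axioms_used):
        return math.prod(axiom_probs[a] for a in axioms_used)

    def probabilistic_language(derivations, axiom_probs, threshold):
        """derivations: derived string -> list of axioms used to build it."""
        return {s for s, used in derivations.items()
                if string_probability(axiom_probs, used) >= threshold}

    axiom_probs = {"x": 0.5, "y": 0.4}
    derivations = {"ab": ["x", "x"], "abab": ["x", "x", "y"]}
    # "ab" has probability 0.25 >= 0.2; "abab" has 0.1 and is rejected.
    print(probabilistic_language(derivations, axiom_probs, 0.2))  # {'ab'}
    ```

    Varying the threshold changes which derivable strings are admitted, which is the mechanism by which the probabilistic restriction alters the generated language.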

  6. Probabilistic Threshold Criterion

    SciTech Connect

    Gresshoff, M; Hrousis, C A

    2010-03-09

    The Probabilistic Shock Threshold Criterion (PSTC) Project at LLNL develops phenomenological criteria for estimating safety or performance margin on high explosive (HE) initiation in the shock initiation regime, creating tools for safety assessment and design of initiation systems and HE trains in general. Until recently, there has been little foundation for probabilistic assessment of HE initiation scenarios. This work attempts to use probabilistic information that is available from both historic and ongoing tests to develop a basis for such assessment. Current PSTC approaches start with the functional form of the James Initiation Criterion as a backbone, and generalize to include varying areas of initiation and provide a probabilistic response based on test data for 1.8 g/cc (Ultrafine) 1,3,5-triamino-2,4,6-trinitrobenzene (TATB) and LX-17 (92.5% TATB, 7.5% Kel-F 800 binder). Application of the PSTC methodology is presented investigating the safety and performance of a flying plate detonator and the margin of an Ultrafine TATB booster initiating LX-17.

  7. A natural history of weight change in men with prostate cancer on androgen-deprivation therapy (ADT): results from the Shared Equal Access Regional Cancer Hospital (SEARCH) database

    PubMed Central

    Kim, Howard S.; Moreira, Daniel M.; Smith, Matthew R.; Presti, Joseph C.; Aronson, William J.; Terris, Martha K.; Kane, Christopher J.; Amling, Christopher L.; Freedland, Stephen J.

    2011-01-01

    OBJECTIVE To better understand the natural history of weight change with androgen-deprivation therapy (ADT), we investigated the effect of ADT on body weight among men from the Shared Equal Access Regional Cancer Hospital (SEARCH) database. Men undergoing ADT lose lean muscle but gain fat mass, contributing to an overall gain in weight. PATIENTS AND METHODS We identified 132 men in SEARCH who received ADT after radical prostatectomy. 'Weight change' was defined as the difference between the weight before starting ADT (6 months before ADT) and the on-ADT weight (between 6 and 18 months after starting ADT). In a subanalysis, baseline characteristics of weight-gainers and -losers were analysed using univariate and multivariate analysis to test association with weight change. RESULTS In all, 92 men (70%) gained weight, and 40 (30%) either lost or maintained a stable weight. On average, weight on ADT was 2.2 kg higher than the weight before ADT, with the mean change for weight-gainers and -losers being +4.2 kg and −2.4 kg, respectively. This compared with no significant weight change in the year before starting ADT (paired t-test, change −0.7 kg, P = 0.19) or in the second year on ADT (paired t-test, change −0.5 kg, P = 0.46) for 84 men in whom these additional weight values were recorded. There was no significant association between any of the features examined and weight change on univariate and multivariate analysis. CONCLUSIONS In this longitudinal study, ADT was accompanied by significant weight gain (+2.2 kg). This change occurred primarily in the first year of therapy, with men neither losing nor gaining additional weight thereafter. PMID:20860651

  8. Design of a bioactive small molecule that targets the myotonic dystrophy type 1 RNA via an RNA motif-ligand database and chemical similarity searching.

    PubMed

    Parkesh, Raman; Childs-Disney, Jessica L; Nakamori, Masayuki; Kumar, Amit; Wang, Eric; Wang, Thomas; Hoskins, Jason; Tran, Tuan; Housman, David; Thornton, Charles A; Disney, Matthew D

    2012-03-14

Myotonic dystrophy type 1 (DM1) is a triplet repeat disorder caused by expanded CTG repeats in the 3'-untranslated region of the dystrophia myotonica protein kinase (DMPK) gene. The transcribed repeats fold into an RNA hairpin with multiple copies of a 5'CUG/3'GUC motif that binds the RNA splicing regulator muscleblind-like 1 protein (MBNL1). Sequestration of MBNL1 by expanded r(CUG) repeats causes splicing defects in a subset of pre-mRNAs including the insulin receptor, the muscle-specific chloride ion channel, sarco(endo)plasmic reticulum Ca(2+) ATPase 1, and cardiac troponin T. Based on these observations, small-molecule ligands that specifically target expanded DM1 repeats could be of use as therapeutics. In the present study, chemical similarity searching was employed to improve the efficacy of pentamidine and Hoechst 33258, ligands that have been shown previously to target the DM1 triplet repeat. A series of in vitro inhibitors of the RNA-protein complex were identified with low micromolar IC(50) values, >20-fold more potent than the query compounds. Importantly, a bis-benzimidazole identified from the Hoechst query improves DM1-associated pre-mRNA splicing defects in cell and mouse models of DM1 (when dosed at 1 mM and 100 mg/kg, respectively). Since Hoechst 33258 was identified as a DM1 binder through analysis of an RNA motif-ligand database, these studies suggest that lead ligands targeting RNA with improved biological activity can be identified by a synergistic approach that combines analysis of known RNA-ligand interactions with chemical similarity searching.

  9. Pathological and Biochemical Outcomes among African-American and Caucasian Men with Low Risk Prostate Cancer in the SEARCH Database: Implications for Active Surveillance Candidacy.

    PubMed

    Leapman, Michael S; Freedland, Stephen J; Aronson, William J; Kane, Christopher J; Terris, Martha K; Walker, Kelly; Amling, Christopher L; Carroll, Peter R; Cooperberg, Matthew R

    2016-11-01

Racial disparities in the incidence and risk profile of prostate cancer at diagnosis among African-American men are well reported. However, it remains unclear whether African-American race is independently associated with adverse outcomes in men with clinical low risk disease. We retrospectively analyzed the records of 895 men in the SEARCH (Shared Equal Access Regional Cancer Hospital) database in whom clinical low risk prostate cancer was treated with radical prostatectomy. Associations of African-American and Caucasian race with pathological and biochemical recurrence outcomes were examined using chi-square, logistic regression, log rank and Cox proportional hazards analyses. We identified 355 African-American and 540 Caucasian men with low risk tumors in the SEARCH cohort who were followed a median of 6.3 years. Following adjustment for relevant covariates, African-American race was not significantly associated with pathological upgrading (OR 1.33, p = 0.12), major upgrading (OR 0.58, p = 0.10), up-staging (OR 1.09, p = 0.73) or positive surgical margins (OR 1.04, p = 0.81). Five-year recurrence-free survival rates were 73.4% in African-American men and 78.4% in Caucasian men (log rank p = 0.18). In a Cox proportional hazards model African-American race was not significantly associated with biochemical recurrence (HR 1.11, p = 0.52). In a cohort of patients at clinical low risk who were treated with prostatectomy in an equal access health system with a high representation of African-American men we observed no significant differences in the rates of pathological upgrading, up-staging or biochemical recurrence. These data support continued use of active surveillance in African-American men. Upgrading and up-staging remain concerning possibilities for all men regardless of race. Copyright © 2016 American Urological Association Education and Research, Inc. Published by Elsevier Inc. All rights reserved.

  10. A natural history of weight change in men with prostate cancer on androgen-deprivation therapy (ADT): results from the Shared Equal Access Regional Cancer Hospital (SEARCH) database.

    PubMed

    Kim, Howard S; Moreira, Daniel M; Smith, Matthew R; Presti, Joseph C; Aronson, William J; Terris, Martha K; Kane, Christopher J; Amling, Christopher L; Freedland, Stephen J

    2011-03-01

• To better understand the natural history of weight change with androgen-deprivation therapy (ADT), we investigated the effect of ADT on body weight among men from the Shared Equal Access Regional Cancer Hospital (SEARCH) database. • Men undergoing ADT lose lean muscle but gain fat mass, contributing to an overall gain in weight. • We identified 132 men in SEARCH who received ADT after radical prostatectomy. • 'Weight change' was defined as the difference in weight before starting ADT (6 months before ADT) and the on-ADT weight (between 6 and 18 months after starting ADT). • In a subanalysis, baseline characteristics of weight-gainers and -losers were analysed using univariate and multivariate analysis to test association with weight change. • In all, 92 men (70%) gained weight, and 40 (30%) either lost or maintained a stable weight. • On average, weight on ADT was 2.2 kg higher than the weight before ADT, with the mean change for weight-gainers and -losers being +4.2 kg and -2.4 kg, respectively. • This compared with no significant weight change in the year before starting ADT (paired t-test, change -0.7 kg, P = 0.19) or in the second year on ADT (paired t-test, change -0.5 kg, P = 0.46) for 84 men in whom these additional weight values were recorded. • There was no significant association between any of the features examined and weight change on univariate and multivariate analysis. • In this longitudinal study, ADT was accompanied by significant weight gain (+2.2 kg). This change occurred primarily in the first year of therapy, with men neither losing nor gaining additional weight thereafter. © 2010 THE AUTHORS. BJU INTERNATIONAL © 2010 BJU INTERNATIONAL.

  11. In-Search: This Program Saves Users Time and Money as It Helps Them Wend Their Way Through Dialog's Labyrinthine Databases.

    ERIC Educational Resources Information Center

    Elia, Joseph J., Jr.

    1984-01-01

    Describes the value and use of In-Search, an access software program designed to make online searches simpler--and less costly--for Dialog users by helping them to decide on a search strategy before signing online. (MBR)

  12. Query-Dependent Banding (QDB) for Faster RNA Similarity Searches

    PubMed Central

    Nawrocki, Eric P; Eddy, Sean R

    2007-01-01

When searching sequence databases for RNAs, it is desirable to score both primary sequence and RNA secondary structure similarity. Covariance models (CMs) are probabilistic models well-suited for RNA similarity search applications. However, the computational complexity of CM dynamic programming alignment algorithms has limited their practical application. Here we describe an acceleration method called query-dependent banding (QDB), which uses the probabilistic query CM to precalculate regions of the dynamic programming lattice that have negligible probability, independently of the target database. We have implemented QDB in the freely available Infernal software package. QDB reduces the average case time complexity of CM alignment from LN^2.4 to LN^1.3 for a query RNA of N residues and a target database of L residues, resulting in a 4-fold speedup for typical RNA queries. Combined with other improvements to Infernal, including informative mixture Dirichlet priors on model parameters, benchmarks also show increased sensitivity and specificity resulting from improved parameterization. PMID:17397253
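    The core idea of query-dependent banding can be illustrated with a minimal sketch (not Infernal's implementation): given a model state's probability distribution over the lengths of subsequences it can generate, keep only the central band of lengths and discard tails whose total probability is below a threshold beta. The dynamic programming alignment then evaluates only lengths inside the band.

    ```python
    # Hedged sketch of query-dependent banding: compute a length band
    # [dmin, dmax] per state so that the probability mass outside the
    # band is at most beta (split between the two tails).

    def qdb_band(length_probs, beta=1e-7):
        """length_probs[d] = probability the state generates a
        subsequence of length d. Returns (dmin, dmax)."""
        # Walk in from the left until the accumulated tail would exceed beta/2.
        total, dmin = 0.0, 0
        for d, p in enumerate(length_probs):
            if total + p > beta / 2:
                dmin = d
                break
            total += p
        # Same from the right.
        total, dmax = 0.0, len(length_probs) - 1
        for d in range(len(length_probs) - 1, -1, -1):
            if total + length_probs[d] > beta / 2:
                dmax = d
                break
            total += length_probs[d]
        return dmin, dmax

    # A sharply peaked length distribution yields a narrow band, so the
    # DP alignment only needs lengths inside [dmin, dmax].
    probs = [0.0, 0.001, 0.05, 0.4, 0.4, 0.1, 0.048, 0.001, 0.0]
    print(qdb_band(probs, beta=0.01))
    ```

    With a tighter beta the band widens and the speedup shrinks; Infernal precomputes such bands per model state from the query CM alone, independently of the target database.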

  13. Scopus database: a review.

    PubMed

    Burnham, Judy F

    2006-03-08

The Scopus database provides access to STM journal articles and the references included in those articles, allowing the searcher to search both forward and backward in time. The database can be used for collection development as well as for research. This review provides information on the key points of the database and compares it to Web of Science. Neither database is all-inclusive; rather, the two complement each other. If a library can afford only one, the choice must be based on institutional needs.

  14. Scopus database: a review

    PubMed Central

    Burnham, Judy F

    2006-01-01

The Scopus database provides access to STM journal articles and the references included in those articles, allowing the searcher to search both forward and backward in time. The database can be used for collection development as well as for research. This review provides information on the key points of the database and compares it to Web of Science. Neither database is all-inclusive; rather, the two complement each other. If a library can afford only one, the choice must be based on institutional needs. PMID:16522216

  15. Exact and Approximate Probabilistic Symbolic Execution

    NASA Technical Reports Server (NTRS)

    Luckow, Kasper; Pasareanu, Corina S.; Dwyer, Matthew B.; Filieri, Antonio; Visser, Willem

    2014-01-01

Probabilistic software analysis seeks to quantify the likelihood of reaching a target event under uncertain environments. Recent approaches compute probabilities of execution paths using symbolic execution, but do not support nondeterminism. Nondeterminism arises naturally when no suitable probabilistic model can capture a program behavior, e.g., for multithreading or distributed systems. In this work, we propose a technique, based on symbolic execution, to synthesize schedulers that resolve nondeterminism to maximize the probability of reaching a target event. To scale to large systems, we also introduce approximate algorithms to search for good schedulers, speeding up established random sampling and reinforcement learning results through the quantification of path probabilities based on symbolic execution. We implemented the techniques in Symbolic PathFinder and evaluated them on nondeterministic Java programs. We show that our algorithms significantly improve upon a state-of-the-art statistical model checking algorithm, originally developed for Markov Decision Processes.
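    The scheduler-synthesis objective described above can be sketched on a tiny acyclic Markov decision process (this is an illustration of the underlying recursion, not Symbolic PathFinder itself): the maximal probability of reaching a target maximizes over nondeterministic choices and averages over probabilistic outcomes.

    ```python
    # Hedged sketch: best achievable probability of reaching `target` in a
    # small acyclic MDP. transitions[state] is a list of actions; each
    # action is a list of (probability, successor) pairs.

    def max_reach_prob(state, target, transitions):
        if state == target:
            return 1.0
        actions = transitions.get(state, [])
        if not actions:                      # dead end, e.g. a "fail" state
            return 0.0
        # Scheduler picks the action maximizing the expected reach probability.
        return max(
            sum(p * max_reach_prob(succ, target, transitions) for p, succ in action)
            for action in actions
        )

    # From s0 the scheduler chooses between a direct gamble (0.5) and a
    # detour through s1 (0.9 * 0.8 = 0.72); synthesis picks the detour.
    transitions = {
        "s0": [
            [(0.5, "target"), (0.5, "fail")],
            [(0.9, "s1"), (0.1, "fail")],
        ],
        "s1": [[(0.8, "target"), (0.2, "fail")]],
    }
    print(max_reach_prob("s0", "target", transitions))
    ```

    The paper's contribution is doing this over path probabilities obtained from symbolic execution of real Java programs, with approximate search when exact maximization does not scale.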

  16. Web Search Engines: Search Syntax and Features.

    ERIC Educational Resources Information Center

    Ojala, Marydee

    2002-01-01

    Presents a chart that explains the search syntax, features, and commands used by the 12 most widely used general Web search engines. Discusses Web standardization, expanded types of content searched, size of databases, and search engines that include both simple and advanced versions. (LRW)

  17. Web Search Engines: Search Syntax and Features.

    ERIC Educational Resources Information Center

    Ojala, Marydee

    2002-01-01

    Presents a chart that explains the search syntax, features, and commands used by the 12 most widely used general Web search engines. Discusses Web standardization, expanded types of content searched, size of databases, and search engines that include both simple and advanced versions. (LRW)

  18. Basics of Online Searching.

    ERIC Educational Resources Information Center

    Meadow, Charles T.; Cochrane, Pauline (Atherton)

    Intended to teach the principles of interactive bibliographic searching to those with little or no prior experience, this textbook explains the basic elements of online information retrieval and compares the major database search systems. Its chapters address (1) relevant definitions and vocabulary; (2) the conceptual facets of database searching,…

  19. Conducting a Web Search.

    ERIC Educational Resources Information Center

    Miller-Whitehead, Marie

    Keyword and text string searches of online library catalogs often provide different results according to library and database used and depending upon how books and journals are indexed. For this reason, online databases such as ERIC often provide tutorials and recommendations for searching their site, such as how to use Boolean search strategies.…

  20. Evaluating physicians' probabilistic judgments.

    PubMed

    Poses, R M; Cebul, R D; Centor, R M

    1988-01-01

Physicians increasingly are challenged to make probabilistic judgments quantitatively. Their ability to make such judgments may be directly linked to the quality of care they provide. Many methods are available to evaluate these judgments. Graphic means of assessment include the calibration curve, the covariance graph, and the receiver operating characteristic (ROC) curve. Statistical tools can measure the significance of departures from ideal calibration and the area under the ROC curve. Modeling the calibration curve using linear or logistic regression provides another method to assess probabilistic judgments, although these models may be limited by failure of the data to meet their assumptions. Scoring rules provide indices of overall judgmental performance, although their reliability is difficult to gauge for small sample sizes. Decompositions of scoring rules separate judgmental performance into functional components. The authors provide preliminary guidelines for choosing methods for specific research in this area.
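    Two of the tools named above, scoring rules and the calibration curve, are easy to make concrete. A minimal sketch (illustrative data, not from the paper): the Brier score is the mean squared difference between forecast probability and 0/1 outcome, and binning forecasts against observed frequencies gives the points of a calibration curve.

    ```python
    # Illustrative sketch of a scoring rule and a calibration curve.

    def brier_score(forecasts, outcomes):
        """Mean squared error of probability forecasts against 0/1
        outcomes; lower is better, 0 is perfect."""
        return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

    def calibration_curve(forecasts, outcomes, n_bins=10):
        """For each forecast bin, pair the mean forecast with the observed
        event frequency; points near the diagonal indicate calibration."""
        bins = [[] for _ in range(n_bins)]
        for p, o in zip(forecasts, outcomes):
            idx = min(int(p * n_bins), n_bins - 1)
            bins[idx].append((p, o))
        curve = []
        for b in bins:
            if b:
                mean_p = sum(p for p, _ in b) / len(b)
                obs_freq = sum(o for _, o in b) / len(b)
                curve.append((mean_p, obs_freq))
        return curve

    # Invented forecasts from a hypothetical physician and the outcomes.
    forecasts = [0.1, 0.2, 0.8, 0.9, 0.7, 0.3]
    outcomes  = [0,   0,   1,   1,   0,   1]
    print(round(brier_score(forecasts, outcomes), 3))
    print(calibration_curve(forecasts, outcomes))
    ```

    As the abstract cautions, with samples this small the score and curve are unreliable; the decompositions it mentions (e.g., into calibration and resolution components) require larger samples per bin.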

  1. Probabilistic authenticated quantum dialogue

    NASA Astrophysics Data System (ADS)

    Hwang, Tzonelih; Luo, Yi-Ping

    2015-12-01

This work proposes a probabilistic authenticated quantum dialogue (PAQD) based on Bell states with the following notable features. (1) In the proposed scheme, the dialogue is encoded in a probabilistic way, i.e., the same messages can be encoded into different quantum states, whereas in the state-of-the-art authenticated quantum dialogue (AQD), the dialogue is encoded in a deterministic way. (2) The pre-shared secret key between the two communicants can be reused without any security loophole. (3) Each dialogue in the proposed PAQD can be exchanged within only one step of quantum communication and one step of classical communication, whereas in the state-of-the-art AQD protocols, both communicants have to run a QKD protocol for each dialogue, and each dialogue requires multiple quantum and classical communication steps. (4) The proposed scheme can resist the man-in-the-middle attack, the modification attack, and other well-known attacks.

  2. Probabilistic Fatigue: Computational Simulation

    NASA Technical Reports Server (NTRS)

    Chamis, Christos C.

    2002-01-01

Fatigue is a primary consideration in the design of aerospace structures for long-term durability and reliability. There are several types of fatigue that must be considered in the design. These include low-cycle, high-cycle, and combined fatigue under different cyclic loading conditions, for example mechanical, thermal, and erosion. The traditional approach to evaluate fatigue has been to conduct many tests in the various service-environment conditions that the component will be subjected to in a specific design. This approach is reasonable and robust for that specific design. However, it is time-consuming, costly, and needs to be repeated for designs in different operating conditions in general. Recent research has demonstrated that fatigue of structural components and structures can be evaluated by computational simulation based on a novel paradigm. The main features of this paradigm are progressive telescoping scale mechanics, progressive scale substructuring, and progressive structural fracture, encompassed within probabilistic simulation. These generic features probabilistically telescope local material-point damage all the way up to the structural component and probabilistically decompose structural loads and boundary conditions all the way down to the material point. Additional features include a multifactor interaction model that probabilistically describes the evolution of material properties and any changes due to various cyclic loads and other mutually interacting effects. The objective of the proposed paper is to describe this novel paradigm of computational simulation and present typical fatigue results for structural components. Additionally, advantages, versatility and inclusiveness of computational simulation versus testing are discussed. Guidelines for complementing simulated results with strategic testing are outlined. Typical results are shown for computational simulation of fatigue in metallic composite structures to demonstrate the

  3. Geothermal probabilistic cost study

    NASA Technical Reports Server (NTRS)

    Orren, L. H.; Ziman, G. M.; Jones, S. C.; Lee, T. K.; Noll, R.; Wilde, L.; Sadanand, V.

    1981-01-01

A tool is presented to quantify the risks of geothermal projects, the Geothermal Probabilistic Cost Model (GPCM). The GPCM model was used to evaluate a geothermal reservoir for a binary-cycle electric plant at Heber, California. Three institutional aspects of geothermal risk that can shift the risk among different agents were analyzed. The leasing of geothermal land, contracting between the producer and the user of the geothermal heat, and insurance against faulty performance were examined.

  4. Probabilistic Model Development

    NASA Technical Reports Server (NTRS)

    Adam, James H., Jr.

    2010-01-01

Objective: Develop a Probabilistic Model for the Solar Energetic Particle Environment, i.e., a tool that provides a reference solar particle radiation environment that (1) will not be exceeded at a user-specified confidence level and (2) provides reference environments for (a) peak flux, (b) event-integrated fluence, and (c) mission-integrated fluence. The reference environments will consist of elemental energy spectra for protons, helium, and heavier ions.

  5. Geothermal probabilistic cost study

    NASA Astrophysics Data System (ADS)

    Orren, L. H.; Ziman, G. M.; Jones, S. C.; Lee, T. K.; Noll, R.; Wilde, L.; Sadanand, V.

    1981-08-01

A tool is presented to quantify the risks of geothermal projects, the Geothermal Probabilistic Cost Model (GPCM). The GPCM model was used to evaluate a geothermal reservoir for a binary-cycle electric plant at Heber, California. Three institutional aspects of geothermal risk that can shift the risk among different agents were analyzed. The leasing of geothermal land, contracting between the producer and the user of the geothermal heat, and insurance against faulty performance were examined.

  6. Probabilistic liver atlas construction.

    PubMed

    Dura, Esther; Domingo, Juan; Ayala, Guillermo; Marti-Bonmati, Luis; Goceri, E

    2017-01-13

Anatomical atlases are 3D volumes or shapes representing an organ or structure of the human body. They contain either the prototypical shape of the object of interest together with other shapes representing its statistical variations (statistical atlas) or a probability map of belonging to the object (probabilistic atlas). Probabilistic atlases are mostly built with simple estimations only involving the data at each spatial location. A new method for probabilistic atlas construction that uses a generalized linear model is proposed. This method aims to improve the estimation of the probability of being covered by the liver. Furthermore, all methods to build an atlas involve prior coregistration of the available sample of shapes. The influence of the geometrical transformation adopted for registration on the quality of the final atlas has not been sufficiently investigated. The ability of an atlas to adapt to a new case is one of the most important quality criteria that should be taken into account. The presented experiments show that some methods for atlas construction are severely affected by the previous coregistration step. We show the good performance of the new approach. Furthermore, results suggest that extremely flexible registration methods are not always beneficial, since they can reduce the variability of the atlas and hence its ability to give sensible values of probability when used as an aid in segmentation of new cases.

  7. Classification Clustering, Probabilistic Information Retrieval, and the Online Catalog.

    ERIC Educational Resources Information Center

    Larson, Ray R.

    1991-01-01

    Discusses problems with subject searches in online library catalogs and examines theoretical principles for the design of effective information retrieval systems. Probabilistic ranking methods are discussed, and an experimental online catalog called CHESHIRE is described. It is noted that CHESHIRE uses classification clustering, provides natural…

  8. Probabilistic Graph Layout for Uncertain Network Visualization.

    PubMed

    Schulz, Christoph; Nocaj, Arlind; Goertler, Jochen; Deussen, Oliver; Brandes, Ulrik; Weiskopf, Daniel

    2017-01-01

    We present a novel uncertain network visualization technique based on node-link diagrams. Nodes expand spatially in our probabilistic graph layout, depending on the underlying probability distributions of edges. The visualization is created by computing a two-dimensional graph embedding that combines samples from the probabilistic graph. A Monte Carlo process is used to decompose a probabilistic graph into its possible instances and to continue with our graph layout technique. Splatting and edge bundling are used to visualize point clouds and network topology. The results provide insights into probability distributions for the entire network-not only for individual nodes and edges. We validate our approach using three data sets that represent a wide range of network types: synthetic data, protein-protein interactions from the STRING database, and travel times extracted from Google Maps. Our approach reveals general limitations of the force-directed layout and allows the user to recognize that some nodes of the graph are at a specific position just by chance.
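    The Monte Carlo decomposition described above can be sketched minimally (invented edge probabilities, not the authors' implementation): each edge of a probabilistic graph exists independently with some probability, and sampling concrete instances lets downstream steps such as the layout embedding aggregate over them.

    ```python
    import random

    # Hedged sketch of sampling instances of a probabilistic graph.

    def sample_instance(prob_edges, rng):
        """Draw one concrete graph: keep each edge independently with
        its existence probability."""
        return [(u, v) for (u, v), p in prob_edges.items() if rng.random() < p]

    def expected_degree(nodes, prob_edges, n_samples=10000, seed=0):
        """Estimate each node's expected degree by averaging over samples."""
        rng = random.Random(seed)
        deg = {v: 0 for v in nodes}
        for _ in range(n_samples):
            for u, v in sample_instance(prob_edges, rng):
                deg[u] += 1
                deg[v] += 1
        return {v: d / n_samples for v, d in deg.items()}

    nodes = ["a", "b", "c"]
    prob_edges = {("a", "b"): 0.9, ("b", "c"): 0.5, ("a", "c"): 0.1}
    est = expected_degree(nodes, prob_edges)
    # By linearity, node "b" has expected degree 0.9 + 0.5 = 1.4.
    print(est["b"])
    ```

    In the paper, each sampled instance would feed a force-directed layout and the resulting point clouds are splatted, so uncertainty in edges becomes visible as spatial spread of nodes.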

  9. Number of Unfavorable Intermediate-Risk Factors Predicts Pathologic Upstaging and Prostate Cancer-Specific Mortality Following Radical Prostatectomy: Results From the SEARCH Database.

    PubMed

    Zumsteg, Zachary S; Chen, Zinan; Howard, Lauren E; Amling, Christopher L; Aronson, William J; Cooperberg, Matthew R; Kane, Christopher J; Terris, Martha K; Spratt, Daniel E; Sandler, Howard M; Freedland, Stephen J

    2017-02-01

To validate and further improve the stratification of intermediate risk (IR) prostate cancer into favorable and unfavorable subgroups for patients undergoing radical prostatectomy. The SEARCH database was queried for IR patients undergoing radical prostatectomy without adjuvant radiotherapy. Unfavorable IR (UIR) disease was defined as any patient with at least one unfavorable risk factor (URF), including primary Gleason pattern 4, 50% or more biopsy cores containing cancer, or multiple National Comprehensive Cancer Network IR factors. One thousand five hundred eighty-six patients with IR prostate cancer comprised the study cohort. Median follow-up was 62 months. Patients classified as UIR were significantly more likely to have pathologic high-risk features, such as Gleason score 8-10, pT3-4 disease, or lymph node metastases, than favorable IR (FIR) patients (P < 0.001). Furthermore, UIR patients had significantly higher rates of PSA relapse (PSA-RFS; hazard ratio [HR] = 1.89, P < 0.001) and distant metastasis (DM, HR = 2.92, P = 0.001), but no difference in prostate cancer-specific mortality (PCSM) or all-cause mortality in multivariable analysis. On secondary analysis, patients with ≥2 URF had significantly worse PSA-RFS, DM, and PCSM than those with 0 or 1 URF. Moreover, 40% of patients with ≥2 URF had high-risk pathologic features. Patients with UIR prostate cancer are at increased risk of PSA relapse, DM, and pathologic upstaging following prostatectomy. However, increased risk of PCSM was only detected in those with ≥2 URF. This suggests that further refinement of the UIR subgroup may improve risk stratification. Prostate 77:154-163, 2017. © 2016 Wiley Periodicals, Inc.
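    The stratification rule described above is simple enough to sketch directly (hypothetical field names; a paraphrase of the definition, not the study's code): count how many unfavorable risk factors a patient has, then assign FIR for zero and UIR otherwise, flagging the ≥2 URF subgroup the study links to worse cancer-specific outcomes.

    ```python
    # Hedged sketch of the URF-counting stratification. Patient fields
    # (primary_gleason, positive_core_fraction, nccn_ir_factors) are
    # invented names for illustration.

    def count_urf(patient):
        urf = 0
        if patient["primary_gleason"] == 4:          # primary Gleason pattern 4
            urf += 1
        if patient["positive_core_fraction"] >= 0.5:  # >=50% cores with cancer
            urf += 1
        if patient["nccn_ir_factors"] >= 2:           # multiple NCCN IR factors
            urf += 1
        return urf

    def stratify(patient):
        """FIR if no URF; otherwise UIR, with >=2 URFs flagged as the
        higher-risk subgroup."""
        n = count_urf(patient)
        if n == 0:
            return "FIR"
        return "UIR (>=2 URF)" if n >= 2 else "UIR (1 URF)"

    p = {"primary_gleason": 4, "positive_core_fraction": 0.6, "nccn_ir_factors": 1}
    print(stratify(p))
    ```

    The study's secondary analysis suggests that this count, rather than the binary FIR/UIR split alone, carries the prognostic signal for cancer-specific mortality.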

  10. Efficiency of International Classification of Diseases, Ninth Revision, billing code searches to identify emergency department visits for blood or body fluid exposures through a statewide multicenter database.

    PubMed

    Rosen, Lisa M; Liu, Tao; Merchant, Roland C

    2012-06-01

    Blood and body fluid exposures are frequently evaluated in emergency departments (EDs). However, efficient and effective methods for estimating their incidence are not yet established. Evaluate the efficiency and accuracy of estimating statewide ED visits for blood or body fluid exposures using International Classification of Diseases, Ninth Revision (ICD-9), code searches. Secondary analysis of a database of ED visits for blood or body fluid exposure. EDs of 11 civilian hospitals throughout Rhode Island from January 1, 1995, through June 30, 2001. Patients presenting to the ED for possible blood or body fluid exposure were included, as determined by prespecified ICD-9 codes. Positive predictive values (PPVs) were estimated to determine the ability of 10 ICD-9 codes to distinguish ED visits for blood or body fluid exposure from ED visits that were not for blood or body fluid exposure. Recursive partitioning was used to identify an optimal subset of ICD-9 codes for this purpose. Random-effects logistic regression modeling was used to examine variations in ICD-9 coding practices and styles across hospitals. Cluster analysis was used to assess whether the choice of ICD-9 codes was similar across hospitals. The PPV for the original 10 ICD-9 codes was 74.4% (95% confidence interval [CI], 73.2%-75.7%), whereas the recursive partitioning analysis identified a subset of 5 ICD-9 codes with a PPV of 89.9% (95% CI, 88.9%-90.8%) and a misclassification rate of 10.1%. The ability, efficiency, and use of the ICD-9 codes to distinguish types of ED visits varied across hospitals. Although an accurate subset of ICD-9 codes could be identified, variations across hospitals related to hospital coding style, efficiency, and accuracy greatly affected estimates of the number of ED visits for blood or body fluid exposure.
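    The positive predictive value calculation at the heart of this study can be sketched with a toy example (all counts invented; the ICD-9 code labels are approximate and for illustration only): the PPV of a code subset is the number of true blood/body fluid exposure visits it retrieves divided by all visits it retrieves, so dropping low-precision codes raises PPV at the cost of recall.

    ```python
    # Hedged sketch of subset PPV. code_counts maps an ICD-9 code to
    # (true-positive visits, false-positive visits); counts are invented.

    def ppv(code_counts, subset):
        """Positive predictive value of searching with a subset of codes:
        true positives / (true positives + false positives)."""
        tp = sum(code_counts[c][0] for c in subset)
        fp = sum(code_counts[c][1] for c in subset)
        return tp / (tp + fp)

    code_counts = {
        "E920.5": (400, 20),    # hypodermic needle accident (high precision)
        "V15.85": (300, 40),    # body fluid exposure (high precision)
        "884.0":  (50, 150),    # open wound code, often unrelated visits
    }

    print(round(ppv(code_counts, {"E920.5", "V15.85"}), 3))   # pruned subset
    print(round(ppv(code_counts, code_counts.keys()), 3))     # all codes
    ```

    Recursive partitioning, as used in the study, automates this pruning by searching for the code subset that best separates exposure from non-exposure visits; the paper's observed jump from 74.4% to 89.9% PPV is the same trade illustrated here.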

  11. Race and Time from Diagnosis to Radical Prostatectomy: Does Equal Access Mean Equal Timely Access to the Operating Room?—Results from the SEARCH Database

    PubMed Central

    Bañez, Lionel L.; Terris, Martha K.; Aronson, William J.; Presti, Joseph C.; Kane, Christopher J.; Amling, Christopher L.; Freedland, Stephen J.

    2011-01-01

    Background African American men with prostate cancer are at higher risk for cancer-specific death than Caucasian men. We determine whether significant delays in management contribute to this disparity. We hypothesize that in an equal-access health care system, time interval from diagnosis to treatment would not differ by race. Methods We identified 1,532 African American and Caucasian men who underwent radical prostatectomy (RP) from 1988 to 2007 at one of four Veterans Affairs Medical Centers that comprise the Shared Equal-Access Regional Cancer Hospital (SEARCH) database with known biopsy date. We compared time from biopsy to RP between racial groups using linear regression adjusting for demographic and clinical variables. We analyzed risk of potential clinically relevant delays by determining odds of delays >90 and >180 days. Results Median time interval from diagnosis to RP was 76 and 68 days for African Americans and Caucasian men, respectively (P = 0.004). After controlling for demographic and clinical variables, race was not associated with the time interval between diagnosis and RP (P = 0.09). Furthermore, race was not associated with increased risk of delays >90 (P = 0.45) or >180 days (P = 0.31). Conclusions In a cohort of men undergoing RP in an equal-access setting, there was no significant difference between racial groups with regard to time interval from diagnosis to RP. Thus, equal-access includes equal timely access to the operating room. Given our previous finding of poorer outcomes among African Americans, treatment delays do not seem to explain these observations. Our findings need to be confirmed in patients electing other treatment modalities and in other practice settings. PMID:19336564

  12. Obesity is associated with castration-resistant disease and metastasis in men treated with androgen deprivation therapy after radical prostatectomy: results from the SEARCH database

    PubMed Central

    Keto, Christopher J.; Aronson, William J.; Terris, Martha K.; Presti, Joseph C.; Kane, Christopher J.; Amling, Christopher L.; Freedland, Stephen J.

    2012-01-01

OBJECTIVE To investigate whether obesity predicts poor outcomes in men starting androgen deprivation therapy (ADT) before metastasis, since previous studies found worse outcomes after surgery and radiation for obese men. METHODS A retrospective review was carried out of 287 men in the SEARCH database treated with radical prostatectomy between 1988 and 2009. Body mass index (BMI) was categorized as <25, 25–29.9 and ≥30 kg/m2. Proportional hazards models were used to test the association between BMI and time to castration-resistant prostate cancer (PC), metastases and PC-specific mortality, adjusting for demographic and clinicopathological data. RESULTS During a median 73-month follow-up after radical prostatectomy, 403 men (14%) received early ADT. Among 287 men with complete data, median BMI was 28.3 kg/m2. Median follow-up from the start of ADT was 52 months, during which 44 men developed castration-resistant PC, 34 developed metastases and 24 died from PC. In multivariate analysis, higher BMI was associated with a trend for greater risk of progression to castration-resistant PC (P = 0.063), a more than threefold increased risk of developing metastases (P = 0.027) and a trend toward worse PC-specific mortality (P = 0.119). Prognostic biomarkers did not differ between BMI groups. CONCLUSIONS Among men treated with early ADT, our results suggest that obese men may have an increased risk of PC progression. These data support the general hypothesis that obesity is associated with aggressive PC, although validation of these findings and further study of the mechanisms linking obesity and poor PC outcomes are required. PMID:22094083

  13. Obesity as a predictor of adverse outcome across black and white race: results from the Shared Equal Access Regional Cancer Hospital (SEARCH) Database.

    PubMed

    Jayachandran, Jayakrishnan; Bañez, Lionel L; Aronson, William J; Terris, Martha K; Presti, Joseph C; Amling, Christopher L; Kane, Christopher J; Freedland, Stephen J

    2009-11-15

Across multiple studies, obesity has been associated with an increased risk of higher grade disease and prostate-specific antigen (PSA) recurrence after radical prostatectomy (RP). Whether these associations vary by race is unknown. In the current study, the authors examined the association between obesity and outcome after RP stratified by race. A retrospective analysis was performed on 1415 men in the Shared Equal Access Regional Cancer Hospital (SEARCH) database who underwent RP between 1989 and 2008. The association between increased body mass index (BMI) and adverse pathology and biochemical recurrence was examined using multivariate logistic regression and Cox models, respectively. Data were examined stratified by race. After adjusting for preoperative clinical characteristics, higher BMI was associated with higher tumor grade (P = .008) and positive surgical margins (P < .001) in white men, and similar but statistically nonsignificant trends were observed in black men. No significant interaction was noted between race and BMI for associations with adverse pathology (P(interaction) ≥ .12). After adjusting for preoperative clinical characteristics, higher BMI was associated with an increased risk of recurrence in both white men (P = .001) and black men (P = .03). After further adjusting for pathologic variables, higher BMI was associated with significantly increased risk of recurrence in white men (P = .002) and black men (P = .01). No significant interactions were observed between race and BMI for predicting biochemical progression adjusting either for preoperative factors (P(interaction) = .35) or for preoperative and pathologic features (P(interaction) = .47). Obesity was associated with a greater risk of recurrence among both black men and white men. Obesity did not appear to be more or less influential in 1 race than another but, rather, was identified as a risk factor for aggressive cancer regardless of race.

  14. Race and time from diagnosis to radical prostatectomy: does equal access mean equal timely access to the operating room?--Results from the SEARCH database.

    PubMed

    Bañez, Lionel L; Terris, Martha K; Aronson, William J; Presti, Joseph C; Kane, Christopher J; Amling, Christopher L; Freedland, Stephen J

    2009-04-01

    African American men with prostate cancer are at higher risk for cancer-specific death than Caucasian men. We determine whether significant delays in management contribute to this disparity. We hypothesize that in an equal-access health care system, the time interval from diagnosis to treatment would not differ by race. We identified 1,532 African American and Caucasian men with a known biopsy date who underwent radical prostatectomy (RP) from 1988 to 2007 at one of four Veterans Affairs Medical Centers that comprise the Shared Equal-Access Regional Cancer Hospital (SEARCH) database. We compared time from biopsy to RP between racial groups using linear regression adjusting for demographic and clinical variables. We analyzed risk of potential clinically relevant delays by determining odds of delays >90 and >180 days. Median time interval from diagnosis to RP was 76 and 68 days for African American and Caucasian men, respectively (P = 0.004). After controlling for demographic and clinical variables, race was not associated with the time interval between diagnosis and RP (P = 0.09). Furthermore, race was not associated with increased risk of delays >90 (P = 0.45) or >180 days (P = 0.31). In a cohort of men undergoing RP in an equal-access setting, there was no significant difference between racial groups with regard to time interval from diagnosis to RP. Thus, equal access includes equal timely access to the operating room. Given our previous finding of poorer outcomes among African Americans, treatment delays do not seem to explain these observations. Our findings need to be confirmed in patients electing other treatment modalities and in other practice settings.

  15. Probabilistic analysis of fires in nuclear plants

    SciTech Connect

    Unione, A.; Teichmann, T.

    1985-01-01

    The aim of this paper is to describe a multilevel (i.e., staged) probabilistic analysis of fire risks in nuclear plants (as part of a general PRA) which maximizes the benefits of the FRA (fire risk assessment) in a cost-effective way. The approach uses several stages of screening, physical modeling of clearly dominant risk contributors, searches for direct (e.g., equipment dependences) and secondary (e.g., fire-induced internal flooding) interactions, and relies on lessons learned and available data from surrogate FRAs. The general methodology is outlined. 6 figs., 10 tabs.

  16. Multiclient Identification System Using Adaptive Probabilistic Model

    NASA Astrophysics Data System (ADS)

    Lin, Chin-Teng; Siana, Linda; Shou, Yu-Wen; Yang, Chien-Ting

    2010-12-01

    This paper integrates detection and identification of human faces into a more practical, real-time face recognition system. The proposed face detection system is based on the cascade Adaboost method to improve precision and robustness under unstable surrounding lighting. Our Adaboost method adjusts for environmental lighting conditions through histogram lighting normalization and accurately locates face regions through a region-based clustering process. We also address the problem of multi-scale faces by using 12 different scales of search windows and 5 different orientations for each client, in pursuit of multi-view independent face identification. Our face identification system has two main methodological parts: PCA (principal component analysis) facial feature extraction and an adaptive probabilistic model (APM). The implemented APM, a weighted combination of simple probabilistic functions, constructs the likelihood functions under the probabilistic constraint on the similarity measures. In addition, thanks to the constructed APM, our method can add a new client online and update the information of registered clients. The experimental results show the superior performance of the proposed system in both offline and real-time online testing.
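    The APM likelihood described above — a weighted combination of simple probabilistic functions evaluated on a client's feature vector — can be sketched roughly as follows. The Gaussian component choice, the function names, and the toy client model are illustrative assumptions, not details taken from the paper:

    ```python
    import math

    def gaussian(x, mean, var):
        """1-D Gaussian density, used here as the 'simple probabilistic function'."""
        return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

    def apm_likelihood(features, components):
        """Weighted combination of per-feature densities for one client model.

        components: list of (weight, means, variances) tuples; the weights
        sum to 1, and each tuple scores the PCA feature vector independently.
        """
        total = 0.0
        for weight, means, variances in components:
            per_feature = 1.0
            for x, m, v in zip(features, means, variances):
                per_feature *= gaussian(x, m, v)
            total += weight * per_feature
        return total

    # Hypothetical client model: two weighted components over 2-D PCA features.
    client_model = [
        (0.7, [0.0, 1.0], [1.0, 1.0]),
        (0.3, [2.0, 2.0], [0.5, 0.5]),
    ]
    score = apm_likelihood([0.1, 1.1], client_model)
    print(score)
    ```

    Online enrollment of a new client then amounts to adding a new component model of this form, and updating a registered client re-estimates that client's weights, means, and variances.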

  17. Searching Sociological Abstracts.

    ERIC Educational Resources Information Center

    Kerbel, Sandra Sandor

    1981-01-01

    Describes the scope, content, and retrieval characteristics of Sociological Abstracts, an online database of literature in the social sciences. Sample searches are displayed, and the strengths and weaknesses of the database are summarized. (FM)

  18. PCAT: Probabilistic Cataloger

    NASA Astrophysics Data System (ADS)

    Daylan, Tansu; Portillo, Stephen K. N.; Finkbeiner, Douglas P.

    2017-05-01

    PCAT (Probabilistic Cataloger) samples from the posterior distribution of a metamodel, i.e., union of models with different dimensionality, to compare the models. This is achieved via transdimensional proposals such as births, deaths, splits and merges in addition to the within-model proposals. This method avoids noisy estimates of the Bayesian evidence that may not reliably distinguish models when sampling from the posterior probability distribution of each model. The code has been applied in two different subfields of astronomy: high energy photometry, where transdimensional elements are gamma-ray point sources; and strong lensing, where light-deflecting dark matter subhalos take the role of transdimensional elements.
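    The transdimensional birth/death moves described above can be sketched on a toy 1-D source-counting problem. This is a deliberately simplified illustration, not PCAT's implementation: it omits split/merge and within-model moves and the proposal-density correction terms in the acceptance ratio, and the toy likelihood and Poisson prior are assumptions:

    ```python
    import math
    import random

    random.seed(0)

    DATA = [1.1, 0.9, 3.2, 2.8, 3.0]   # toy observations
    SIGMA = 0.5                         # assumed known component width
    PRIOR_RATE = 1.0                    # Poisson prior on the number of sources

    def log_likelihood(sources):
        # Crude toy likelihood: each datum is explained by its nearest source.
        total = 0.0
        for x in DATA:
            best = min((x - s) ** 2 for s in sources) if sources else 100.0
            total += -best / (2 * SIGMA ** 2)
        return total

    def log_prior(sources):
        n = len(sources)
        return n * math.log(PRIOR_RATE) - math.lgamma(n + 1)

    def step(sources):
        # Transdimensional move: a birth adds a source, a death removes one.
        proposal = list(sources)
        if not proposal or random.random() < 0.5:
            proposal.append(random.uniform(0.0, 4.0))      # birth
        else:
            proposal.pop(random.randrange(len(proposal)))  # death
        log_ratio = (log_likelihood(proposal) + log_prior(proposal)
                     - log_likelihood(sources) - log_prior(sources))
        return proposal if math.log(random.random()) < log_ratio else sources

    state = []
    counts = []
    for _ in range(2000):
        state = step(state)
        counts.append(len(state))
    print(sum(counts) / len(counts))  # posterior mean number of sources
    ```

    Because the sampler visits models of different dimensionality in proportion to their posterior mass, the model comparison falls out of the chain's dwell times directly, with no separate evidence estimate per model.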

  19. Atomic Spectra Database (ASD)

    National Institute of Standards and Technology Data Gateway

    SRD 78 NIST Atomic Spectra Database (ASD) (Web, free access)   This database provides access and search capability for NIST critically evaluated data on atomic energy levels, wavelengths, and transition probabilities that are reasonably up-to-date. The NIST Atomic Spectroscopy Data Center has carried out these critical compilations.

  20. Online Patent Searching: The Realities.

    ERIC Educational Resources Information Center

    Kaback, Stuart M.

    1983-01-01

    Considers patent subject searching capabilities of major online databases, noting patent claims, "deep-indexed" files, test searches, retrieval of related references, multi-database searching, improvements needed in indexing of chemical structures, full text searching, improvements needed in handling numerical data, and augmenting a…