Science.gov

Sample records for quantum database search

  1. Quantum search of a real unstructured database

    NASA Astrophysics Data System (ADS)

    Broda, Bogusław

    2016-02-01

    A simple circuit implementation of the oracle for Grover's quantum search of a real unstructured classical database is proposed. The oracle contains a kind of quantumly accessible classical memory, which stores the database.
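
    The role the oracle plays in such a search can be illustrated with a small classical statevector simulation. This is a minimal sketch, not the circuit proposed in the record above: the function name `grover_search`, the toy record strings, and the choice to model the oracle as a sign vector computed from a Python list are all assumptions made for illustration.

```python
import math
import numpy as np

def grover_search(database, key):
    """Simulate Grover's search for `key` in a classical list (illustrative sketch)."""
    n_items = len(database)
    # Oracle: a phase flip on every index whose stored record equals the key.
    # The classical database enters the simulation only through this sign vector.
    signs = np.array([-1.0 if rec == key else 1.0 for rec in database])
    n_marked = int(np.sum(signs < 0))
    if n_marked == 0:
        raise ValueError("key not present in database")

    # Start in the uniform superposition over all indices.
    state = np.full(n_items, 1.0 / math.sqrt(n_items))

    # Each iteration rotates the state by 2*theta towards the marked subspace.
    theta = math.asin(math.sqrt(n_marked / n_items))
    iterations = int(math.floor(math.pi / (4 * theta)))

    for _ in range(iterations):
        state = signs * state                 # oracle call
        state = 2.0 * state.mean() - state    # diffusion: inversion about the mean

    probs = state ** 2
    return int(np.argmax(probs)), float(probs.max()), iterations

# Hypothetical 64-record database with distinct entries.
database = [f"rec{i}" for i in range(64)]
index, probability, iterations = grover_search(database, "rec37")
print(index, round(probability, 4), iterations)
```

    For 64 records and one match, the sketch uses 6 oracle calls, compared with an average of about 32 lookups for a classical linear scan.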

  2. An efficient quantum search engine on unsorted database

    NASA Astrophysics Data System (ADS)

    Lu, Songfeng; Zhang, Yingyu; Liu, Fang

    2013-10-01

    We consider the problem of finding one or more desired items out of an unsorted database. Patel has shown that if the database permits quantum queries, then mere digitization is sufficient for efficient search for one desired item. The algorithm, called factorized quantum search algorithm, presented by him can locate the desired item in an unsorted database using O() queries to factorized oracles. But the algorithm requires that all the attribute values must be distinct from each other. In this paper, we discuss how to make a database satisfy the requirements, and present a quantum search engine based on the algorithm. Our goal is achieved by introducing auxiliary files for the attribute values that are not distinct, and converting every complex query request into a sequence of calls to factorized quantum search algorithm. The query complexity of our algorithm is O() for most cases.

  3. Unambiguous identification of coherent states: Searching a quantum database

    SciTech Connect

    Sedlak, Michal; Ziman, Mario; Pribyla, Ondrej; Buzek, Vladimir; Hillery, Mark

    2007-08-15

    We consider an unambiguous identification of an unknown coherent state with one of two unknown coherent reference states. Specifically, we consider two modes of an electromagnetic field prepared in unknown coherent states |α₁⟩ and |α₂⟩, respectively. The third mode is prepared either in the state |α₁⟩ or in the state |α₂⟩. The task is to identify (unambiguously) which of the two modes are in the same state. We present a scheme consisting of three beam splitters capable of performing this task. Although we do not prove the optimality, we show that the performance of the proposed setup is better than the generalization of the optimal measurement known for a finite-dimensional case. We show that a single beam splitter is capable of performing an unambiguous quantum state comparison for coherent states optimally. Finally, we propose an experimental setup consisting of 2N-1 beam splitters for unambiguous identification among N unknown coherent states. This setup can be considered as a search in a quantum database. The elements of the database are unknown coherent states encoded in different modes of an electromagnetic field. The task is to specify the two modes that are excited in the same, though unknown, coherent state.

  4. Online Database Searching Workbook.

    ERIC Educational Resources Information Center

    Littlejohn, Alice C.; Parker, Joan M.

    Designed primarily for use by first-time searchers, this workbook provides an overview of online searching. Following a brief introduction which defines online searching, databases, and database producers, five steps in carrying out a successful search are described: (1) identifying the main concepts of the search statement; (2) selecting a…

  6. Database Searching by Managers.

    ERIC Educational Resources Information Center

    Arnold, Stephen E.

    Managers and executives need the easy and quick access to business and management information that online databases can provide, but many have difficulty articulating their search needs to an intermediary. One possible solution would be to encourage managers and their immediate support staff members to search textual databases directly as they now…

  7. Quantum Search and Beyond

    DTIC Science & Technology

    2008-07-02

    ...solution of certain problems for which the communication needs do not dominate. A similar situation prevails in the quantum world. Quantum teleportation and... Ten years ago, the quantum search algorithm was designed to provide a way... Subject terms: quantum searching, partial quantum searching, fixed-point quantum...

  8. Paleomagnetic database search possible

    NASA Astrophysics Data System (ADS)

    Harbert, William

    I have recently finished an on-line search program which allows remote users to search the “Abase” ASCII version of the World Paleomagnetic Database developed by Lock and McElhinny [1991]. The program is very simple to use and will search the Soviet, non-Soviet, rock unit, and reference databases and create output files that can be downloaded back to a researcher's local system using the ftp command. To use Search, telnet to 130.49.3.1 (earth.eps.pitt.edu) and log in as the user “Search.” There is no password, and the user is asked a series of questions, which define the geographic region and ages of interest. The program will also ask for an identifier with which to create the output file names. The program has three modes of operation: text-only, Tektronix graphics, or X11/R5 graphics; the proper choice depends on the computer hardware that is used by the searcher.

  9. Alternative Databases for Anthropology Searching.

    ERIC Educational Resources Information Center

    Brody, Fern; Lambert, Maureen

    1984-01-01

    Examines online search results of sample questions in several databases covering linguistics, cultural anthropology, and physical anthropology in order to determine if and where any overlap in results might occur, and which files have greatest number of relevant hits. Search results by database are given for each subject area. (EJS)

  10. Quantum Search in Hilbert Space

    NASA Technical Reports Server (NTRS)

    Zak, Michail

    2003-01-01

    A proposed quantum-computing algorithm would perform a search for an item of information in a database stored in a Hilbert-space memory structure. The algorithm is intended to make it possible to search relatively quickly through a large database under conditions in which available computing resources would otherwise be considered inadequate to perform such a task. The algorithm would apply, more specifically, to a relational database in which information would be stored in a set of N complex orthonormal vectors, each of N dimensions (where N can be exponentially large). Each vector would constitute one row of a unitary matrix, from which one would derive the Hamiltonian operator (and hence the evolutionary operator) of a quantum system. In other words, all the stored information would be mapped onto a unitary operator acting on a quantum state that would represent the item of information to be retrieved. Then one could exploit quantum parallelism: one could pose all search queries simultaneously by performing a quantum measurement on the system. In so doing, one would effectively solve the search problem in one computational step. One could exploit the direct- and inner-product decomposability of the unitary matrix to make the dimensionality of the memory space exponentially large by use of only linear resources. However, inasmuch as the necessary preprocessing (the mapping of the stored information into a Hilbert space) could be exponentially expensive, the proposed algorithm would likely be most beneficial in applications in which the resources available for preprocessing were much greater than those available for searching.

  11. Critically damped quantum search.

    PubMed

    Mizel, Ari

    2009-04-17

    Although measurement and unitary processes can accomplish any quantum evolution in principle, thinking in terms of dissipation and damping can be powerful. We propose a modification of Grover's algorithm in which the idea of damping plays a natural role. Remarkably, we find that there is a critical damping value that divides between the quantum O(√N) and classical O(N) search regimes. In addition, by allowing the damping to vary in a fashion we describe, one obtains a fixed-point quantum search algorithm in which ignorance of the number of targets increases the number of oracle queries only by a factor of 1.5.

  12. Relativistic quantum private database queries

    NASA Astrophysics Data System (ADS)

    Sun, Si-Jia; Yang, Yu-Guang; Zhang, Ming-Ou

    2015-04-01

    Recently, Jakobi et al. (Phys Rev A 83, 022301, 2011) suggested the first practical private database query protocol (J-protocol) based on the Scarani et al. (Phys Rev Lett 92, 057901, 2004) quantum key distribution protocol. Unfortunately, the J-protocol is just a cheat-sensitive private database query protocol. In this paper, we present an idealized relativistic quantum private database query protocol based on Minkowski causality and the properties of quantum information. Also, we prove that the protocol is secure in terms of the user security and the database security.

  13. Fixed-point quantum search.

    PubMed

    Grover, Lov K

    2005-10-07

    The quantum search algorithm consists of an iterative sequence of selective inversions and diffusion type operations, as a result of which it is able to find a state with desired properties (target state) in an unsorted database of size N in only √N queries. This is achieved by designing the iterative transformations in a way that each iteration results in a small rotation of the state vector in a two-dimensional Hilbert space that includes the target state; if we choose the right number of iterative steps, we will stop just at the target state. This Letter shows that by replacing the selective inversions by selective phase shifts of π/3, the algorithm preferentially converges to the target state irrespective of the step size or number of iterations. This feature leads to robust search algorithms and also to new schemes for quantum control and error correction.
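
    The convergence property described in this record can be checked numerically. The sketch below assumes the composite operation takes the form U R_s U† R_t U applied to the start state |s⟩, where R_s and R_t multiply the amplitudes of |s⟩ and the target |t⟩ by a phase of π/3; under that assumption, a failure probability ε for U alone becomes ε³ after one composite step. The function names, the random unitary, and the dimension 8 are illustrative choices, not taken from the record.

```python
import numpy as np

def pi_over_3_step(U, s, t):
    """Apply U R_s U^dagger R_t U to |s>, using selective phase shifts of pi/3."""
    dim = len(s)
    omega = np.exp(1j * np.pi / 3)
    # Selective phase shifts: multiply the |s> (resp. |t>) component by omega.
    R_s = np.eye(dim, dtype=complex) + (omega - 1) * np.outer(s, s.conj())
    R_t = np.eye(dim, dtype=complex) + (omega - 1) * np.outer(t, t.conj())
    return U @ R_s @ U.conj().T @ R_t @ U @ s

def failure_probability(state, t):
    """Probability that a measurement of `state` does not yield the target |t>."""
    return 1.0 - abs(np.vdot(t, state)) ** 2

# Illustrative setup: a random unitary obtained from the QR factorisation
# of a complex Gaussian matrix, with basis states as start and target.
rng = np.random.default_rng(0)
dim = 8
gaussian = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
U, _ = np.linalg.qr(gaussian)
s = np.zeros(dim, dtype=complex)
s[0] = 1.0
t = np.zeros(dim, dtype=complex)
t[1] = 1.0

eps = failure_probability(U @ s, t)
eps_after = failure_probability(pi_over_3_step(U, s, t), t)
print(eps, eps_after, eps ** 3)
```

    Because ε³ ≤ ε for any ε between 0 and 1, the step never moves the state away from the target, which is the sense in which the sketch is "fixed-point"; standard Grover iterations, by contrast, overshoot if repeated too many times.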

  14. Quantum Error Correction Protects Quantum Search Algorithms Against Decoherence

    PubMed Central

    Botsinis, Panagiotis; Babar, Zunaira; Alanis, Dimitrios; Chandra, Daryus; Nguyen, Hung; Ng, Soon Xin; Hanzo, Lajos

    2016-01-01

    When quantum computing becomes a wide-spread commercial reality, Quantum Search Algorithms (QSA) and especially Grover’s QSA will inevitably be one of their main applications, constituting their cornerstone. Most of the literature assumes that the quantum circuits are free from decoherence. Practically, decoherence will remain unavoidable as is the Gaussian noise of classic circuits imposed by the Brownian motion of electrons, hence it may have to be mitigated. In this contribution, we investigate the effect of quantum noise on the performance of QSAs, in terms of their success probability as a function of the database size to be searched, when decoherence is modelled by depolarizing channels’ deleterious effects imposed on the quantum gates. Moreover, we employ quantum error correction codes for limiting the effects of quantum noise and for correcting quantum flips. More specifically, we demonstrate that, when we search for a single solution in a database having 4096 entries using Grover’s QSA at an aggressive depolarizing probability of 10⁻³, the success probability of the search is 0.22 when no quantum coding is used, which is improved to 0.96 when Steane’s quantum error correction code is employed. Finally, apart from Steane’s code, the employment of Quantum Bose-Chaudhuri-Hocquenghem (QBCH) codes is also considered. PMID:27924865

  15. Quantum Error Correction Protects Quantum Search Algorithms Against Decoherence

    NASA Astrophysics Data System (ADS)

    Botsinis, Panagiotis; Babar, Zunaira; Alanis, Dimitrios; Chandra, Daryus; Nguyen, Hung; Ng, Soon Xin; Hanzo, Lajos

    2016-12-01

    When quantum computing becomes a wide-spread commercial reality, Quantum Search Algorithms (QSA) and especially Grover’s QSA will inevitably be one of their main applications, constituting their cornerstone. Most of the literature assumes that the quantum circuits are free from decoherence. Practically, decoherence will remain unavoidable as is the Gaussian noise of classic circuits imposed by the Brownian motion of electrons, hence it may have to be mitigated. In this contribution, we investigate the effect of quantum noise on the performance of QSAs, in terms of their success probability as a function of the database size to be searched, when decoherence is modelled by depolarizing channels’ deleterious effects imposed on the quantum gates. Moreover, we employ quantum error correction codes for limiting the effects of quantum noise and for correcting quantum flips. More specifically, we demonstrate that, when we search for a single solution in a database having 4096 entries using Grover’s QSA at an aggressive depolarizing probability of 10⁻³, the success probability of the search is 0.22 when no quantum coding is used, which is improved to 0.96 when Steane’s quantum error correction code is employed. Finally, apart from Steane’s code, the employment of Quantum Bose-Chaudhuri-Hocquenghem (QBCH) codes is also considered.

  16. Quantum Error Correction Protects Quantum Search Algorithms Against Decoherence.

    PubMed

    Botsinis, Panagiotis; Babar, Zunaira; Alanis, Dimitrios; Chandra, Daryus; Nguyen, Hung; Ng, Soon Xin; Hanzo, Lajos

    2016-12-07

    When quantum computing becomes a wide-spread commercial reality, Quantum Search Algorithms (QSA) and especially Grover's QSA will inevitably be one of their main applications, constituting their cornerstone. Most of the literature assumes that the quantum circuits are free from decoherence. Practically, decoherence will remain unavoidable as is the Gaussian noise of classic circuits imposed by the Brownian motion of electrons, hence it may have to be mitigated. In this contribution, we investigate the effect of quantum noise on the performance of QSAs, in terms of their success probability as a function of the database size to be searched, when decoherence is modelled by depolarizing channels' deleterious effects imposed on the quantum gates. Moreover, we employ quantum error correction codes for limiting the effects of quantum noise and for correcting quantum flips. More specifically, we demonstrate that, when we search for a single solution in a database having 4096 entries using Grover's QSA at an aggressive depolarizing probability of 10⁻³, the success probability of the search is 0.22 when no quantum coding is used, which is improved to 0.96 when Steane's quantum error correction code is employed. Finally, apart from Steane's code, the employment of Quantum Bose-Chaudhuri-Hocquenghem (QBCH) codes is also considered.

  17. Begin: Online Database Searching Now!

    ERIC Educational Resources Information Center

    Lodish, Erica K.

    1986-01-01

    Because of the increasing importance of online databases, school library media specialists are encouraged to introduce students to online searching. Four books that would help media specialists gain a basic background are reviewed and it is noted that although they are very technical, they can be adapted to individual needs. (EM)

  18. Quantum random-walk search algorithm

    SciTech Connect

    Shenvi, Neil; Whaley, K. Birgitta; Kempe, Julia

    2003-05-01

    Quantum random walks on graphs have been shown to display many interesting properties, including exponentially fast hitting times when compared with their classical counterparts. However, it is still unclear how to use these novel properties to gain an algorithmic speedup over classical algorithms. In this paper, we present a quantum search algorithm based on the quantum random-walk architecture that provides such a speedup. It will be shown that this algorithm performs an oracle search on a database of N items with O(√N) calls to the oracle, yielding a speedup similar to other quantum search algorithms. It appears that the quantum random-walk formulation has considerable flexibility, presenting interesting opportunities for development of other, possibly novel quantum algorithms.

  19. Searching NCBI Databases Using Entrez.

    PubMed

    Gibney, Gretchen; Baxevanis, Andreas D

    2011-10-01

    One of the most widely used interfaces for the retrieval of information from biological databases is the NCBI Entrez system. Entrez capitalizes on the fact that there are pre-existing, logical relationships between the individual entries found in numerous public databases. The existence of such natural connections, mostly biological in nature, argued for the development of a method through which all the information about a particular biological entity could be found without having to sequentially visit and query disparate databases. Two basic protocols describe simple, text-based searches, illustrating the types of information that can be retrieved through the Entrez system. An alternate protocol builds upon the first basic protocol, using additional, built-in features of the Entrez system, and providing alternative ways to issue the initial query. The support protocol reviews how to save frequently issued queries. Finally, Cn3D, a structure visualization tool, is also discussed. © 2011 by John Wiley & Sons, Inc.

  20. Searching NCBI databases using Entrez.

    PubMed

    Baxevanis, Andreas D

    2008-12-01

    One of the most widely used interfaces for the retrieval of information from biological databases is the NCBI Entrez system. Entrez capitalizes on the fact that there are pre-existing, logical relationships between the individual entries found in numerous public databases. The existence of such natural connections, mostly biological in nature, argued for the development of a method through which all the information about a particular biological entity could be found without having to sequentially visit and query disparate databases. Two Basic Protocols describe simple, text-based searches, illustrating the types of information that can be retrieved through the Entrez system. An Alternate Protocol builds upon the first Basic Protocol, using additional, built-in features of the Entrez system, and providing alternative ways to issue the initial query. The Support Protocol reviews how to save frequently issued queries. Finally, Cn3D, a structure visualization tool, is also discussed. Copyright 2008 by John Wiley & Sons, Inc.

  1. Searching NCBI databases using Entrez.

    PubMed

    Gibney, Gretchen; Baxevanis, Andreas D

    2011-06-01

    One of the most widely used interfaces for the retrieval of information from biological databases is the NCBI Entrez system. Entrez capitalizes on the fact that there are pre-existing, logical relationships between the individual entries found in numerous public databases. The existence of such natural connections, mostly biological in nature, argued for the development of a method through which all the information about a particular biological entity could be found without having to sequentially visit and query disparate databases. Two basic protocols describe simple, text-based searches, illustrating the types of information that can be retrieved through the Entrez system. An alternate protocol builds upon the first basic protocol, using additional, built-in features of the Entrez system, and providing alternative ways to issue the initial query. The support protocol reviews how to save frequently issued queries. Finally, Cn3D, a structure visualization tool, is also discussed.

  2. Nonlinear (quantum) search

    NASA Astrophysics Data System (ADS)

    Meyer, David

    2015-03-01

    Farhi and Gutmann's “analog analogue” of Grover's algorithm is simply the Schrödinger equation for the evolution of a particle hopping among N sites, one of which is marked by the presence of a potential well. When the particle is initialized in a state with equal amplitude at each site, after O(N^{1/2}) time its amplitude is concentrated at the marked site, and a measurement will detect it there with probability 1. A nonlinear Schrödinger equation with a cubic nonlinear term arises as the Gross-Pitaevskii equation approximately describing the collective evolution of the many quantum particles in a Bose-Einstein condensate (BEC), a novel--but experimentally observed--form of matter in which all the particles are in the same quantum state. Including such a nonlinear term into the continuous time evolution of the particle hopping among N sites, one of which is marked, constitutes a nonlinear (quantum) search algorithm. If the relative strength of the nonlinear term varies correctly with time, the state concentrates at the marked site at time π/2 for any N. This is a constant time algorithm--immensely faster than O(N^{1/2}). The state concentrates at the marked site for shorter and shorter times as N → ∞, however, which means the measurement time must be determined with increasing precision. Accounting correctly for the physical resources necessary to measure time sufficiently precisely, the total resources for this algorithm scale as O(N^{1/2}), no better than Farhi and Gutmann's linear quantum algorithm. But jointly optimizing these resource requirements results in an overall scaling of N^{1/4}. This is a significant, but not unreasonable, improvement over the N^{1/2} scaling of the linear algorithm. Since the Gross-Pitaevskii equation approximates the multi-particle (linear) Schrödinger equation, for which Grover's algorithm is optimal, this gives a quantum information-theoretic lower bound on the number of particles needed for the approximation
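
    The linear (Farhi and Gutmann) part of this record can be reproduced with a few lines of linear algebra. The sketch below assumes the Hamiltonian H = -|s⟩⟨s| - |w⟩⟨w| in units where ħ = 1 and the well depth is 1, with |s⟩ the equal-amplitude state and |w⟩ the marked site; under that assumption the success probability reaches 1 at time t = π√N/2. The nonlinear Gross-Pitaevskii variant is not modelled here, and the function name and parameter values are illustrative.

```python
import numpy as np

def analog_search_probability(n_sites, marked, time):
    """Probability of finding the particle at `marked` after evolving for `time`."""
    s = np.full(n_sites, 1.0 / np.sqrt(n_sites))   # equal amplitude at each site
    w = np.zeros(n_sites)
    w[marked] = 1.0                                 # marked site (potential well)

    # Assumed Hamiltonian: hopping term -|s><s| plus a unit-depth well at |w>.
    hamiltonian = -np.outer(s, s) - np.outer(w, w)

    # Evolve exactly via the eigendecomposition: |psi(t)> = V exp(-iEt) V^T |s>.
    energies, vectors = np.linalg.eigh(hamiltonian)
    psi_t = vectors @ (np.exp(-1j * energies * time) * (vectors.T @ s))
    return float(abs(psi_t[marked]) ** 2)

n_sites = 64
t_opt = np.pi * np.sqrt(n_sites) / 2    # O(N^{1/2}) evolution time
p_start = analog_search_probability(n_sites, marked=5, time=0.0)
p_final = analog_search_probability(n_sites, marked=5, time=t_opt)
print(p_start, p_final)
```

    At time 0 the probability at the marked site is 1/N, and it rises to 1 at t_opt, matching the O(N^{1/2}) running time quoted for the linear algorithm.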

  3. Database Search Engines: Paradigms, Challenges and Solutions.

    PubMed

    Verheggen, Kenneth; Martens, Lennart; Berven, Frode S; Barsnes, Harald; Vaudel, Marc

    2016-01-01

    The first step in identifying proteins from mass spectrometry based shotgun proteomics data is to infer peptides from tandem mass spectra, a task generally achieved using database search engines. In this chapter, the basic principles of database search engines are introduced with a focus on open source software, and the use of database search engines is demonstrated using the freely available SearchGUI interface. This chapter also discusses how to tackle general issues related to sequence database searching and shows how to minimize their impact.

  4. Library Instruction and Online Database Searching.

    ERIC Educational Resources Information Center

    Mercado, Heidi

    1999-01-01

    Reviews changes in online database searching in academic libraries. Topics include librarians conducting all searches; the advent of end-user searching and the need for user instruction; compact disk technology; online public catalogs; the Internet; full text databases; electronic information literacy; user education and the remote library user;…

  6. Online Database Searching in Smaller Public Libraries.

    ERIC Educational Resources Information Center

    Roose, Tina

    1983-01-01

    Online database searching experiences of nine Illinois public libraries--Arlington Heights, Deerfield, Elk Grove Village, Evanston, Glenview, Northbrook, Schaumburg Township, Waukegan, Wilmette--are discussed, noting search costs, user charges, popular databases, library acquisition, interaction with users, and staff training. Three sources are…

  8. Hybrid quantum computing: semicloning for general database retrieval

    NASA Astrophysics Data System (ADS)

    Lanzagorta, Marco; Uhlmann, Jeffrey K.

    2005-05-01

    Quantum computing (QC) has become an important area of research in computer science because of its potential to provide more efficient algorithmic solutions to certain problems than are possible with classical computing (CC). In particular, QC is able to exploit the special properties of quantum superposition to achieve computational parallelism beyond what can be achieved with parallel CC computers. However, these special properties are not applicable for general computation. Therefore, we propose the use of "hybrid quantum computers" (HQCs) that combine both classical and quantum computing architectures in order to leverage the benefits of both. We demonstrate how an HQC can exploit quantum search to support general database operations more efficiently than is possible with CC. Our solution is based on new quantum results that are of independent significance to the field of quantum computing. More specifically, we demonstrate that the most restrictive implications of the quantum No-Cloning Theorem can be avoided through the use of semiclones.

  9. Using volume holograms to search digital databases

    NASA Astrophysics Data System (ADS)

    Burr, Geoffrey W.; Maltezos, George; Grawert, Felix; Kobras, Sebastian; Hanssen, Holger; Coufal, Hans J.

    2002-01-01

    Holographic data storage offers the potential for simultaneous search of an entire database by performing multiple optical correlations between stored data pages and a search argument. This content-addressable retrieval produces one analog correlation score for each stored volume hologram. We have previously developed fuzzy encoding techniques for this fast parallel search, and holographically searched a small database with high fidelity. We recently showed that such systems can be configured to produce true inner-products, and proposed an architecture in which massively-parallel searches could be implemented. However, the speed advantage over conventional electronic search provided by parallelism brings with it the possibility of erroneous search results, since these analog correlation scores are subject to various noise sources. We show that the fidelity of such an optical search depends not only on the usual holographic storage signal-to-noise factors (such as readout power, diffraction efficiency, and readout speed), but also on the particular database query being made. In effect, the presence of non-matching database records with nearly the same correlation score as the targeted matching records reduces the speed advantage of the parallel search. Thus for any given fidelity target, the performance improvement offered by a content-addressable holographic storage can vary from query to query even within the same database.

  10. Parametric Quantum Search Algorithm as Quantum Walk: A Quantum Simulation

    NASA Astrophysics Data System (ADS)

    Ellinas, Demosthenes; Konstandakis, Christos

    2016-02-01

    Parametric quantum search algorithm (PQSA) is a form of quantum search that results from relaxing the unitarity of the original algorithm. PQSA can naturally be cast in the form of a quantum walk (QW), by means of the formalism of oracle algebra. This is due to the fact that the completely positive trace-preserving search map used by PQSA admits a unitarization (unitary dilation) a la quantum walk, at the expense of introducing an auxiliary quantum coin-qubit space. The ensuing QW describes a process of spiral motion, chosen to be driven by two unitary Kraus generators, generating planar rotations of the Bloch vector around an axis. The quadratic acceleration of quantum search translates into an equivalent quadratic saving of the number of coin qubits in the QW analogue. The Hamiltonian operator associated with the QW model is obtained and is shown to represent a multi-particle long-range interacting quantum system that simulates parametric search. Finally, the relation of the PQSA-QW simulator to the QW search algorithm is elucidated.

  11. Interactive searching of facial image databases

    NASA Astrophysics Data System (ADS)

    Nicholls, Robert A.; Shepherd, John W.; Shepherd, Jean

    1995-09-01

    A set of psychological facial descriptors has been devised to enable computerized searching of criminal photograph albums. The descriptors have been used to encode image databases of up to twelve thousand images. Using a system called FACES, the databases are searched by translating a witness' verbal description into corresponding facial descriptors. Trials of FACES have shown that this coding scheme is more productive and efficient than searching traditional photograph albums. An alternative method of searching the encoded database using a genetic algorithm is currently being tested. The genetic search method does not require the witness to verbalize a description of the target but merely to indicate a degree of similarity between the target and a limited selection of images from the database. The major drawback of FACES is that it requires a manual encoding of images. Research is being undertaken to automate the process; however, it will require an algorithm which can predict human descriptive values. Alternatives to human derived coding schemes exist using statistical classifications of images. Since databases encoded using statistical classifiers do not have an obvious direct mapping to human derived descriptors, a search method which does not require the entry of human descriptors is required. A genetic search algorithm is being tested for such a purpose.

  12. Stationary states in quantum walk search

    NASA Astrophysics Data System (ADS)

    Prūsis, Krišjānis; Vihrovs, Jevgěnijs; Wong, Thomas G.

    2016-09-01

    When classically searching a database, having additional correct answers makes the search easier. For a discrete-time quantum walk searching a graph for a marked vertex, however, additional marked vertices can make the search harder by causing the system to approximately begin in a stationary state, so the system fails to evolve. In this paper, we completely characterize the stationary states, or 1-eigenvectors, of the quantum walk search operator for general graphs and configurations of marked vertices by decomposing their amplitudes into uniform and flip states. This infinitely expands the number of known stationary states and gives an optimization procedure to find the stationary state closest to the initial uniform state of the walk. We further prove theorems on the existence of stationary states, with them conditionally existing if the marked vertices form a bipartite connected component and always existing if nonbipartite. These results utilize the standard oracle in Grover's algorithm, but we show that a different type of oracle prevents stationary states from interfering with the search algorithm.

  13. Fast Structural Search in Phylogenetic Databases

    PubMed Central

    Wang, Jason T. L.; Shan, Huiyuan; Shasha, Dennis; Piel, William H.

    2007-01-01

    As the size of phylogenetic databases grows, the need for efficiently searching these databases arises. Thanks to previous and ongoing research, searching by attribute value and by text has become commonplace in these databases. However, searching by topological or physical structure, especially for large databases and especially for approximate matches, is still an art. We propose structural search techniques that, given a query or pattern tree P and a database of phylogenies D, find trees in D that are sufficiently close to P. The “closeness” is a measure of the topological relationships in P that are found to be the same or similar in a tree D in D. We develop a filtering technique that accelerates searches and present algorithms for rooted and unrooted trees where the trees can be weighted or unweighted. Experimental results on comparing the similarity measure with existing tree metrics and on evaluating the efficiency of the search techniques demonstrate that the proposed approach is promising. PMID:19325851

  14. Searching gene and protein sequence databases.

    PubMed

    Barsalou, T; Brutlag, D L

    1991-01-01

    A large-scale effort to map and sequence the human genome is now under way. Crucial to the success of this research is a group of computer programs that analyze and compare data on molecular sequences. This article describes the classic algorithms for similarity searching and sequence alignment. Because good performance of these algorithms is critical to searching very large and growing databases, we analyze the running times of the algorithms and discuss recent improvements in this area.
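
    As an illustration of the kind of similarity search the abstract refers to, the sketch below scores local alignments in the style of the Smith-Waterman dynamic program and ranks a small database by score. The abstract does not say which algorithms the article covers, so the choice of Smith-Waterman, the scoring values (match, mismatch, gap) and the function names are assumptions made for this example. The nested loops make the cost proportional to the product of the two sequence lengths, which is why running time matters for large databases.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local-alignment score of a and b with a linear gap penalty."""
    prev = [0] * (len(b) + 1)      # previous row of the score matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def rank_database(query: str, database: dict) -> list:
    """Return (score, name) pairs for every entry, highest score first."""
    scored = [(smith_waterman_score(query, seq), name)
              for name, seq in database.items()]
    return sorted(scored, reverse=True)


# Hypothetical toy database, for illustration only.
ranking = rank_database("ACGT", {"x": "TTACGTTT", "y": "GGGG"})
print(ranking)
```

    Only two rows of the score matrix are kept, so memory grows with the length of one sequence while time still grows with the product of both lengths.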

  15. Efficient search and retrieval in biometric databases

    NASA Astrophysics Data System (ADS)

    Mhatre, Amit J.; Palla, Srinivas; Chikkerur, Sharat; Govindaraju, Venu

    2005-03-01

    Biometric identification has emerged as a reliable means of controlling access to both physical and virtual spaces. Fingerprints, face and voice biometrics are being increasingly used as alternatives to passwords, PINs and visual verification. In spite of the rapid proliferation of large-scale databases, the research has thus far been focused only on accuracy within small databases. In larger applications, response time and retrieval efficiency also become important in addition to accuracy. Unlike structured information such as text or numeric data that can be sorted, biometric data does not have any natural sorting order. Therefore, indexing and binning of biometric databases represent a challenging problem. We present results using a parallel combination of multiple biometrics to bin the database. Using hand geometry and signature features, we show that the search space can be reduced to just 5% of the entire database.

  16. Multi-Database Searching in Forensic Psychology.

    ERIC Educational Resources Information Center

    Piotrowski, Chris; Perdue, Robert W.

    Traditional library skills have been augmented since the introduction of online computerized database services. Because of the complexity of the field, forensic psychology can benefit enormously from the application of comprehensive bibliographic search strategies. The study reported here demonstrated the bibliographic results obtained when a…

  17. Searching Online Database Services over the Internet.

    ERIC Educational Resources Information Center

    Keays, Thomas

    1993-01-01

    Describes how to use the Internet to access commercial online database services, such as DIALOG, and discusses the advantages in terms of costs, reference services, and accessibility. Outlines in detail how to save a search session or link another terminal to a Telnet session, and provides information and Internet addresses for eight vendor…

  19. Searching Across the International Space Station Databases

    NASA Technical Reports Server (NTRS)

    Maluf, David A.; McDermott, William J.; Smith, Ernest E.; Bell, David G.; Gurram, Mohana

    2007-01-01

    Data access in the enterprise generally requires us to combine data from different sources and different formats. It is thus advantageous to focus on the intersection of the knowledge across sources and domains; keeping irrelevant knowledge around only serves to make the integration more unwieldy and more complicated than necessary. A context search over multiple domains is proposed in this paper to use context-sensitive queries to support disciplined manipulation of domain knowledge resources. The objective of a context search is to provide the capability for interrogating many domain knowledge resources, which are largely semantically disjoint. The search formally supports the tasks of selecting, combining, extending, specializing, and modifying components from a diverse set of domains. This paper demonstrates a new paradigm in composition of information for enterprise applications. In particular, it discusses an approach to achieving data integration across multiple sources in a manner that does not require heavy investment in database and middleware maintenance. This lean approach to integration leads to cost-effectiveness and scalability of data integration with an underlying schemaless object-relational database management system. The resulting highly scalable, information-on-demand system framework, called NX-Search, is an implementation of an information system built on NETMARK. NETMARK is a flexible, high-throughput open database integration framework for managing, storing, and searching unstructured or semi-structured arbitrary XML and HTML, used widely at the National Aeronautics and Space Administration (NASA) and in industry.

  20. Searching the NCBI databases using Entrez.

    PubMed

    Baxevanis, Andreas D

    2006-11-01

    One of the most widely-used interfaces for the retrieval of information from biological databases is the NCBI Entrez system. Entrez capitalizes on the fact that there are pre-existing, logical relationships between the individual entries found in numerous public databases. The existence of such natural connections, mostly biological in nature, argued for the development of a method through which all the information about a particular biological entity could be found without having to sequentially visit and query disparate databases. Two Basic Protocols describe simple, text-based searches, illustrating the types of information that can be retrieved through the Entrez system. An Alternate Protocol builds upon the first Basic Protocol, using additional, built-in features of the Entrez system, and providing alternative ways to issue the initial query. The Support Protocol reviews how to save frequently-issued queries. Finally, Cn3D, a structure visualization tool, is also discussed.

  1. Searching the NCBI databases using Entrez.

    PubMed

    Baxevanis, Andreas D

    2006-03-01

    One of the most widely-used interfaces for the retrieval of information from biological databases is the NCBI Entrez system. Entrez capitalizes on the fact that there are pre-existing, logical relationships between the individual entries found in numerous public databases. The existence of such natural connections, mostly biological in nature, argued for the development of a method through which all the information about a particular biological entity could be found without having to sequentially visit and query disparate databases. Two Basic Protocols describe simple, text-based searches, illustrating the types of information that can be retrieved through the Entrez system. An Alternate Protocol builds upon the first Basic Protocol, using additional, built-in features of the Entrez system, and providing alternative ways to issue the initial query. The Support Protocol reviews how to save frequently-issued queries. Finally, Cn3D, a structure visualization tool, is also discussed.

  2. Audio stream classification for multimedia database search

    NASA Astrophysics Data System (ADS)

    Artese, M.; Bianco, S.; Gagliardi, I.; Gasparini, F.

    2013-03-01

    Search and retrieval of huge archives of multimedia data is a challenging task. A classification step is often used to reduce the number of entries on which to perform the subsequent search. In particular, when new entries are continuously added to the database, a fast classification based on simple threshold evaluation is desirable. In this work we present a CART-based (Classification And Regression Tree [1]) classification framework for audio streams belonging to multimedia databases. The database considered is the Archive of Ethnography and Social History (AESS) [2], which is mainly composed of popular songs and other audio records describing popular traditions handed down from generation to generation, such as traditional fairs and customs. The peculiarities of this database are that it is continuously updated, that the audio recordings are acquired in unconstrained environments, and that it is difficult for a non-expert human user to create the ground truth labels. In our experiments, half of all the available audio files were randomly extracted and used as the training set. The remaining ones were used as the test set. The classifier was trained to distinguish among three different classes: speech, music, and song. All the audio files in the dataset had previously been manually labeled by domain experts into the three classes defined above.

  3. Multiparty Quantum Key Agreement Based on Quantum Search Algorithm

    NASA Astrophysics Data System (ADS)

    Cao, Hao; Ma, Wenping

    2017-03-01

    Quantum key agreement is an important topic: the shared key must be negotiated equally by all participants, and no nontrivial subset of participants can fully determine the shared key. To date, the modes of embedding subkeys in all previously proposed quantum key agreement protocols have been based on either BB84 or entangled states. Quantum key agreement protocols based on quantum search algorithms have not yet been explored. In this paper, after investigating the properties of quantum search algorithms, we propose the first quantum key agreement protocol whose subkey embedding mode is based on a quantum search algorithm, namely Grover's algorithm. An example of the protocol with five parties is presented. The efficiency analysis shows that our protocol is superior to existing MQKA protocols. Furthermore, it is secure against both external and internal attacks.

  4. Multiparty Quantum Key Agreement Based on Quantum Search Algorithm

    PubMed Central

    Cao, Hao; Ma, Wenping

    2017-01-01

    Quantum key agreement is an important topic: the shared key must be negotiated equally by all participants, and no nontrivial subset of participants can fully determine the shared key. To date, the modes of embedding subkeys in all previously proposed quantum key agreement protocols have been based on either BB84 or entangled states. Quantum key agreement protocols based on quantum search algorithms have not yet been explored. In this paper, after investigating the properties of quantum search algorithms, we propose the first quantum key agreement protocol whose subkey embedding mode is based on a quantum search algorithm, namely Grover's algorithm. An example of the protocol with five parties is presented. The efficiency analysis shows that our protocol is superior to existing MQKA protocols. Furthermore, it is secure against both external and internal attacks. PMID:28332610

  5. Multiparty Quantum Key Agreement Based on Quantum Search Algorithm.

    PubMed

    Cao, Hao; Ma, Wenping

    2017-03-23

    Quantum key agreement is an important topic: the shared key must be negotiated equally by all participants, and no nontrivial subset of participants can fully determine the shared key. To date, the modes of embedding subkeys in all previously proposed quantum key agreement protocols have been based on either BB84 or entangled states. Quantum key agreement protocols based on quantum search algorithms have not yet been explored. In this paper, after investigating the properties of quantum search algorithms, we propose the first quantum key agreement protocol whose subkey embedding mode is based on a quantum search algorithm, namely Grover's algorithm. An example of the protocol with five parties is presented. The efficiency analysis shows that our protocol is superior to existing MQKA protocols. Furthermore, it is secure against both external and internal attacks.

  6. Quantum Private Comparison Based on Quantum Search Algorithm

    NASA Astrophysics Data System (ADS)

    Zhang, Wei-Wei; Li, Dan; Song, Ting-Ting; Li, Yan-Bing

    2013-05-01

    We propose two quantum private comparison protocols based on the quantum search algorithm with the help of a semi-honest third party. Our protocols utilize the properties of the quantum search algorithm, unitary operations, and single-particle measurements. The security of our protocols is discussed with respect to both outsider attacks and participant attacks. No information about the private inputs or the comparison result is leaked; even the third party cannot learn this information.

  7. Novel strategy for database searching in spin Liouville space by NMR ensemble computing

    PubMed

    Bruschweiler

    2000-11-27

    Quantum computing by nuclear magnetic resonance using pseudopure spin states is bound by the maximal speed of quantum computing algorithms operating on pure states. In contrast to these quantum computing algorithms, a novel algorithm for searching an unsorted database is presented here that operates on truly mixed states in spin Liouville space. It provides an exponential speedup over Grover's quantum search algorithm with the sensitivity scaling exponentially with the number of spins, as for pseudopure state implementations. The minimal decoherence time required is exponentially shorter than that for Grover's algorithm.

  8. Two Quantum Direct Communication Protocols Based on Quantum Search Algorithm

    NASA Astrophysics Data System (ADS)

    Xu, Shu-Jiang; Chen, Xiu-Bo; Wang, Lian-Hai; Niu, Xin-Xin; Yang, Yi-Xian

    2015-07-01

    Based on the properties of two-qubit Grover's quantum search algorithm, we propose two quantum direct communication protocols, including a deterministic secure quantum communication and a quantum secure direct communication protocol. Secret messages can be directly sent from the sender to the receiver by using two-qubit unitary operations and the single photon measurement with one of the proposed protocols. Theoretical analysis shows that the security of the proposed protocols can be highly ensured.

  9. Subject Index to Databases Available from Computer Search Service.

    ERIC Educational Resources Information Center

    Atkinson, Steven D., Comp.; Knee, Michael, Comp.

    The University Libraries Computer Search Service (State University of New York at Albany, SUNYA) provides access to databases from many vendors including BRS, Dialog, Wilsonline, CA Search, and Westlaw. Members of the Computer Search Service, Collection Development, and Reference Service staffs select vendor services and new databases for their…

  10. WAIS Searching of the Current Contents Database

    NASA Astrophysics Data System (ADS)

    Banholzer, P.; Grabenstein, M. E.

    The Homer E. Newell Memorial Library of NASA's Goddard Space Flight Center is developing capabilities to permit Goddard personnel to access electronic resources of the Library via the Internet. The Library's support services contractor, Maxima Corporation, and their subcontractor, SANAD Support Technologies, have recently developed a World Wide Web Home Page (http://www-library.gsfc.nasa.gov) to provide the primary means of access. The first searchable database to be made available through the Home Page to Goddard employees is Current Contents, from the Institute for Scientific Information (ISI). The initial implementation includes coverage of articles from the last few months of 1992 to the present. These records are augmented with abstracts and references, and often are more robust than equivalent records in bibliographic databases that currently serve the astronomical community. Maxima/SANAD selected WAIS Incorporated's WAIS product with which to build the interface to Current Contents. This system allows access from Macintosh, IBM PC, and Unix hosts, which is an important feature for Goddard's multiplatform environment. The forms interface is structured to allow both fielded (author, article title, journal name, id number, keyword, subject term, and citation) and unfielded WAIS searches. The system allows a user to retrieve individual journal article records; retrieve the table of contents of specific issues of journals; connect to articles with similar subject terms or keywords; connect to other issues of the same journal in the same year; and browse journal issues from an alphabetical list of indexed journal names.

  11. Search for New Quantum Algorithms

    DTIC Science & Technology

    2006-05-01

    [37] Bernstein, Ethan, and Umesh Vazirani, Quantum Complexity Theory, SIAM Journal on Computing, Vol. 26, No. 5, (1997), 1411-1473. [38] Biham...polynomial, Commun. Math. Phys., 121, (1989), 351-399. [74] Vazirani, Umesh, On the power of quantum computation, Philosophical Transactions of the...Royal Society of London, Series A, 354:1759-1768, August 1998. [75] Vazirani, Umesh, A survey of quantum complexity theory, AMS PSAPM/58, (2002

  12. Modernizing quantum annealing using local searches

    NASA Astrophysics Data System (ADS)

    Chancellor, Nicholas

    2017-02-01

    I describe how real quantum annealers may be used to perform local (in state space) searches around specified states, rather than the global searches traditionally implemented in the quantum annealing algorithm (QAA). Such protocols will have numerous advantages over simple quantum annealing. By using such searches the effect of problem mis-specification can be reduced, as only energy differences between the searched states will be relevant. The QAA is an analogue of simulated annealing, a classical numerical technique which has now been superseded. Hence, I explore two strategies to use an annealer in a way which takes advantage of modern classical optimization algorithms. Specifically, I show how sequential calls to quantum annealers can be used to construct analogues of population annealing and parallel tempering which use quantum searches as subroutines. The techniques given here can be applied not only to optimization, but also to sampling. I examine the feasibility of these protocols on real devices and note that implementing such protocols should require minimal if any change to the current design of the flux qubit-based annealers by D-Wave Systems Inc. I further provide proof-of-principle numerical experiments based on quantum Monte Carlo that demonstrate simple examples of the discussed techniques.

  13. Quantum walk search through potential barriers

    NASA Astrophysics Data System (ADS)

    Wong, Thomas G.

    2016-12-01

    An ideal quantum walk transitions from one vertex to another with perfect fidelity, but in physical systems, the particle may be hindered by potential energy barriers. Then the particle has some amplitude of tunneling through the barriers, and some amplitude of staying put. We investigate the algorithmic consequence of such barriers for the quantum walk formulation of Grover’s algorithm. We prove that the failure amplitude must scale as O(1/\sqrt{N}) for search to retain its quantum O(\sqrt{N}) runtime; otherwise, it searches in classical O(N) time. Thus searching larger ‘databases’ requires increasingly reliable hop operations or error correction. This condition holds for both discrete- and continuous-time quantum walks.

  14. Adiabatic Quantum Search in Open Systems

    NASA Astrophysics Data System (ADS)

    Wild, Dominik S.; Gopalakrishnan, Sarang; Knap, Michael; Yao, Norman Y.; Lukin, Mikhail D.

    2016-10-01

    Adiabatic quantum algorithms represent a promising approach to universal quantum computation. In isolated systems, a key limitation to such algorithms is the presence of avoided level crossings, where gaps become extremely small. In open quantum systems, the fundamental robustness of adiabatic algorithms remains unresolved. Here, we study the dynamics near an avoided level crossing associated with the adiabatic quantum search algorithm, when the system is coupled to a generic environment. At zero temperature, we find that the algorithm remains scalable provided the noise spectral density of the environment decays sufficiently fast at low frequencies. By contrast, higher order scattering processes render the algorithm inefficient at any finite temperature regardless of the spectral density, implying that no quantum speedup can be achieved. Extensions and implications for other adiabatic quantum algorithms will be discussed.

  15. Adiabatic Quantum Search in Open Systems.

    PubMed

    Wild, Dominik S; Gopalakrishnan, Sarang; Knap, Michael; Yao, Norman Y; Lukin, Mikhail D

    2016-10-07

    Adiabatic quantum algorithms represent a promising approach to universal quantum computation. In isolated systems, a key limitation to such algorithms is the presence of avoided level crossings, where gaps become extremely small. In open quantum systems, the fundamental robustness of adiabatic algorithms remains unresolved. Here, we study the dynamics near an avoided level crossing associated with the adiabatic quantum search algorithm, when the system is coupled to a generic environment. At zero temperature, we find that the algorithm remains scalable provided the noise spectral density of the environment decays sufficiently fast at low frequencies. By contrast, higher order scattering processes render the algorithm inefficient at any finite temperature regardless of the spectral density, implying that no quantum speedup can be achieved. Extensions and implications for other adiabatic quantum algorithms will be discussed.

  16. Experimental implementation of a quantum random-walk search algorithm using strongly dipolar coupled spins

    NASA Astrophysics Data System (ADS)

    Lu, Dawei; Zhu, Jing; Zou, Ping; Peng, Xinhua; Yu, Yihua; Zhang, Shanmin; Chen, Qun; Du, Jiangfeng

    2010-02-01

    An important quantum search algorithm based on the quantum random walk performs an oracle search on a database of N items with O(\sqrt{N}) calls, yielding a speedup similar to the Grover quantum search algorithm. The algorithm was implemented on a quantum information processor of three-qubit liquid-crystal nuclear magnetic resonance (NMR) in the case of finding 1 out of 4, and the diagonal elements’ tomography of all the final density matrices was completed with comprehensible one-dimensional NMR spectra. The experimental results agree well with the theoretical predictions.
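
    For comparison with the "1 out of 4" case mentioned above, here is a minimal classical simulation of standard Grover search on real amplitudes. It is not the quantum random-walk algorithm or the NMR implementation described in the abstract; it only illustrates the Grover baseline that the abstract compares against. The function names and the choice of marked index are assumptions made for this sketch.

```python
import math


def grover_iteration(amplitudes, marked):
    """One Grover step: oracle sign flip, then inversion about the mean."""
    flipped = [-a if i in marked else a for i, a in enumerate(amplitudes)]
    mean = sum(flipped) / len(flipped)
    return [2 * mean - a for a in flipped]


def grover_search(n_items, marked, iterations):
    """Return the measurement probabilities after the given number of steps."""
    amplitudes = [1 / math.sqrt(n_items)] * n_items  # uniform superposition
    for _ in range(iterations):
        amplitudes = grover_iteration(amplitudes, marked)
    return [a * a for a in amplitudes]


# Four items, one marked (index 2 is an arbitrary choice), one iteration.
probs = grover_search(4, {2}, 1)
print(probs)
```

    With four items and one marked item, a single iteration moves all of the probability onto the marked index, which is why this case is a convenient small test.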

  17. Experimental implementation of a quantum random-walk search algorithm using strongly dipolar coupled spins

    SciTech Connect

    Lu Dawei; Peng Xinhua; Du Jiangfeng; Zhu Jing; Zou Ping; Yu Yihua; Zhang Shanmin; Chen Qun

    2010-02-15

    An important quantum search algorithm based on the quantum random walk performs an oracle search on a database of N items with O(\sqrt{N}) calls, yielding a speedup similar to the Grover quantum search algorithm. The algorithm was implemented on a quantum information processor of three-qubit liquid-crystal nuclear magnetic resonance (NMR) in the case of finding 1 out of 4, and the diagonal elements' tomography of all the final density matrices was completed with comprehensible one-dimensional NMR spectra. The experimental results agree well with the theoretical predictions.

  18. Quantum search algorithms on a regular lattice

    SciTech Connect

    Hein, Birgit; Tanner, Gregor

    2010-07-15

    Quantum algorithms for searching for one or more marked items on a d-dimensional lattice provide an extension of Grover's search algorithm including a spatial component. We demonstrate that these lattice search algorithms can be viewed in terms of the level dynamics near an avoided crossing of a one-parameter family of quantum random walks. We give approximations for both the level splitting at the avoided crossing and the effectively two-dimensional subspace of the full Hilbert space spanning the level crossing. This makes it possible to give the leading order behavior for the search time and the localization probability in the limit of large lattice size including the leading order coefficients. For d=2 and d=3, these coefficients are calculated explicitly. Closed form expressions are given for higher dimensions.

  19. Multiple Database Searching: Techniques and Pitfalls

    ERIC Educational Resources Information Center

    Hawkins, Donald T.

    1978-01-01

    Problems involved in searching multiple data bases are discussed including indexing differences, overlap among data bases, variant spellings, and elimination of duplicate items from search output. Discussion focuses on CA Condensates, Inspec, and Metadex data bases. (J PF)

  20. The Database Dilemma: Online Search Strategies in Nursing.

    ERIC Educational Resources Information Center

    Fried, Ava K.; And Others

    1989-01-01

    Describes a study that compared the coverage of the nursing profession, subject heading specificity, and ease of retrieval of the MEDLINE and Nursing & Allied Health (CINAHL) online databases. The strengths and weaknesses of each database are discussed and hints for searching on both databases are provided. (four references) (CLB)

  2. Searching the ASRS Database Using QUORUM Keyword Search, Phrase Search, Phrase Generation, and Phrase Discovery

    NASA Technical Reports Server (NTRS)

    McGreevy, Michael W.; Connors, Mary M. (Technical Monitor)

    2001-01-01

    To support Search Requests and Quick Responses at the Aviation Safety Reporting System (ASRS), four new QUORUM methods have been developed: keyword search, phrase search, phrase generation, and phrase discovery. These methods build upon the core QUORUM methods of text analysis, modeling, and relevance-ranking. QUORUM keyword search retrieves ASRS incident narratives that contain one or more user-specified keywords in typical or selected contexts, and ranks the narratives on their relevance to the keywords in context. QUORUM phrase search retrieves narratives that contain one or more user-specified phrases, and ranks the narratives on their relevance to the phrases. QUORUM phrase generation produces a list of phrases from the ASRS database that contain a user-specified word or phrase. QUORUM phrase discovery finds phrases that are related to topics of interest. Phrase generation and phrase discovery are particularly useful for finding query phrases for input to QUORUM phrase search. The presentation of the new QUORUM methods includes: a brief review of the underlying core QUORUM methods; an overview of the new methods; numerous, concrete examples of ASRS database searches using the new methods; discussion of related methods; and, in the appendices, detailed descriptions of the new methods.

  3. Using SQL Databases for Sequence Similarity Searching and Analysis.

    PubMed

    Pearson, William R; Mackey, Aaron J

    2017-09-13

    Relational databases can integrate diverse types of information and manage large sets of similarity search results, greatly simplifying genome-scale analyses. By focusing on taxonomic subsets of sequences, relational databases can reduce the size and redundancy of sequence libraries and improve the statistical significance of homologs. In addition, by loading similarity search results into a relational database, it becomes possible to explore and summarize the relationships between all of the proteins in an organism and those in other biological kingdoms. This unit describes how to use relational databases to improve the efficiency of sequence similarity searching and demonstrates various large-scale genomic analyses of homology-related data. It also describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. The unit also introduces search_demo, a database that stores sequence similarity search results. The search_demo database is then used to explore the evolutionary relationships between E. coli proteins and proteins in other organisms in a large-scale comparative genomic analysis. © 2017 by John Wiley & Sons, Inc.
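
    The idea of loading similarity search results into a relational database and summarizing them by organism can be sketched with Python's built-in sqlite3 module. The two-table schema, the accession names, the E-value threshold and the function name below are all invented for illustration; they are not the actual seqdb_demo or search_demo schemas, which the abstract does not describe.

```python
import sqlite3


def homologs_per_organism(max_evalue: float = 1e-6) -> list:
    """Count distinct query proteins with a significant hit in each organism."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE protein (acc TEXT PRIMARY KEY, organism TEXT NOT NULL);
        CREATE TABLE hit (query_acc TEXT, subject_acc TEXT, evalue REAL);
    """)
    # Hypothetical subject proteins and search hits, for illustration only.
    conn.executemany("INSERT INTO protein VALUES (?, ?)", [
        ("human_X", "H. sapiens"),
        ("human_Y", "H. sapiens"),
        ("yeast_Z", "S. cerevisiae"),
    ])
    conn.executemany("INSERT INTO hit VALUES (?, ?, ?)", [
        ("ecoli_A", "human_X", 1e-30),
        ("ecoli_A", "human_Y", 1e-7),
        ("ecoli_A", "yeast_Z", 1e-12),
        ("ecoli_B", "human_Y", 1e-8),
        ("ecoli_B", "yeast_Z", 0.5),   # not significant, filtered out below
    ])
    rows = conn.execute("""
        SELECT p.organism, COUNT(DISTINCT h.query_acc) AS n_queries
        FROM hit AS h
        JOIN protein AS p ON p.acc = h.subject_acc
        WHERE h.evalue < ?
        GROUP BY p.organism
        ORDER BY n_queries DESC, p.organism
    """, (max_evalue,)).fetchall()
    conn.close()
    return rows


print(homologs_per_organism())
```

    Keeping hits in a table means the significance threshold and the grouping can be changed in the query without rerunning the similarity search itself.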

  4. Quantum search simulation with Wolfram Mathematica

    NASA Astrophysics Data System (ADS)

    Prokopenya, Alexander N.

    2016-09-01

    We consider Grover's algorithm of quantum search for one or several integers out of N = 2^n, where n is the number of quantum bits in the memory register. There is a black box, or subroutine, that contains information about the hidden integers and can easily recognize them, but we do not know which of the N integers they are. To find the hidden items with a classical computer, we can do no better than to apply the subroutine repeatedly to all possible integers until we hit on a special one, and in the worst case we have to repeat this procedure N times. We have analyzed the Grover algorithm carefully and shown that it makes it possible to speed up this search quadratically, although its realization requires knowing the number of hidden items. A lower bound for the probability of successfully solving the search problem has been obtained. The validity of the results was demonstrated by simulating the Grover search algorithm using the package QuantumCircuit written in the Wolfram Mathematica language.
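
    The dependence on the number of hidden items can be made concrete with the standard textbook relations for Grover search: with M hidden items among N, define the angle theta by sin(theta) = sqrt(M/N); after k iterations the success probability is sin^2((2k+1)*theta). The sketch below evaluates these formulas in Python. It is not the QuantumCircuit package or the paper's own bound, and the function names are assumptions made for this example.

```python
import math


def grover_angle(n_items: int, n_marked: int) -> float:
    """Angle theta with sin(theta) = sqrt(M / N)."""
    return math.asin(math.sqrt(n_marked / n_items))


def optimal_iterations(n_items: int, n_marked: int) -> int:
    """Iteration count k that brings (2k + 1) * theta closest to pi / 2."""
    theta = grover_angle(n_items, n_marked)
    return max(0, round(math.pi / (4 * theta) - 0.5))


def success_probability(n_items: int, n_marked: int, iterations: int) -> float:
    """Probability of measuring a hidden item after the given iterations."""
    theta = grover_angle(n_items, n_marked)
    return math.sin((2 * iterations + 1) * theta) ** 2


n = 10                      # qubits in the memory register
N = 2 ** n                  # 1024 candidate integers
k1 = optimal_iterations(N, 1)
print(k1, success_probability(N, 1, k1))
# Running the M = 1 schedule when there are actually four hidden items:
print(success_probability(N, 4, k1))
```

    The second print shows why the number of hidden items matters: an iteration count tuned for one hidden item overshoots badly when four are present, so the success probability drops to well under one percent.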

  5. Searching the PASCAL database - A user's perspective

    NASA Technical Reports Server (NTRS)

    Jack, Robert F.

    1989-01-01

    The operation of PASCAL, a bibliographic data base covering broad subject areas in science and technology, is discussed. The data base includes information from about 1973 to the present, including topics in engineering, chemistry, physics, earth science, environmental science, biology, psychology, and medicine. Data from 1986 to the present may be searched using DIALOG. The procedures and classification codes for searching PASCAL are presented. Examples of citations retrieved from the data base are given and suggestions are made concerning when to use PASCAL.

  7. Fixed-point adiabatic quantum search

    NASA Astrophysics Data System (ADS)

    Dalzell, Alexander M.; Yoder, Theodore J.; Chuang, Isaac L.

    2017-01-01

    Fixed-point quantum search algorithms succeed at finding one of M target items among N total items even when the run time of the algorithm is longer than necessary. While the famous Grover's algorithm can search quadratically faster than a classical computer, it lacks the fixed-point property—the fraction of target items must be known precisely to know when to terminate the algorithm. Recently, Yoder, Low, and Chuang [Phys. Rev. Lett. 113, 210501 (2014), 10.1103/PhysRevLett.113.210501] gave an optimal gate-model search algorithm with the fixed-point property. Previously, it had been discovered by Roland and Cerf [Phys. Rev. A 65, 042308 (2002), 10.1103/PhysRevA.65.042308] that an adiabatic quantum algorithm, operating by continuously varying a Hamiltonian, can reproduce the quadratic speedup of gate-model Grover search. We ask, can an adiabatic algorithm also reproduce the fixed-point property? We show that the answer depends on what interpolation schedule is used, so as in the gate model, there are both fixed-point and non-fixed-point versions of adiabatic search, only some of which attain the quadratic quantum speedup. Guided by geometric intuition on the Bloch sphere, we rigorously justify our claims with an explicit upper bound on the error in the adiabatic approximation. We also show that the fixed-point adiabatic search algorithm can be simulated in the gate model with neither loss of the quadratic Grover speedup nor of the fixed-point property. Finally, we discuss natural uses of fixed-point algorithms such as preparation of a relatively prime state and oblivious amplitude amplification.

  8. [Online tutorial for searching a dental database].

    PubMed

    Liem, S L

    2009-05-01

    With millions of resources available on the Internet, it is still difficult to search for appropriate and relevant information, even with the use of advanced search engines. With no systematic quality control of online resources, it is difficult to determine how reliable information is. The Intute consortium, which administers a databank of high-quality Internet resources intended to support scientific teaching and research, ensures that all information provided has been evaluated by its own team of specialists in various disciplines. One part of the Intute website that is accessible free of charge is the Virtual Training Suite, where one can improve one's Internet searching skills and find a number of reliable, high-quality sources for daily practice.

  9. Is Library Database Searching a Language Learning Activity?

    ERIC Educational Resources Information Center

    Bordonaro, Karen

    2010-01-01

    This study explores how non-native speakers of English think of words to enter into library databases when they begin the process of searching for information in English. At issue is whether or not language learning takes place when these students use library databases. Language learning in this study refers to the use of strategies employed by…

  10. Chemical Substructure Searching: Comparing Three Commercially Available Databases.

    ERIC Educational Resources Information Center

    Wagner, A. Ben

    1986-01-01

    Compares the differences in coverage and utility of three substructure databases--Chemical Abstracts, Index Chemicus, and Chemical Information System's Nomenclature Search System. The differences between Chemical Abstracts with two different vendors--STN International and Questel--are described and a summary guide for choosing between databases is…

  13. Exceptional quantum walk search on the cycle

    NASA Astrophysics Data System (ADS)

    Wong, Thomas G.; Santos, Raqueline A. M.

    2017-06-01

    Quantum walks are standard tools for searching graphs for marked vertices, and they often yield quadratic speedups over a classical random walk's hitting time. In some exceptional cases, however, the system only evolves by sign flips, staying in a uniform probability distribution for all time. We prove that the one-dimensional periodic lattice or cycle with any arrangement of marked vertices is such an exceptional configuration. Using this discovery, we construct a search problem where the quantum walk's random sampling yields an arbitrary speedup in query complexity over the classical random walk's hitting time. In this context, however, the mixing time to prepare the initial uniform state is a more suitable comparison than the hitting time, and then, the speedup is roughly quadratic.

  14. Coherence depletion in the Grover quantum search algorithm

    NASA Astrophysics Data System (ADS)

    Shi, Hai-Long; Liu, Si-Yuan; Wang, Xiao-Hui; Yang, Wen-Li; Yang, Zhan-Ying; Fan, Heng

    2017-03-01

    We investigate the role of quantum coherence depletion (QCD) in the Grover search algorithm (GA) by using several typical measures of quantum coherence and quantum correlations. By using the relative entropy of coherence measure (C_r), we show that the success probability depends on the QCD. The same phenomenon is also found by using the l_1 norm of coherence measure (C_l1). In the limit case, the cost performance is defined to characterize the behavior of QCD in enhancing the success probability of GA, which is only related to the number of searched items and the scale of the database, regardless of using C_r or C_l1. In the generalized Grover search algorithm (GGA), the QCD for a class of states increases with the required optimal measurement time. In comparison, the quantification of other quantum correlations in GA, such as pairwise entanglement, multipartite entanglement, pairwise discord, and genuine multipartite discord, cannot be directly related to the success probability or the optimal measurement time. Additionally, we do not detect pairwise nonlocality or genuine tripartite nonlocality in GA since Clauser-Horne-Shimony-Holt inequality and Svetlichny's inequality are not violated.

  15. Private database queries based on counterfactual quantum key distribution

    NASA Astrophysics Data System (ADS)

    Zhang, Jia-Li; Guo, Fen-Zhuo; Gao, Fei; Liu, Bin; Wen, Qiao-Yan

    2013-08-01

    Based on the fundamental concept of quantum counterfactuality, we propose a protocol to achieve quantum private database queries, which is a theoretical study of how counterfactuality can be employed beyond counterfactual quantum key distribution (QKD). By adding crucial detecting apparatus to the device of QKD, the privacy of both the distrustful user and the database owner can be guaranteed. Furthermore, the proposed private-database-query protocol makes full use of the low efficiency in the counterfactual QKD, and by adjusting the relevant parameters, the protocol obtains excellent flexibility and extensibility.

  16. Grover search with lackadaisical quantum walks

    NASA Astrophysics Data System (ADS)

    Wong, Thomas G.

    2015-10-01

    The lazy random walk, where the walker has some probability of staying put, is a useful tool in classical algorithms. We propose a quantum analogue, the lackadaisical quantum walk, where each vertex is given l self-loops, and we investigate its effects on Grover's algorithm when formulated as search for a marked vertex on the complete graph of N vertices. For the discrete-time quantum walk using the phase flip coin, adding a self-loop to each vertex boosts the success probability from 1/2 to 1. Additional self-loops, however, decrease the success probability. Using instead the Shenvi, Kempe, and Whaley (2003) coin, adding self-loops simply slows down the search. These coins also differ in that the first is faster than classical when l scales less than N, while the second requires that l scale less than N^2. Finally, continuous-time quantum walks differ from both of these discrete-time examples: the self-loops make no difference at all. These behaviors generalize to multiple marked vertices.

  17. A comprehensive and scalable database search system for metaproteomics.

    PubMed

    Chatterjee, Sandip; Stupp, Gregory S; Park, Sung Kyu Robin; Ducom, Jean-Christophe; Yates, John R; Su, Andrew I; Wolan, Dennis W

    2016-08-16

    Mass spectrometry-based shotgun proteomics experiments rely on accurate matching of experimental spectra against a database of protein sequences. Existing computational analysis methods are limited in the size of their sequence databases, which severely restricts the proteomic sequencing depth and functional analysis of highly complex samples. The growing amount of public high-throughput sequencing data will only exacerbate this problem. We designed a broadly applicable metaproteomic analysis method (ComPIL) that addresses protein database size limitations. Our approach to overcome this significant limitation in metaproteomics was to design a scalable set of sequence databases assembled for optimal library querying speeds. ComPIL was integrated with a modified version of the search engine ProLuCID (termed "Blazmass") to permit rapid matching of experimental spectra. Proof-of-principle analysis of human HEK293 lysate with a ComPIL database derived from high-quality genomic libraries was able to detect nearly all of the same peptides as a search with a human database (~500x fewer peptides in the database), with a small reduction in sensitivity. We were also able to detect proteins from the adenovirus used to immortalize these cells. We applied our method to a set of healthy human gut microbiome proteomic samples and showed a substantial increase in the number of identified peptides and proteins compared to previous metaproteomic analyses, while retaining a high degree of protein identification accuracy and allowing for a more in-depth characterization of the functional landscape of the samples. The combination of ComPIL with Blazmass allows proteomic searches to be performed with database sizes much larger than previously possible. These large database searches can be applied to complex meta-samples with unknown composition or proteomic samples where unexpected proteins may be identified. 
The protein database, proteomic search engine, and the proteomic data files for

  18. Exhaustive Database Searching for Amino Acid Mutations in Proteomes

    SciTech Connect

    Hyatt, Philip Douglas; Pan, Chongle

    2012-01-01

    Amino acid mutations in proteins can be found by searching tandem mass spectra acquired in shotgun proteomics experiments against protein sequences predicted from genomes. Traditionally, unconstrained searches for amino acid mutations have been accomplished by using a sequence tagging approach that combines de novo sequencing with database searching. However, this approach is limited by the performance of de novo sequencing. The Sipros algorithm v2.0 was developed to perform unconstrained database searching using high-resolution tandem mass spectra by exhaustively enumerating all single non-isobaric mutations for every residue in a protein database. The performance of Sipros for amino acid mutation identification exceeded that of an established sequence tagging algorithm, Inspect, based on benchmarking results from a Rhodopseudomonas palustris proteomics dataset. To demonstrate the viability of the algorithm for meta-proteomics, Sipros was used to identify amino acid mutations in a natural microbial community in acid mine drainage.

  19. BioCarian: search engine for exploratory searches in heterogeneous biological databases.

    PubMed

    Zaki, Nazar; Tennakoon, Chandana

    2017-10-02

    There are a large number of biological databases publicly available for scientists on the web. Also, there are many private databases generated in the course of research projects. These databases are in a wide variety of formats. Web standards have evolved in recent times and semantic web technologies are now available to interconnect diverse and heterogeneous sources of data. Therefore, integration and querying of biological databases can be facilitated by techniques used in the semantic web. Heterogeneous databases can be converted into Resource Description Format (RDF) and queried using the SPARQL language. Searching for exact queries in these databases is trivial. However, exploratory searches need customized solutions, especially when multiple databases are involved. This process is cumbersome and time consuming for those without a sufficient background in computer science. In this context, a search engine facilitating exploratory searches of databases would be of great help to the scientific community. We present BioCarian, an efficient and user-friendly search engine for performing exploratory searches on biological databases. The search engine is an interface for SPARQL queries over RDF databases. We note that many of the databases can be converted to tabular form. We first convert the tabular databases to RDF. The search engine provides a graphical interface based on facets to explore the converted databases. The facet interface is more advanced than conventional facets. It allows complex queries to be constructed, and has additional features like ranking of facet values based on several criteria, visually indicating the relevance of a facet value and presenting the most important facet values when a large number of choices are available. For advanced users, SPARQL queries can be run directly on the databases. Using this feature, users will be able to incorporate federated searches of SPARQL endpoints.
We used the search engine to do an exploratory search

  20. Tiered Human Integrated Sequence Search Databases for Shotgun Proteomics

    PubMed Central

    Deutsch, Eric W.; Sun, Zhi; Campbell, David S.; Binz, Pierre-Alain; Farrah, Terry; Shteynberg, David; Mendoza, Luis; Omenn, Gilbert S.; Moritz, Robert L.

    2016-01-01

    The results of analysis of shotgun proteomics mass spectrometry data can be greatly affected by the selection of the reference protein sequence database against which the spectra are matched. For many species there are multiple sources from which somewhat different sequence sets can be obtained. This can lead to confusion about which database is best in which circumstances – a problem especially acute in human sample analysis. All sequence databases are genome-based, with sequences for the predicted gene and their protein translation products compiled. Our goal is to create a set of primary sequence databases that comprise the union of sequences from many of the different available sources and make the result easily available to the community. We have compiled a set of four sequence databases of varying sizes, from a small database consisting of only the ~20,000 primary isoforms plus contaminants to a very large database that includes almost all non-redundant protein sequences from several sources. This set of tiered, increasingly complete human protein sequence databases suitable for mass spectrometry proteomics sequence database searching is called the Tiered Human Integrated Search Proteome set. In order to evaluate the utility of these databases, we have analyzed two different data sets, one from the HeLa cell line and the other from normal human liver tissue, with each of the four tiers of database complexity. The result is that approximately 0.8%, 1.1%, and 1.5% additional peptides can be identified for Tiers 2, 3, and 4, respectively, as compared with the Tier 1 database, at substantially increasing computational cost. This increase in computational cost may be worth bearing if the identification of sequence variants or the discovery of sequences that are not present in the reviewed knowledge base entries is an important goal of the study. We find that it is useful to search a data set against a simpler database, and then check the uniqueness of the

  1. Tiered Human Integrated Sequence Search Databases for Shotgun Proteomics.

    PubMed

    Deutsch, Eric W; Sun, Zhi; Campbell, David S; Binz, Pierre-Alain; Farrah, Terry; Shteynberg, David; Mendoza, Luis; Omenn, Gilbert S; Moritz, Robert L

    2016-11-04

    The results of analysis of shotgun proteomics mass spectrometry data can be greatly affected by the selection of the reference protein sequence database against which the spectra are matched. For many species there are multiple sources from which somewhat different sequence sets can be obtained. This can lead to confusion about which database is best in which circumstances-a problem especially acute in human sample analysis. All sequence databases are genome-based, with sequences for the predicted gene and their protein translation products compiled. Our goal is to create a set of primary sequence databases that comprise the union of sequences from many of the different available sources and make the result easily available to the community. We have compiled a set of four sequence databases of varying sizes, from a small database consisting of only the ∼20,000 primary isoforms plus contaminants to a very large database that includes almost all nonredundant protein sequences from several sources. This set of tiered, increasingly complete human protein sequence databases suitable for mass spectrometry proteomics sequence database searching is called the Tiered Human Integrated Search Proteome set. In order to evaluate the utility of these databases, we have analyzed two different data sets, one from the HeLa cell line and the other from normal human liver tissue, with each of the four tiers of database complexity. The result is that approximately 0.8%, 1.1%, and 1.5% additional peptides can be identified for Tiers 2, 3, and 4, respectively, as compared with the Tier 1 database, at substantially increasing computational cost. This increase in computational cost may be worth bearing if the identification of sequence variants or the discovery of sequences that are not present in the reviewed knowledge base entries is an important goal of the study. We find that it is useful to search a data set against a simpler database, and then check the uniqueness of the

  2. Searching Harvard Business Review Online. . . Lessons in Searching a Full Text Database.

    ERIC Educational Resources Information Center

    Tenopir, Carol

    1985-01-01

    This article examines the Harvard Business Review Online (HBRO) database (bibliographic description fields, abstracts, extracted information, full text, subject descriptors) and reports on 31 sample HBRO searches conducted in Bibliographic Retrieval Services to test differences between searching full text and searching bibliographic record. Sample…

  4. The Effects of Search Tool Type and Cognitive Style on Performance during Hypermedia Database Searches.

    ERIC Educational Resources Information Center

    Leader, Lars F.; Klein, James D.

    1996-01-01

    Describes a study that investigated the effects of search tools and learner cognitive styles on performance in searches for information within a hypermedia database. Students in a university English-as-a-Second-Language program were assigned to one of four treatment groups, and results show a significant interaction between search tool and…

  5. Should we search Chinese biomedical databases when performing systematic reviews?

    PubMed

    Cohen, Jérémie F; Korevaar, Daniël A; Wang, Junfeng; Spijker, René; Bossuyt, Patrick M

    2015-03-06

    Chinese biomedical databases contain a large number of publications available to systematic reviewers, but it is unclear whether they are used for synthesizing the available evidence. We report a case of two systematic reviews on the accuracy of anti-cyclic citrullinated peptide for diagnosing rheumatoid arthritis. In one of these, the authors did not search Chinese databases; in the other, they did. We additionally assessed the extent to which Cochrane reviewers have searched Chinese databases in a systematic overview of the Cochrane Library (inception to 2014). The two diagnostic reviews included a total of 269 unique studies, but only 4 studies were included in both reviews. The first review included five studies published in the Chinese language (out of 151) while the second included 114 (out of 118). The summary accuracy estimates from the two reviews were comparable. Only 243 of the published 8,680 Cochrane reviews (less than 3%) searched one or more of the five major Chinese databases. These Chinese databases index about 2,500 journals, of which less than 6% are also indexed in MEDLINE. All 243 Cochrane reviews evaluated an intervention, 179 (74%) had at least one author with a Chinese affiliation; 118 (49%) addressed a topic in complementary or alternative medicine. Although searching Chinese databases may lead to the identification of a large amount of additional clinical evidence, Cochrane reviewers have rarely included them in their search strategy. We encourage future initiatives to evaluate more systematically the relevance of searching Chinese databases, as well as collaborative efforts to allow better incorporation of Chinese resources in systematic reviews.

  6. Assigning statistical significance to proteotypic peptides via database searches.

    PubMed

    Alves, Gelio; Ogurtsov, Aleksey Y; Yu, Yi-Kuo

    2011-02-01

    Querying MS/MS spectra against a database containing only proteotypic peptides reduces data analysis time due to reduction of database size. Despite the speed advantage, this search strategy is challenged by issues of statistical significance and coverage. The former requires separating systematically significant identifications from less confident identifications, while the latter arises when the underlying peptide is not present, due to single amino acid polymorphisms (SAPs) or post-translational modifications (PTMs), in the proteotypic peptide libraries searched. To address both issues simultaneously, we have extended RAId's knowledge database to include proteotypic information, utilized RAId's statistical strategy to assign statistical significance to proteotypic peptides, and modified RAId's programs to allow for consideration of proteotypic information during database searches. The extended database alleviates the coverage problem since all annotated modifications, even those that occurred within proteotypic peptides, may be considered. Taking into account the likelihoods of observation, the statistical strategy of RAId provides accurate E-value assignments regardless whether a candidate peptide is proteotypic or not. The advantage of including proteotypic information is evidenced by its superior retrieval performance when compared to regular database searches. Published by Elsevier B.V.

  7. Privacy-preserving search for chemical compound databases

    PubMed Central

    2015-01-01

    Background: Searching for similar compounds in a database is the most important process for in-silico drug screening. Since a query compound is an important starting point for the new drug, a query holder, who is afraid of the query being monitored by the database server, usually downloads all the records in the database and uses them in a closed network. However, a serious dilemma arises when the database holder also wants to output no information except for the search results, and such a dilemma prevents the use of many important data resources. Results: In order to overcome this dilemma, we developed a novel cryptographic protocol that enables database searching while keeping both the query holder's privacy and database holder's privacy. Generally, the application of cryptographic techniques to practical problems is difficult because versatile techniques are computationally expensive while computationally inexpensive techniques can perform only trivial computation tasks. In this study, our protocol is successfully built only from an additive-homomorphic cryptosystem, which allows only addition performed on encrypted values but is computationally efficient compared with versatile techniques such as general purpose multi-party computation. In an experiment searching ChEMBL, which consists of more than 1,200,000 compounds, the proposed method was 36,900 times faster in CPU time and 12,000 times as efficient in communication size compared with general purpose multi-party computation. Conclusion: We proposed a novel privacy-preserving protocol for searching chemical compound databases. The proposed method, easily scaling for large-scale databases, may help to accelerate drug discovery research by making full use of unused but valuable data that includes sensitive information. PMID:26678650
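    To make the phrase "additive-homomorphic cryptosystem" concrete, here is a toy sketch of the Paillier scheme, one well-known cryptosystem with this property. The abstract does not say which scheme the authors used, so Paillier is an assumption, as are all function names and the deliberately tiny primes; nothing here is the authors' protocol and it is not secure as written. It shows only the core property: multiplying two ciphertexts yields a ciphertext of the sum of the plaintexts.

```python
import math
import secrets

def keygen(p, q):
    """Toy Paillier key generation from two primes (far too small for real use)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse; needs Python 3.8+
    return n, (lam, mu)   # public key n (with g = n + 1), private key (lam, mu)

def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) under public key n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random r in [1, n-1]
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    """Recover the plaintext from ciphertext c."""
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

def add_encrypted(n, c1, c2):
    """Homomorphic addition: the product of ciphertexts encrypts the sum."""
    return (c1 * c2) % (n * n)


if __name__ == "__main__":
    n, priv = keygen(1009, 1013)
    c = add_encrypted(n, encrypt(n, 12), encrypt(n, 30))
    print(decrypt(n, priv, c))  # the party holding priv learns only the sum
```

    In a search setting, a property like this lets one party combine encrypted values without seeing them, so that only the holder of the private key learns the aggregated result.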

  8. Scalable quantum search using trapped ions

    SciTech Connect

    Ivanov, S. S.; Ivanov, P. A.; Linington, I. E.; Vitanov, N. V.

    2010-04-15

    We propose a scalable implementation of Grover's quantum search algorithm in a trapped-ion quantum information processor. The system is initialized in an entangled Dicke state by using adiabatic techniques. The inversion-about-average and oracle operators take the form of single off-resonant laser pulses. This is made possible by utilizing the physical symmetries of the trapped-ion linear crystal. The physical realization of the algorithm represents a dramatic simplification: each logical iteration (oracle and inversion about average) requires only two physical interaction steps, in contrast to the large number of concatenated gates required by previous approaches. This not only facilitates the implementation but also increases the overall fidelity of the algorithm.

  9. Forensic utilization of familial searches in DNA databases.

    PubMed

    Gershaw, Cassandra J; Schweighardt, Andrew J; Rourke, Linda C; Wallace, Margaret M

    2011-01-01

    DNA evidence is widely recognized as an invaluable tool in the process of investigation and identification, as well as one of the most sought after types of evidence for presentation to a jury. In the United States, the development of state and federal DNA databases has greatly impacted the forensic community by creating an efficient, searchable system that can be used to eliminate or include suspects in an investigation based on matching DNA profiles - the profile already in the database to the profile of the unknown sample in evidence. Recent changes in legislation have begun to allow for the possibility to expand the parameters of DNA database searches, taking into account the possibility of familial searches. This article discusses prospective positive outcomes of utilizing familial DNA searches and acknowledges potential negative outcomes, thereby presenting both sides of this very complicated, rapidly evolving situation.

  10. Complementary use of the SciSearch database for improved biomedical information searching.

    PubMed Central

    Brown, C M

    1998-01-01

    The use of at least two complementary online biomedical databases is generally considered critical for biomedical scientists seeking to keep fully abreast of recent research developments as well as to retrieve the highest number of relevant citations possible. Although the National Library of Medicine's MEDLINE is usually the database of choice, this paper illustrates the benefits of using another database, the Institute for Scientific Information's SciSearch, when conducting a biomedical information search. When a simple query about red wine consumption and coronary artery disease was posed simultaneously in both MEDLINE and SciSearch, a greater number of relevant citations were retrieved through SciSearch. This paper also provides suggestions for carrying out a comprehensive biomedical literature search in a rapid and efficient manner by using SciSearch in conjunction with MEDLINE. PMID:9549014

  11. The LAILAPS search engine: relevance ranking in life science databases.

    PubMed

    Lange, Matthias; Spies, Karl; Bargsten, Joachim; Haberhauer, Gregor; Klapperstück, Matthias; Leps, Michael; Weinel, Christian; Wünschiers, Röbbe; Weissbach, Mandy; Stein, Jens; Scholz, Uwe

    2010-01-15

    Search engines and retrieval systems are popular tools at a life science desktop. The manual inspection of hundreds of database entries that reflect a life science concept or fact is time-intensive daily work. What matters here is not the number of query results but their relevance. In this paper, we present the LAILAPS search engine for life science databases. The concept is to combine a novel feature model for relevance ranking, a machine learning approach to model user relevance profiles, ranking improvement by user feedback tracking and an intuitive and slim web user interface that estimates relevance rank by tracking user interactions. Queries are formulated as simple keyword lists and will be expanded by synonyms. Supporting a flexible text index and a simple data import format, LAILAPS can easily be used both as a search engine for comprehensive integrated life science databases and for small in-house project databases. With a set of features, extracted from each database hit in combination with user relevance preferences, a neural network predicts user-specific relevance scores. Using expert knowledge as training data for a predefined neural network or using users' own relevance training sets, a reliable relevance ranking of database hits has been implemented. In this paper, we present the LAILAPS system, the concepts, benchmarks and use cases. LAILAPS is publicly available for SWISSPROT data at http://lailaps.ipk-gatersleben.de.

  12. Molecule database framework: a framework for creating database applications with chemical structure search capability.

    PubMed

    Kiener, Joos

    2013-12-11

    Research in organic chemistry generates samples of novel chemicals together with their properties and other related data. The involved scientists must be able to store this data and search it by chemical structure. There are commercial solutions for common needs like chemical registration systems or electronic lab notebooks. However for specific requirements of in-house databases and processes no such solutions exist. Another issue is that commercial solutions have the risk of vendor lock-in and may require an expensive license of a proprietary relational database management system. To speed up and simplify the development for applications that require chemical structure search capabilities, I have developed Molecule Database Framework. The framework abstracts the storing and searching of chemical structures into method calls. Therefore software developers do not require extensive knowledge about chemistry and the underlying database cartridge. This decreases application development time. Molecule Database Framework is written in Java and I created it by integrating existing free and open-source tools and frameworks. The core functionality includes: support for multi-component compounds (mixtures); import and export of SD-files; optional security (authorization). For chemical structure searching Molecule Database Framework leverages the capabilities of the Bingo Cartridge for PostgreSQL and provides type-safe searching, caching, transactions and optional method level security. Molecule Database Framework supports multi-component chemical compounds (mixtures). Furthermore the design of entity classes and the reasoning behind it are explained. By means of a simple web application I describe how the framework could be used. I then benchmarked this example application to create some basic performance expectations for chemical structure searches and import and export of SD-files. By using a simple web application it was shown that Molecule Database Framework

  13. Molecule database framework: a framework for creating database applications with chemical structure search capability

    PubMed Central

    2013-01-01

    Background: Research in organic chemistry generates samples of novel chemicals together with their properties and other related data. The involved scientists must be able to store this data and search it by chemical structure. There are commercial solutions for common needs like chemical registration systems or electronic lab notebooks. However for specific requirements of in-house databases and processes no such solutions exist. Another issue is that commercial solutions have the risk of vendor lock-in and may require an expensive license of a proprietary relational database management system. To speed up and simplify the development for applications that require chemical structure search capabilities, I have developed Molecule Database Framework. The framework abstracts the storing and searching of chemical structures into method calls. Therefore software developers do not require extensive knowledge about chemistry and the underlying database cartridge. This decreases application development time. Results: Molecule Database Framework is written in Java and I created it by integrating existing free and open-source tools and frameworks. The core functionality includes: • Support for multi-component compounds (mixtures) • Import and export of SD-files • Optional security (authorization). For chemical structure searching Molecule Database Framework leverages the capabilities of the Bingo Cartridge for PostgreSQL and provides type-safe searching, caching, transactions and optional method level security. Molecule Database Framework supports multi-component chemical compounds (mixtures). Furthermore the design of entity classes and the reasoning behind it are explained. By means of a simple web application I describe how the framework could be used. I then benchmarked this example application to create some basic performance expectations for chemical structure searches and import and export of SD-files. Conclusions: By using a simple web application it was

  14. Generalized Jaynes-Cummings model as a quantum search algorithm

    SciTech Connect

    Romanelli, A.

    2009-07-15

    We propose a continuous time quantum search algorithm using a generalization of the Jaynes-Cummings model. In this model the states of the atom are the elements among which the algorithm realizes the search, exciting resonances between the initial and the searched states. This algorithm behaves like Grover's algorithm; the optimal search time is proportional to the square root of the size of the search set and the probability to find the searched state oscillates periodically in time. In this frame, it is possible to reinterpret the usual Jaynes-Cummings model as a trivial case of the quantum search algorithm.

  15. A practical approach for inexpensive searches of radiology report databases.

    PubMed

    Desjardins, Benoit; Hamilton, R Curtis

    2007-06-01

    We present a method to perform full text searches of radiology reports for the large number of departments that do not have this ability as part of their radiology or hospital information system. A tool written in Microsoft Access (front-end) has been designed to search a server (back-end) containing the indexed backup weekly copy of the full relational database extracted from a radiology information system (RIS). This front-end/back-end approach has been implemented in a large academic radiology department, and is used for teaching, research and administrative purposes. The weekly second backup of the 80 GB, 4 million record RIS database takes 2 hours. Further indexing of the exported radiology reports takes 6 hours. Individual searches typically take less than 1 minute on the indexed database and 30-60 minutes on the nonindexed database. Guidelines to properly address privacy and institutional review board issues are closely followed by all users. This method has potential to improve teaching, research, and administrative programs within radiology departments that cannot afford more expensive technology.

  16. SSAHA: A Fast Search Method for Large DNA Databases

    PubMed Central

    Ning, Zemin; Cox, Anthony J.; Mullikin, James C.

    2001-01-01

    We describe an algorithm, SSAHA (Sequence Search and Alignment by Hashing Algorithm), for performing fast searches on databases containing multiple gigabases of DNA. Sequences in the database are preprocessed by breaking them into consecutive k-tuples of k contiguous bases and then using a hash table to store the position of each occurrence of each k-tuple. Searching for a query sequence in the database is done by obtaining from the hash table the “hits” for each k-tuple in the query sequence and then performing a sort on the results. We discuss the effect of the tuple length k on the search speed, memory usage, and sensitivity of the algorithm and present the results of computational experiments which show that SSAHA can be three to four orders of magnitude faster than BLAST or FASTA, while requiring less memory than suffix tree methods. The SSAHA algorithm is used for high-throughput single nucleotide polymorphism (SNP) detection and very large scale sequence assembly. Also, it provides Web-based sequence search facilities for Ensembl projects. PMID:11591649
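
    The indexing and lookup steps described in this abstract can be sketched as follows. This is a minimal illustration rather than the SSAHA implementation: the function names, the tuple layout of a hit, and the choice to sort hits by (sequence, shift, position) are assumptions made for the example.

```python
from collections import defaultdict

def build_index(sequences, k):
    """Hash each consecutive (non-overlapping) k-tuple of every database sequence."""
    index = defaultdict(list)
    for seq_id, seq in enumerate(sequences):
        for pos in range(0, len(seq) - k + 1, k):
            index[seq[pos:pos + k]].append((seq_id, pos))
    return index

def search(index, query, k):
    """Collect the hits for every k-tuple of the query, then sort them.

    Each hit is (seq_id, shift, pos) with shift = pos - query_offset.
    After sorting, hits that belong to the same ungapped match are
    adjacent because they share seq_id and shift.
    """
    hits = []
    for offset in range(len(query) - k + 1):
        for seq_id, pos in index.get(query[offset:offset + k], ()):
            hits.append((seq_id, pos - offset, pos))
    hits.sort()
    return hits

database = ["ACGTACGTTTGA", "GGGACGTTTGAC"]
index = build_index(database, 4)
print(search(index, "CGTTTGAC", 4))
```

    Runs of hits with the same sequence and shift point to a candidate match region; a larger k shrinks the hit lists but can miss short matches, which is the speed, memory and sensitivity trade-off the abstract mentions.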

  17. Ontology searching and browsing at the Rat Genome Database

    PubMed Central

    Laulederkind, Stanley J. F.; Tutaj, Marek; Shimoyama, Mary; Hayman, G. Thomas; Lowry, Timothy F.; Nigam, Rajni; Petri, Victoria; Smith, Jennifer R.; Wang, Shur-Jen; de Pons, Jeff; Dwinell, Melinda R.; Jacob, Howard J.

    2012-01-01

    The Rat Genome Database (RGD) is the premier repository of rat genomic and genetic data and currently houses over 40 000 rat gene records, as well as human and mouse orthologs, 1857 rat and 1912 human quantitative trait loci (QTLs) and 2347 rat strains. Biological information curated for these data objects includes disease associations, phenotypes, pathways, molecular functions, biological processes and cellular components. RGD uses more than a dozen different ontologies to standardize annotation information for genes, QTLs and strains. That means a lot of time can be spent searching and browsing ontologies for the appropriate terms needed both for curating and mining the data. RGD has upgraded its ontology term search to make it more versatile and more robust. A term search result is connected to a term browser so the user can fine-tune the search by viewing parent and children terms. Most publicly available term browsers display a hierarchical organization of terms in an expandable tree format. RGD has replaced its old tree browser format with a ‘driller’ type of browser that allows quicker drilling up and down through the term branches, which has been confirmed by testing. The RGD ontology report pages have also been upgraded. Expanded functionality allows more choice in how annotations are displayed and what subsets of annotations are displayed. The new ontology search, browser and report features have been designed to enhance both manual data curation and manual data extraction. Database URL: http://rgd.mcw.edu/rgdweb/ontology/search.html PMID:22434847

  18. Protein Database Searches Using Compositionally Adjusted Substitution Matrices

    PubMed Central

    Altschul, Stephen F.; Wootton, John C.; Gertz, E. Michael; Agarwala, Richa; Morgulis, Aleksandr; Schäffer, Alejandro A.; Yu, Yi-Kuo

    2005-01-01

    Almost all protein database search methods use amino acid substitution matrices for scoring, optimizing, and assessing the statistical significance of sequence alignments. Much care and effort has therefore gone into constructing substitution matrices, and the quality of search results can depend strongly upon the choice of the proper matrix. A long-standing problem has been the comparison of sequences with biased amino acid compositions, for which standard substitution matrices are not optimal. To address this problem, we have recently developed a general procedure for transforming a standard matrix into one appropriate for the comparison of two sequences with arbitrary, and possibly differing compositions. Such adjusted matrices yield, on average, improved alignments and alignment scores when applied to the comparison of proteins with markedly biased compositions. Here we review the application of compositionally adjusted matrices and consider whether they may also be applied fruitfully to general purpose protein sequence database searches, in which related sequence pairs do not necessarily have strong compositional biases. Although it is not advisable to apply compositional adjustment indiscriminately, we describe several simple criteria under which invoking such adjustment is on average beneficial. In a typical database search, at least one of these criteria is satisfied by over half the related sequence pairs. Compositional substitution matrix adjustment is now available in NCBI's protein-protein version of BLAST. PMID:16218944

  19. Fast and accurate database searches with MS-GF+Percolator

    SciTech Connect

    Granholm, Viktor; Kim, Sangtae; Navarro, José C.; Sjölund, Erik; Smith, Richard D.; Käll, Lukas

    2014-02-28

    To identify peptides and proteins from the large number of fragmentation spectra in mass spectrometry-based proteomics, researchers commonly employ so-called database search engines. Additionally, postprocessors like Percolator have been used on the results from such search engines, to assess confidence, infer peptides and generally increase the number of identifications. A recent search engine, MS-GF+, has previously been shown to outperform these classical search engines in terms of the number of identified spectra. However, MS-GF+ generates only limited statistical estimates of the results, hence hampering the biological interpretation. Here, we enabled Percolator processing for MS-GF+ output, and observed an increased number of identified peptides for a wide variety of datasets. In addition, Percolator directly reports false discovery rate estimates, such as q values and posterior error probabilities, as well as p values, for peptide-spectrum matches, peptides and proteins, functions useful for the whole proteomics community.

  20. A Taxonomic Search Engine: Federating taxonomic databases using web services

    PubMed Central

    Page, Roderic DM

    2005-01-01

    Background: The taxonomic name of an organism is a key link between different databases that store information on that organism. However, in the absence of a single, comprehensive database of organism names, individual databases lack an easy means of checking the correctness of a name. Furthermore, the same organism may have more than one name, and the same name may apply to more than one organism. Results: The Taxonomic Search Engine (TSE) is a web application written in PHP that queries multiple taxonomic databases (ITIS, Index Fungorum, IPNI, NCBI, and uBIO) and summarises the results in a consistent format. It supports "drill-down" queries to retrieve a specific record. The TSE can optionally suggest alternative spellings the user can try. It also acts as a Life Science Identifier (LSID) authority for the source taxonomic databases, providing globally unique identifiers (and associated metadata) for each name. Conclusion: The Taxonomic Search Engine is available at and provides a simple demonstration of the potential of the federated approach to providing access to taxonomic names. PMID:15757517

  1. Content-Based Search on a Database of Geometric Models: Identifying Objects of Similar Shape

    SciTech Connect

    XAVIER, PATRICK G.; HENRY, TYSON R.; LAFARGE, ROBERT A.; MEIRANS, LILITA; RAY, LAWRENCE P.

    2001-11-01

    The Geometric Search Engine is a software system for storing and searching a database of geometric models. The database may be searched for modeled objects similar in shape to a target model supplied by the user. The database models are generally from CAD models, while the target model may be either a CAD model or a model generated from range data collected from a physical object. This document describes key generation, database layout, and search of the database.

  2. Multi-Database Searching in the Behavioral Sciences--Part I: Basic Techniques and Core Databases.

    ERIC Educational Resources Information Center

    Angier, Jennifer J.; Epstein, Barbara A.

    1980-01-01

    Outlines practical searching techniques in seven core behavioral science databases accessing psychological literature: Psychological Abstracts, Social Science Citation Index, Biosis, Medline, Excerpta Medica, Sociological Abstracts, ERIC. Use of individual files is discussed and their relative strengths/weaknesses are compared. Appended is a list…

  3. Are Bibliographic Management Software Search Interfaces Reliable?: A Comparison between Search Results Obtained Using Database Interfaces and the EndNote Online Search Function

    ERIC Educational Resources Information Center

    Fitzgibbons, Megan; Meert, Deborah

    2010-01-01

    The use of bibliographic management software and its internal search interfaces is now pervasive among researchers. This study compares the results between searches conducted in academic databases' search interfaces versus the EndNote search interface. The results show mixed search reliability, depending on the database and type of search…

  5. Feature selection in validating mass spectrometry database search results.

    PubMed

    Fang, Jianwen; Dong, Yinghua; Williams, Todd D; Lushington, Gerald H

    2008-02-01

    Tandem mass spectrometry (MS/MS) combined with protein database searching has been widely used in protein identification. A validation procedure is generally required to reduce the number of false positives. Advanced tools using statistical and machine learning approaches may provide faster and more accurate validation than manual inspection and empirical filtering criteria. In this study, we use two feature selection algorithms based on random forest and support vector machine to identify peptide properties that can be used to improve validation models. We demonstrate that an improved model based on an optimized set of features reduces the number of false positives by 58% relative to the model which used only search engine scores, at the same sensitivity score of 0.8. In addition, we develop classification models based on the physicochemical properties and protein sequence environment of these peptides without using search engine scores. The performance of the best model based on the support vector machine algorithm is at 0.8 AUC, 0.78 accuracy, and 0.7 specificity, suggesting a reasonably accurate classification. The identified properties important to fragmentation and ionization can be either used in independent validation tools or incorporated into peptide sequencing and database search algorithms to improve existing software programs.

  6. Continuous-time quantum search on balanced trees

    NASA Astrophysics Data System (ADS)

    Philipp, Pascal; Tarrataca, Luís; Boettcher, Stefan

    2016-03-01

    We examine the effect of network heterogeneity on the performance of quantum search algorithms. To this end, we study quantum search on a tree for the oracle Hamiltonian formulation employed by continuous-time quantum walks. We use analytical and numerical arguments to show that the exponent of the asymptotic running time $\sim N^{\beta}$ changes uniformly from $\beta = 0.5$ to $\beta = 1$ as the searched-for site is moved from the root of the tree towards the leaves. These results imply that the time complexity of the quantum search algorithm on a balanced tree is closely correlated with certain path-based centrality measures of the searched-for site.

  7. Enriching Great Britain's National Landslide Database by searching newspaper archives

    NASA Astrophysics Data System (ADS)

    Taylor, Faith E.; Malamud, Bruce D.; Freeborough, Katy; Demeritt, David

    2015-11-01

    Our understanding of where landslide hazard and impact will be greatest is largely based on our knowledge of past events. Here, we present a method to supplement existing records of landslides in Great Britain by searching an electronic archive of regional newspapers. In Great Britain, the British Geological Survey (BGS) is responsible for updating and maintaining records of landslide events and their impacts in the National Landslide Database (NLD). The NLD contains records of more than 16,500 landslide events in Great Britain. Data sources for the NLD include field surveys, academic articles, grey literature, news, public reports and, since 2012, social media. We aim to supplement the richness of the NLD by (i) identifying additional landslide events, (ii) acting as an additional source of confirmation of events existing in the NLD and (iii) adding more detail to existing database entries. This is done by systematically searching the Nexis UK digital archive of 568 regional newspapers published in the UK. In this paper, we construct a robust Boolean search criterion by experimenting with landslide terminology for four training periods. We then apply this search to all articles published in 2006 and 2012. This resulted in the addition of 111 records of landslide events to the NLD over the 2 years investigated (2006 and 2012). We also find that we were able to obtain information about landslide impact for 60-90% of landslide events identified from newspaper articles. Spatial and temporal patterns of additional landslides identified from newspaper articles are broadly in line with those existing in the NLD, confirming that the NLD is a representative sample of landsliding in Great Britain. This method could now be applied to more time periods and/or other hazards to add richness to databases and thus improve our ability to forecast future events based on records of past events.

  8. Grover's quantum search algorithm and Diophantine approximation

    SciTech Connect

    Dolev, Shahar; Pitowsky, Itamar; Tamir, Boaz

    2006-02-15

    In a fundamental paper [Phys. Rev. Lett. 78, 325 (1997)] Grover showed how a quantum computer can find a single marked object in a database of size N by using only $O(\sqrt{N})$ queries of the oracle that identifies the object. His result was generalized to the case of finding one object in a subset of marked elements. We consider the following computational problem: A subset of marked elements is given whose number of elements is either M or K, our task is to determine which is the case. We show how to solve this problem with a high probability of success using iterations of Grover's basic step only, and no other algorithm. Let m be the required number of iterations; we prove that under certain restrictions on the sizes of M and K the estimation $m < 2\sqrt{N}/(\sqrt{K}-\sqrt{M})$ obtains. This bound reproduces previous results based on more elaborate algorithms, and is known to be optimal up to a constant factor. Our method involves simultaneous Diophantine approximations, so that Grover's algorithm is conceptualized as an orbit of an ergodic automorphism of the torus. We comment on situations where the algorithm may be slow, and note the similarity between these cases and the problem of small divisors in classical mechanics.
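
    To make the role of the iteration count concrete, the sketch below evaluates the textbook success probability of Grover's iteration, $\sin^2((2m+1)\theta)$ with $\theta = \arcsin\sqrt{M/N}$. This formula is standard background, not the Diophantine method of the paper, and the function names are chosen for this example only.

```python
import math

def grover_success_probability(n_items, n_marked, iterations):
    """Probability of measuring a marked item after the given number of Grover iterations."""
    theta = math.asin(math.sqrt(n_marked / n_items))
    return math.sin((2 * iterations + 1) * theta) ** 2

def optimal_iterations(n_items, n_marked):
    """Iteration count that brings (2m + 1) * theta closest to pi / 2."""
    theta = math.asin(math.sqrt(n_marked / n_items))
    return max(0, round(math.pi / (4 * theta) - 0.5))

n = 1024
m = optimal_iterations(n, 1)
# The same number of iterations gives different success probabilities
# for different numbers of marked items, which is what makes it possible
# to tell the two cases apart by repeated measurement.
print(m, grover_success_probability(n, 1, m), grover_success_probability(n, 4, m))
```

    Because the rotation angle depends on the number of marked elements, a fixed number of iterations yields measurably different outcomes for M and K marked items; the paper's bound quantifies how many iterations are needed to separate the two cases.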

  9. Physical Database Design for Efficient Time-Series Similarity Search

    NASA Astrophysics Data System (ADS)

    Kim, Sang-Wook; Kim, Jinho; Park, Sanghyun

    Similarity search in time-series databases finds data sequences whose changing patterns are similar to that of a query sequence. For efficient processing, it normally employs a multi-dimensional index. To alleviate the well-known dimensionality curse, previous methods for similarity search apply the Discrete Fourier Transform (DFT) to data sequences and take only the first two or three DFT coefficients as organizing attributes. Beyond this ad-hoc approach, there have been no research efforts on devising a systematic guideline for choosing the best organizing attributes. This paper first points out the problems occurring in the previous methods, and proposes a novel solution to construct optimal multi-dimensional indexes. The proposed method analyzes the characteristics of a target time-series database, and identifies the organizing attributes having the best discrimination power. It also determines the optimal number of organizing attributes for efficient similarity search by using a cost model. Through a series of experiments, we show that the proposed method outperforms the previous ones significantly.
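
    The baseline that this abstract criticizes, keeping only the first few DFT coefficients as index attributes, can be sketched as follows. This is an illustration of the earlier approach under stated assumptions, not the attribute-selection method the paper proposes; the helper names and the unitary normalization are choices made for the example.

```python
import cmath
import math

def dft_features(series, n_coeffs):
    """Return the first n_coeffs coefficients of the unitary DFT of a real sequence."""
    n = len(series)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * f * t / n) for t, x in enumerate(series))
        / math.sqrt(n)
        for f in range(n_coeffs)
    ]

def euclidean(a, b):
    """Euclidean distance between two equal-length real or complex vectors."""
    return math.sqrt(sum(abs(x - y) ** 2 for x, y in zip(a, b)))

a = [1.0, 2.0, 4.0, 3.0, 2.0, 1.0, 0.0, 1.0]
b = [1.5, 2.5, 3.0, 3.0, 1.0, 1.0, 0.5, 2.0]

# With a unitary DFT, the distance over a subset of coefficients never
# exceeds the true distance (Parseval), so an index built on the reduced
# features can prune candidates without missing a true match.
print(euclidean(dft_features(a, 3), dft_features(b, 3)), euclidean(a, b))
```

    The lower-bounding property holds for any subset of coefficients, which is why choosing coefficients by discrimination power rather than by position remains a safe index design.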

  10. Compact variant-rich customized sequence database and a fast and sensitive database search for efficient proteogenomic analyses.

    PubMed

    Park, Heejin; Bae, Junwoo; Kim, Hyunwoo; Kim, Sangok; Kim, Hokeun; Mun, Dong-Gi; Joh, Yoonsung; Lee, Wonyeop; Chae, Sehyun; Lee, Sanghyuk; Kim, Hark Kyun; Hwang, Daehee; Lee, Sang-Won; Paek, Eunok

    2014-12-01

    In proteogenomic analysis, construction of a compact, customized database from mRNA-seq data and a sensitive search of both reference and customized databases are essential to accurately determine protein abundances and structural variations at the protein level. However, these tasks have not been systematically explored, but rather performed in an ad-hoc fashion. Here, we present an effective method for constructing a compact database containing comprehensive sequences of sample-specific variants--single nucleotide variants, insertions/deletions, and stop-codon mutations derived from Exome-seq and RNA-seq data. It, however, occupies less space by storing variant peptides, not variant proteins. We also present an efficient search method for both customized and reference databases. The separate searches of the two databases increase the search time, and a unified search is less sensitive to identify variant peptides due to the smaller size of the customized database, compared to the reference database, in the target-decoy setting. Our method searches the unified database once, but performs target-decoy validations separately. Experimental results show that our approach is as fast as the unified search and as sensitive as the separate searches. Our customized database includes mutation information in the headers of variant peptides, thereby facilitating the inspection of peptide-spectrum matches.

  11. Fast and accurate database searches with MS-GF+Percolator.

    PubMed

    Granholm, Viktor; Kim, Sangtae; Navarro, José C F; Sjölund, Erik; Smith, Richard D; Käll, Lukas

    2014-02-07

    One can interpret fragmentation spectra stemming from peptides in mass-spectrometry-based proteomics experiments using so-called database search engines. Frequently, one also runs post-processors such as Percolator to assess the confidence, infer unique peptides, and increase the number of identifications. A recent search engine, MS-GF+, has shown promising results, due to a new and efficient scoring algorithm. However, MS-GF+ provides few statistical estimates about the peptide-spectrum matches, hence limiting the biological interpretation. Here, we enabled Percolator processing for MS-GF+ output and observed an increased number of identified peptides for a wide variety of data sets. In addition, Percolator directly reports p values and false discovery rate estimates, such as q values and posterior error probabilities, for peptide-spectrum matches, peptides, and proteins, functions that are useful for the whole proteomics community.

  12. Supporting Ontology-based Keyword Search over Medical Databases

    PubMed Central

    Kementsietsidis, Anastasios; Lim, Lipyeow; Wang, Min

    2008-01-01

    The proliferation of medical terms poses a number of challenges in the sharing of medical information among different stakeholders. Ontologies are commonly used to establish relationships between different terms, yet their role in querying has not been investigated in detail. In this paper, we study the problem of supporting ontology-based keyword search queries on a database of electronic medical records. We present several approaches to support this type of queries, study the advantages and limitations of each approach, and summarize the lessons learned as best practices. PMID:18998839

  13. The Saccharomyces Genome Database: Advanced Searching Methods and Data Mining.

    PubMed

    Cherry, J Michael

    2015-12-02

    At the core of the Saccharomyces Genome Database (SGD) are chromosomal features that encode a product. These include protein-coding genes and major noncoding RNA genes, such as tRNA and rRNA genes. The basic entry point into SGD is a gene or open-reading frame name that leads directly to the locus summary information page. A keyword describing function, phenotype, selective condition, or text from abstracts will also provide a door into the SGD. A DNA or protein sequence can be used to identify a gene or a chromosomal region using BLAST. Protein and DNA sequence identifiers, PubMed and NCBI IDs, author names, and function terms are also valid entry points. The information in SGD has been gathered and is maintained by a group of scientific biocurators and software developers who are devoted to providing researchers with up-to-date information from the published literature, connections to all the major research resources, and tools that allow the data to be explored. All the collected information cannot be represented or summarized for every possible question; therefore, it is necessary to be able to search the structured data in the database. This protocol describes the YeastMine tool, which provides an advanced search capability via an interactive tool. The SGD also archives results from microarray expression experiments, and a strategy designed to explore these data using the SPELL (Serial Pattern of Expression Levels Locator) tool is provided.

  14. Quantum Associative Neural Network with Nonlinear Search Algorithm

    NASA Astrophysics Data System (ADS)

    Zhou, Rigui; Wang, Huian; Wu, Qian; Shi, Yang

    2012-03-01

    Based on an analysis of the properties of quantum linear superposition, and to overcome the complexity of the existing quantum associative memory proposed by Ventura, this paper proposes a new storage method for multiple patterns that constructs the quantum array with binary decision diagrams. The adoption of the nonlinear search algorithm also increases the pattern recalling speed of this multi-pattern model to $O(\log_2 2^{n-t}) = O(n - t)$ time complexity, where n is the number of quantum bits and t is the quantum information of the t quantum bit. Results of case analysis show that the associative neural network model proposed in this paper, based on quantum learning, compares favorably with other researchers' models in terms of avoiding additional qubits or extraordinary initial operators, storing patterns, and improving the recalling speed.

  15. Nested Quantum Search and NP-Complete Problem

    NASA Technical Reports Server (NTRS)

    Williams, C.; Cerf, N. J.; Grover, L. K.

    1998-01-01

    A quantum algorithm is known that solves an unstructured search problem in a number of iterations of order square-root of d, where d is the dimension of the search space, whereas any classical algorithm scales as O(d).

  16. Accelerating chemical database searching using graphics processing units.

    PubMed

    Liu, Pu; Agrafiotis, Dimitris K; Rassokhin, Dmitrii N; Yang, Eric

    2011-08-22

    The utility of chemoinformatics systems depends on the accurate computer representation and efficient manipulation of chemical compounds. In such systems, a small molecule is often digitized as a large fingerprint vector, where each element indicates the presence/absence or the number of occurrences of a particular structural feature. Since in theory the number of unique features can be exceedingly large, these fingerprint vectors are usually folded into much shorter ones using hashing and modulo operations, allowing fast "in-memory" manipulation and comparison of molecules. There is increasing evidence that lossless fingerprints can substantially improve retrieval performance in chemical database searching (substructure or similarity), which has led to the development of several lossless fingerprint compression algorithms. However, any gains in storage and retrieval afforded by compression need to be weighed against the extra computational burden required for decompression before these fingerprints can be compared. Here we demonstrate that graphics processing units (GPU) can greatly alleviate this problem, enabling the practical application of lossless fingerprints on large databases. More specifically, we show that, with the help of a ~$500 ordinary video card, the entire PubChem database of ~32 million compounds can be searched in ~0.2-2 s on average, which is 2 orders of magnitude faster than a conventional CPU. If multiple query patterns are processed in batch, the speedup is even more dramatic (less than 0.02-0.2 s/query for 1000 queries). In the present study, we use the Elias gamma compression algorithm, which results in a compression ratio as high as 0.097.
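
    For readers unfamiliar with the Elias gamma code named above, here is a minimal sketch of the code itself. How the paper maps fingerprints onto integers is not stated in the abstract; encoding the gaps between set bits of a sparse fingerprint, as in the usage lines, is an assumption made for illustration.

```python
def elias_gamma_encode(n):
    """Encode a positive integer as (bit length - 1) zeros followed by n in binary."""
    if n < 1:
        raise ValueError("Elias gamma codes are defined for positive integers only")
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary

def elias_gamma_decode(bits):
    """Decode a well-formed concatenation of Elias gamma codes into a list of integers."""
    values, i = [], 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":  # count the unary length prefix
            zeros += 1
            i += 1
        values.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return values

# Hypothetical usage: gaps between consecutive set bits of a sparse fingerprint.
gaps = [3, 1, 7, 2, 12]
encoded = "".join(elias_gamma_encode(g) for g in gaps)
print(encoded, elias_gamma_decode(encoded))
```

    Small integers get short codes, so the scheme pays off when most gaps are small; decoding is sequential, which is the extra computational cost the abstract weighs against the storage gain.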

  17. Grover's search algorithm with an entangled database state

    NASA Astrophysics Data System (ADS)

    Alsing, Paul M.; McDonald, Nathan

    2011-05-01

    Grover's oracle-based unstructured search algorithm is often stated as "given a phone number in a directory, find the associated name." More formally, the problem can be stated as "given as input a unitary black box $U_f$ for computing an unknown function $f:\{0,1\}^n \to \{0,1\}$, find $x = x_0 \in \{0,1\}^n$ such that $f(x_0) = 1$ (and zero otherwise)." The crucial role of the externally supplied oracle $U_f$ (whose inner workings are unknown to the user) is to change the sign of the solution $|x_0\rangle$, while leaving all other states unaltered. Thus, $U_f$ depends on the desired solution $x_0$. This paper examines an amplitude amplification algorithm in which the user encodes the directory (e.g. names and telephone numbers) into an entangled database state, which at a later time can be queried on one supplied component entry (e.g. a given phone number $t_0$) to find the other associated unknown component (e.g. name $x_0$). For $N = 2^n$ names x with N associated phone numbers t, performing amplitude amplification on a subspace of size N of the total space of size $N^2$ produces the desired state $|x_0\rangle|t_0\rangle$ in $\sqrt{N}$ steps. We discuss how and why sequential (though not concurrent parallel) searches can be performed on multiple database states. Finally, we show how this procedure can be generalized to databases with more than two correlated lists (e.g. x, t, s, r, ...).

  18. Numerical database system based on a weighted search tree

    NASA Astrophysics Data System (ADS)

    Park, S. C.; Bahri, C.; Draayer, J. P.; Zheng, S.-Q.

    1994-09-01

    An on-line numerical database system that is based on the concept of a weighted search tree and functions like a file directory is introduced. The system, which is designed to aid in reducing time-consuming redundant calculations in numerically intensive computations, can be used to fetch, insert and delete items from a dynamically generated list in optimal [O(log n), where n is the number of items in the list] time. Items in the list are ordered according to a priority queue, with the initial priority for each element set either automatically or by a user-supplied algorithm. The priority queue is updated on-the-fly to reflect element hit frequency. Items can be added to a database so long as there is space to accommodate them, and when there is not, the lowest priority element(s) is removed to make room for an incoming element(s) with higher priority. The system acts passively and therefore can be applied to any number of databases, with the same or different structures, within a single application.

  19. Quantum algorithms for the ordered search problem via semidefinite programming

    SciTech Connect

    Childs, Andrew M.; Landahl, Andrew J.; Parrilo, Pablo A.

    2007-03-15

    One of the most basic computational problems is the task of finding a desired item in an ordered list of N items. While the best classical algorithm for this problem uses log_2 N queries to the list, a quantum computer can solve the problem using a constant factor fewer queries. However, the precise value of this constant is unknown. By characterizing a class of quantum query algorithms for the ordered search problem in terms of a semidefinite program, we find quantum algorithms for small instances of the ordered search problem. Extending these algorithms to arbitrarily large instances using recursion, we show that there is an exact quantum ordered search algorithm using 4 log_605 N ≈ 0.433 log_2 N queries, which improves upon the previously best known exact algorithm.
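
    The constant quoted in this abstract follows from a change of logarithm base: 4 log_605 N = (4 / log_2 605) log_2 N. A short check of that arithmetic is sketched below; the function names are illustrative and not taken from the paper.

```python
import math


def classical_queries(n_items):
    """Binary search query count, log_2 N, as stated in the abstract."""
    return math.log2(n_items)


def quantum_queries(n_items):
    """Query count 4 * log_605 N quoted for the exact quantum algorithm."""
    return 4 * math.log(n_items, 605)


# Constant factor relating the two counts: 4 / log_2(605).
ratio = 4 / math.log2(605)
print(f"4 log_605 N = {ratio:.4f} * log_2 N")
```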

  20. Automated searching for quantum subsystem codes

    SciTech Connect

    Crosswhite, Gregory M.; Bacon, Dave

    2011-02-15

    Quantum error correction allows for faulty quantum systems to behave in an effectively error-free manner. One important class of techniques for quantum error correction is the class of quantum subsystem codes, which are relevant both to active quantum error-correcting schemes as well as to the design of self-correcting quantum memories. Previous approaches for investigating these codes have focused on applying theoretical analysis to look for interesting codes and to investigate their properties. In this paper we present an alternative approach that uses computational analysis to accomplish the same goals. Specifically, we present an algorithm that computes the optimal quantum subsystem code that can be implemented given an arbitrary set of measurement operators that are tensor products of Pauli operators. We then demonstrate the utility of this algorithm by performing a systematic investigation of the quantum subsystem codes that exist in the setting where the interactions are limited to two-body interactions between neighbors on lattices derived from the convex uniform tilings of the plane.

  1. Searching protein structure databases with DaliLite v.3.

    PubMed

    Holm, L; Kääriäinen, S; Rosenström, P; Schenkel, A

    2008-12-01

    The Red Queen said, 'It takes all the running you can do, to keep in the same place.' (Lewis Carroll). Newly solved protein structures are routinely scanned against structures already in the Protein Data Bank (PDB) using Internet servers. In favourable cases, comparing 3D structures may reveal biologically interesting similarities that are not detectable by comparing sequences. The number of known structures continues to grow exponentially. Sensitive (thorough but slow) search algorithms are challenged to deliver results in a reasonable time, as there are now more structures in the PDB than seconds in a day. The brute-force solution would be to distribute the individual comparisons on a massively parallel computer. A frugal solution, as implemented in the Dali server, is to reduce the total computational cost by pruning search space using prior knowledge about the distribution of structures in fold space. This note reports paradigm revisions that enable maintaining such a knowledge base up-to-date on a PC. The Dali server for protein structure database searching at http://ekhidna.biocenter.helsinki.fi/dali_server is running DaliLite v.3. The software can be downloaded for academic use from http://ekhidna.biocenter.helsinki.fi/dali_lite/downloads/v3.

  2. Searching protein structure databases with DaliLite v.3

    PubMed Central

    Holm, L.; Kääriäinen, S.; Rosenström, P.; Schenkel, A.

    2008-01-01

    The Red Queen said, ‘It takes all the running you can do, to keep in the same place.’ (Lewis Carroll). Motivation: Newly solved protein structures are routinely scanned against structures already in the Protein Data Bank (PDB) using Internet servers. In favourable cases, comparing 3D structures may reveal biologically interesting similarities that are not detectable by comparing sequences. The number of known structures continues to grow exponentially. Sensitive (thorough but slow) search algorithms are challenged to deliver results in a reasonable time, as there are now more structures in the PDB than seconds in a day. The brute-force solution would be to distribute the individual comparisons on a massively parallel computer. A frugal solution, as implemented in the Dali server, is to reduce the total computational cost by pruning search space using prior knowledge about the distribution of structures in fold space. This note reports paradigm revisions that enable maintaining such a knowledge base up-to-date on a PC. Availability: The Dali server for protein structure database searching at http://ekhidna.biocenter.helsinki.fi/dali_server is running DaliLite v.3. The software can be downloaded for academic use from http://ekhidna.biocenter.helsinki.fi/dali_lite/downloads/v3. Contact: liisa.holm@helsinki.fi. PMID:18818215

  3. Quantum search with multiple walk steps per oracle query

    NASA Astrophysics Data System (ADS)

    Wong, Thomas G.; Ambainis, Andris

    2015-08-01

    We identify a key difference between quantum search by discrete- and continuous-time quantum walks: a discrete-time walk typically performs one walk step per oracle query, whereas a continuous-time walk can effectively perform multiple walk steps per query while only counting query time. As a result, we show that continuous-time quantum walks can outperform their discrete-time counterparts, even though both achieve quadratic speedups over their corresponding classical random walks. To provide greater equity, we allow the discrete-time quantum walk to also take multiple walk steps per oracle query while only counting queries. Then it matches the continuous-time algorithm's runtime, but such that it is a cubic speedup over its corresponding classical random walk. This yields a greater-than-quadratic speedup for quantum search over its corresponding classical random walk.

  4. Global Identification of Protein Post-translational Modifications in a Single-Pass Database Search.

    PubMed

    Shortreed, Michael R; Wenger, Craig D; Frey, Brian L; Sheynkman, Gloria M; Scalf, Mark; Keller, Mark P; Attie, Alan D; Smith, Lloyd M

    2015-11-06

    Bottom-up proteomics database search algorithms used for peptide identification cannot comprehensively identify post-translational modifications (PTMs) in a single-pass because of high false discovery rates (FDRs). A new approach to database searching enables global PTM (G-PTM) identification by exclusively looking for curated PTMs, thereby avoiding the FDR penalty experienced during conventional variable modification searches. We identified over 2200 unique, high-confidence modified peptides comprising 26 different PTM types in a single-pass database search.

  5. Multiparty controlled quantum secure direct communication based on quantum search algorithm

    NASA Astrophysics Data System (ADS)

    Kao, Shih-Hung; Hwang, Tzonelih

    2013-12-01

    In this study, a new controlled quantum secure direct communication (CQSDC) protocol using the quantum search algorithm as the encoding function is proposed. The proposed protocol is based on the multi-particle Greenberger-Horne-Zeilinger entangled state and the one-step quantum transmission strategy. Due to the one-step transmission of qubits, the proposed protocol can be easily extended to a multi-controller environment, and is also free from Trojan horse attacks. The analysis shows that the use of the quantum search algorithm in the construction of CQSDC appears very promising.

  6. Automated Search for new Quantum Experiments

    NASA Astrophysics Data System (ADS)

    Krenn, Mario; Malik, Mehul; Fickler, Robert; Lapkiewicz, Radek; Zeilinger, Anton

    2016-03-01

    Quantum mechanics predicts a number of, at first sight, counterintuitive phenomena. It therefore remains a question whether our intuition is the best way to find new experiments. Here, we report the development of the computer algorithm Melvin which is able to find new experimental implementations for the creation and manipulation of complex quantum states. Indeed, the discovered experiments extensively use unfamiliar and asymmetric techniques which are challenging to understand intuitively. The results range from the first implementation of a high-dimensional Greenberger-Horne-Zeilinger state, to a vast variety of experiments for asymmetrically entangled quantum states—a feature that can only exist when both the number of involved parties and dimensions is larger than 2. Additionally, new types of high-dimensional transformations are found that perform cyclic operations. Melvin autonomously learns from solutions for simpler systems, which significantly speeds up the discovery rate of more complex experiments. The ability to automate the design of a quantum experiment can be applied to many quantum systems and allows the physical realization of quantum states previously thought of only on paper.

  7. Adiabatic Quantum Algorithm for Search Engine Ranking

    NASA Astrophysics Data System (ADS)

    Garnerone, Silvano; Zanardi, Paolo; Lidar, Daniel A.

    2012-06-01

    We propose an adiabatic quantum algorithm for generating a quantum pure state encoding of the PageRank vector, the most widely used tool in ranking the relative importance of internet pages. We present extensive numerical simulations which provide evidence that this algorithm can prepare the quantum PageRank state in a time which, on average, scales polylogarithmically in the number of web pages. We argue that the main topological feature of the underlying web graph allowing for such a scaling is the out-degree distribution. The top-ranked log(n) entries of the quantum PageRank state can then be estimated with a polynomial quantum speed-up. Moreover, the quantum PageRank state can be used in “q-sampling” protocols for testing properties of distributions, which require exponentially fewer measurements than all classical schemes designed for the same task. This can be used to decide whether to run a classical update of the PageRank.

  8. Adiabatic quantum algorithm for search engine ranking.

    PubMed

    Garnerone, Silvano; Zanardi, Paolo; Lidar, Daniel A

    2012-06-08

    We propose an adiabatic quantum algorithm for generating a quantum pure state encoding of the PageRank vector, the most widely used tool in ranking the relative importance of internet pages. We present extensive numerical simulations which provide evidence that this algorithm can prepare the quantum PageRank state in a time which, on average, scales polylogarithmically in the number of web pages. We argue that the main topological feature of the underlying web graph allowing for such a scaling is the out-degree distribution. The top-ranked log(n) entries of the quantum PageRank state can then be estimated with a polynomial quantum speed-up. Moreover, the quantum PageRank state can be used in "q-sampling" protocols for testing properties of distributions, which require exponentially fewer measurements than all classical schemes designed for the same task. This can be used to decide whether to run a classical update of the PageRank.

  9. Global vs. Localized Search: A Comparison of Database Selection Methods in a Hierarchical Environment.

    ERIC Educational Resources Information Center

    Conrad, Jack G.; Claussen, Joanne Smestad; Yang, Changwen

    2002-01-01

    Compares standard global information retrieval searching with more localized techniques to address the database selection problem that users often have when searching for the most relevant database, based on experiences with the Westlaw Directory. Findings indicate that a browse plus search approach in a hierarchical environment produces the most…

  10. Comparison Study of Overlap among 21 Scientific Databases in Searching Pesticide Information.

    ERIC Educational Resources Information Center

    Meyer, Daniel E.; And Others

    1983-01-01

    Evaluates overlapping coverage of 21 scientific databases used in 10 online pesticide searches in an attempt to identify minimum number of databases needed to generate 90 percent of unique, relevant citations for given search. Comparison of searches combined under given pesticide usage (herbicide, fungicide, insecticide) is discussed. Nine…

  12. Test-state approach to the quantum search problem

    NASA Astrophysics Data System (ADS)

    Sehrawat, Arun; Nguyen, Le Huy; Englert, Berthold-Georg

    2011-05-01

    The search for “a quantum needle in a quantum haystack” is a metaphor for the problem of finding out which one of a permissible set of unitary mappings—the oracles—is implemented by a given black box. Grover’s algorithm solves this problem with quadratic speedup as compared with the analogous search for “a classical needle in a classical haystack.” Since the outcome of Grover’s algorithm is probabilistic—it gives the correct answer with high probability, not with certainty—the answer requires verification. For this purpose we introduce specific test states, one for each oracle. These test states can also be used to realize “a classical search for the quantum needle” which is deterministic—it always gives a definite answer after a finite number of steps—and 3.41 times as fast as the purely classical search. Since the test-state search and Grover’s algorithm look for the same quantum needle, the average number of oracle queries of the test-state search is the classical benchmark for Grover’s algorithm.

  13. Test-state approach to the quantum search problem

    SciTech Connect

    Sehrawat, Arun; Nguyen, Le Huy; Englert, Berthold-Georg

    2011-05-15

    The search for 'a quantum needle in a quantum haystack' is a metaphor for the problem of finding out which one of a permissible set of unitary mappings - the oracles - is implemented by a given black box. Grover's algorithm solves this problem with quadratic speedup as compared with the analogous search for 'a classical needle in a classical haystack'. Since the outcome of Grover's algorithm is probabilistic - it gives the correct answer with high probability, not with certainty - the answer requires verification. For this purpose we introduce specific test states, one for each oracle. These test states can also be used to realize 'a classical search for the quantum needle' which is deterministic - it always gives a definite answer after a finite number of steps - and 3.41 times as fast as the purely classical search. Since the test-state search and Grover's algorithm look for the same quantum needle, the average number of oracle queries of the test-state search is the classical benchmark for Grover's algorithm.

  14. Search Databases and Statistics: Pitfalls and Best Practices in Phosphoproteomics.

    PubMed

    Refsgaard, Jan C; Munk, Stephanie; Jensen, Lars J

    2016-01-01

    Advances in mass spectrometric instrumentation in the past 15 years have resulted in an explosion in the raw data yield from typical phosphoproteomics workflows. This poses the challenge of confidently identifying peptide sequences, localizing phosphosites to proteins and quantifying these from the vast amounts of raw data. This task is tackled by computational tools implementing algorithms that match the experimental data to databases, providing the user with lists for downstream analysis. Several platforms for such automated interpretation of mass spectrometric data have been developed, each having strengths and weaknesses that must be considered for the individual needs. These are reviewed in this chapter. Equally critical for generating highly confident output datasets is the application of sound statistical criteria to limit the inclusion of incorrect peptide identifications from database searches. Additionally, careful filtering and use of appropriate statistical tests on the output datasets affects the quality of all downstream analyses and interpretation of the data. Our considerations and general practices on these aspects of phosphoproteomics data processing are presented here.

  15. Cycloquest: Identification of cyclopeptides via database search of their mass spectra against genome databases

    PubMed Central

    Mohimani, Hosein; Liu, Wei-Ting; Mylne, Joshua S.; Poth, Aaron G.; Colgrave, Michelle L.; Tran, Dat; Selsted, Michael E.; Dorrestein, Pieter C.; Pevzner, Pavel A.

    2011-01-01

    Hundreds of ribosomally synthesized cyclopeptides have been isolated from all domains of life, the vast majority having been reported in the last 15 years. Studies of cyclic peptides have highlighted their exceptional potential both as stable drug scaffolds and as biomedicines in their own right. Despite this, computational techniques for cyclopeptide identification are still in their infancy, with many such peptides remaining uncharacterized. Tandem mass spectrometry has occupied a niche role in cyclopeptide identification, taking over from traditional techniques such as nuclear magnetic resonance spectroscopy (NMR). MS/MS studies require only picogram quantities of peptide (compared to milligrams for NMR studies) and are applicable to complex samples, abolishing the requirement for time-consuming chromatographic purification. While database search tools such as Sequest and Mascot have become standard tools for the MS/MS identification of linear peptides, they are not applicable to cyclopeptides, due to the parent mass shift resulting from cyclization, and different fragmentation pattern of cyclic peptides. In this paper, we describe the development of a novel database search methodology to aid in the identification of cyclopeptides by mass spectrometry, and evaluate its utility in identifying two peptide rings from Helianthus annuus, a bacterial cannibalism factor from Bacillus subtilis, and a θ-defensin from Rhesus macaque. PMID:21851130

  16. Optimal state discrimination and unstructured search in nonlinear quantum mechanics

    NASA Astrophysics Data System (ADS)

    Childs, Andrew M.; Young, Joshua

    2016-02-01

    Nonlinear variants of quantum mechanics can solve tasks that are impossible in standard quantum theory, such as perfectly distinguishing nonorthogonal states. Here we derive the optimal protocol for distinguishing two states of a qubit using the Gross-Pitaevskii equation, a model of nonlinear quantum mechanics that arises as an effective description of Bose-Einstein condensates. Using this protocol, we present an algorithm for unstructured search in the Gross-Pitaevskii model, obtaining an exponential improvement over a previous algorithm of Meyer and Wong. This result establishes a limitation on the effectiveness of the Gross-Pitaevskii approximation. More generally, we demonstrate similar behavior under a family of related nonlinearities, giving evidence that the ability to quickly discriminate nonorthogonal states and thereby solve unstructured search is a generic feature of nonlinear quantum mechanics.

  17. An approach in building a chemical compound search engine in oracle database.

    PubMed

    Wang, H; Volarath, P; Harrison, R

    2005-01-01

    Searching for or identifying chemical compounds is an important process in drug design and in chemistry research. An efficient search engine involves a close coupling of the search algorithm and database implementation. The database must process chemical structures, which demands approaches to represent, store, and retrieve structures in a database system. In this paper, a general database framework for working as a chemical compound search engine in Oracle database is described. The framework is designed to eliminate data type constraints for potential search algorithms, which is a crucial step toward building a domain-specific query language on top of SQL. A search engine implementation based on the database framework is also demonstrated. The convenience of the implementation emphasizes the efficiency and simplicity of the framework.

  18. The Use of AJAX in Searching a Bibliographic Database: A Case Study of the Italian Biblioteche Oggi Database

    ERIC Educational Resources Information Center

    Cavaleri, Piero

    2008-01-01

    Purpose: The purpose of this paper is to describe the use of AJAX for searching the Biblioteche Oggi database of bibliographic records. Design/methodology/approach: The paper is a demonstration of how bibliographic database single page interfaces allow the implementation of more user-friendly features for social and collaborative tasks. Findings:…

  20. Quantum Mechanics Helps in Searching for a Needle in a Haystack

    SciTech Connect

    Grover, L.K.

    1997-07-01

    Quantum mechanics can speed up a range of search applications over unsorted data. For example, imagine a phone directory containing N names arranged in completely random order. To find someone's phone number with a probability of 50%, any classical algorithm (whether deterministic or probabilistic) will need to access the database a minimum of 0.5N times. Quantum mechanical systems can be in a superposition of states and simultaneously examine multiple names. By properly adjusting the phases of various operations, successful computations reinforce each other while others interfere randomly. As a result, the desired phone number can be obtained in only O(√N) accesses to the database. © 1997 The American Physical Society
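
    The O(√N) behaviour described in this abstract can be illustrated with a small classical simulation of the amplitudes. The sketch below assumes the standard textbook form of the iteration (a sign flip on the marked item followed by inversion about the mean) and a single marked item; the function names are illustrative, and simulating the state vector classically of course costs O(N) work per step.

```python
import math


def optimal_iterations(n_items):
    """Standard estimate floor(pi/4 * sqrt(N)) for one marked item."""
    return math.floor(math.pi / 4 * math.sqrt(n_items))


def grover_success_probability(n_items, marked, iterations):
    """Probability of measuring the marked index after the given iterations."""
    # Start in the uniform superposition over all N items.
    amp = [1 / math.sqrt(n_items)] * n_items
    for _ in range(iterations):
        # Oracle: flip the sign of the marked item's amplitude.
        amp[marked] = -amp[marked]
        # Diffusion: reflect every amplitude about the mean amplitude.
        mean = sum(amp) / n_items
        amp = [2 * mean - a for a in amp]
    return amp[marked] ** 2


if __name__ == "__main__":
    n = 1024
    k = optimal_iterations(n)
    print(k, grover_success_probability(n, marked=7, iterations=k))
```

    For N = 1024 the estimate gives 25 iterations, compared with roughly 0.5N = 512 classical lookups for a 50% success probability.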

  1. Synchrotron light sources: The search for quantum chaos

    SciTech Connect

    Schlachter, Fred

    2001-02-01

    A storage ring is a specialized synchrotron in which a stored beam of relativistic electrons produces radiation in the vuv and x-ray regions of the spectrum. High-brightness radiation is used at the ALS to study doubly excited autoionizing states of the helium atom in the search for quantum chaos.

  2. On the complexity of search for keys in quantum cryptography

    NASA Astrophysics Data System (ADS)

    Molotkov, S. N.

    2016-03-01

    The trace distance is used as a security criterion in proofs of security of keys in quantum cryptography. Some authors doubted that this criterion can be reduced to criteria used in classical cryptography. The following question has been answered in this work. Let a quantum cryptography system provide an ε-secure key such that ½‖ρ_XE - ρ_U ⊗ ρ_E‖_1 < ε, which will be repeatedly used in classical encryption algorithms. To what extent does the ε-secure key reduce the number of search steps (guesswork) as compared to the use of ideal keys? A direct relation has been demonstrated between the complexity of the complete consideration of keys, which is one of the main security criteria in classical systems, and the trace distance used in quantum cryptography. Bounds for the minimum and maximum numbers of search steps for the determination of the actual key have been presented.

  3. Laplacian versus adjacency matrix in quantum walk search

    NASA Astrophysics Data System (ADS)

    Wong, Thomas G.; Tarrataca, Luís; Nahimov, Nikolay

    2016-10-01

    A quantum particle evolving by Schrödinger's equation contains, from the kinetic energy of the particle, a term in its Hamiltonian proportional to Laplace's operator. In discrete space, this is replaced by the discrete or graph Laplacian, which gives rise to a continuous-time quantum walk. Besides this natural definition, some quantum walk algorithms instead use the adjacency matrix to effect the walk. While this is equivalent to the Laplacian for regular graphs, it is different for non-regular graphs and is thus an inequivalent quantum walk. We algorithmically explore this distinction by analyzing search on the complete bipartite graph with multiple marked vertices, using both the Laplacian and adjacency matrix. The two walks differ qualitatively and quantitatively in their required jumping rate, runtime, sampling of marked vertices, and in what constitutes a natural initial state. Thus the choice of the Laplacian or adjacency matrix to effect the walk has important algorithmic consequences.
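
    The regular versus non-regular distinction drawn in this abstract can be checked directly on small graphs. The sketch below assumes the sign convention L = A - D (conventions differ between papers) and uses illustrative function names; it only tests whether A and L differ by a multiple of the identity, which is the condition under which the two walks coincide up to a global phase.

```python
def complete_bipartite_adjacency(m, n):
    """Adjacency matrix of K_{m,n}: vertices 0..m-1 on one side, the rest on the other."""
    size = m + n
    return [[1 if (i < m) != (j < m) else 0 for j in range(size)]
            for i in range(size)]


def laplacian(adj):
    """Graph Laplacian under the assumed convention L = A - D."""
    size = len(adj)
    return [[adj[i][j] - (sum(adj[i]) if i == j else 0) for j in range(size)]
            for i in range(size)]


def differs_by_identity_multiple(adj):
    """True when A - L = D is a multiple of the identity (all degrees equal)."""
    lap = laplacian(adj)
    shifts = {adj[i][i] - lap[i][i] for i in range(len(adj))}
    return len(shifts) == 1


if __name__ == "__main__":
    print(differs_by_identity_multiple(complete_bipartite_adjacency(3, 3)))  # regular
    print(differs_by_identity_multiple(complete_bipartite_adjacency(2, 5)))  # non-regular
```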

  4. Towards computational improvement of DNA database indexing and short DNA query searching

    PubMed Central

    Stojanov, Done; Koceski, Sašo; Mileva, Aleksandra; Koceska, Nataša; Bande, Cveta Martinovska

    2014-01-01

    In order to facilitate and speed up the search of massive DNA databases, the database is indexed at the beginning, employing a mapping function. By searching through the indexed data structure, exact query hits can be identified. If the database is searched against an annotated DNA query, such as a known promoter consensus sequence, then the starting locations and the number of potential genes can be determined. This is particularly relevant if unannotated DNA sequences have to be functionally annotated. However, indexing a massive DNA database and searching an indexed data structure with millions of entries is a time-demanding process. In this paper, we propose a fast DNA database indexing and searching approach, identifying all query hits in the database, without having to examine all entries in the indexed data structure, limiting the maximum length of a query that can be searched against the database. By applying the proposed indexing equation, the whole human genome could be indexed in 10 hours on a personal computer, under the assumption that there is enough RAM to store the indexed data structure. Analysing the methodology proposed by Reneker, we observed that hits at starting positions [Formula: see text] are not reported, if the database is searched against a query shorter than [Formula: see text] nucleotides, such that [Formula: see text] is the length of the DNA database words being mapped and [Formula: see text] is the length of the query. A solution of this drawback is also presented. PMID:26019584

  5. Towards computational improvement of DNA database indexing and short DNA query searching.

    PubMed

    Stojanov, Done; Koceski, Sašo; Mileva, Aleksandra; Koceska, Nataša; Bande, Cveta Martinovska

    2014-09-03

    In order to facilitate and speed up the search of massive DNA databases, the database is indexed at the beginning, employing a mapping function. By searching through the indexed data structure, exact query hits can be identified. If the database is searched against an annotated DNA query, such as a known promoter consensus sequence, then the starting locations and the number of potential genes can be determined. This is particularly relevant if unannotated DNA sequences have to be functionally annotated. However, indexing a massive DNA database and searching an indexed data structure with millions of entries is a time-demanding process. In this paper, we propose a fast DNA database indexing and searching approach, identifying all query hits in the database, without having to examine all entries in the indexed data structure, limiting the maximum length of a query that can be searched against the database. By applying the proposed indexing equation, the whole human genome could be indexed in 10 hours on a personal computer, under the assumption that there is enough RAM to store the indexed data structure. Analysing the methodology proposed by Reneker, we observed that hits at starting positions [Formula: see text] are not reported, if the database is searched against a query shorter than [Formula: see text] nucleotides, such that [Formula: see text] is the length of the DNA database words being mapped and [Formula: see text] is the length of the query. A solution of this drawback is also presented.
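
    The general idea of indexing fixed-length words and then looking up exact query hits can be sketched as follows. This is a generic word (k-mer) index, not the indexing equation proposed in the paper, which the abstract does not give; the function names and the choice to reject queries shorter than the word length are assumptions made for illustration.

```python
from collections import defaultdict


def build_index(sequence, word_length):
    """Map every word of the given length to the list of its start positions."""
    index = defaultdict(list)
    for start in range(len(sequence) - word_length + 1):
        index[sequence[start:start + word_length]].append(start)
    return index


def find_hits(sequence, index, word_length, query):
    """Return start positions of exact query matches, using the index as a filter."""
    if len(query) < word_length:
        raise ValueError("query is shorter than the indexed word length")
    # Only positions sharing the query's first word need to be verified.
    candidates = index.get(query[:word_length], [])
    return [start for start in candidates if sequence.startswith(query, start)]


if __name__ == "__main__":
    dna = "ACGTACGTTACG"
    idx = build_index(dna, 3)
    print(find_hits(dna, idx, 3, "ACGT"))
```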

  6. EasyKSORD: A Platform of Keyword Search Over Relational Databases

    NASA Astrophysics Data System (ADS)

    Peng, Zhaohui; Li, Jing; Wang, Shan

    Keyword Search Over Relational Databases (KSORD) enables casual users to use keyword queries (a set of keywords) to search relational databases just like searching the Web, without any knowledge of the database schema or any need of writing SQL queries. Based on our previous work, we design and implement a novel KSORD platform named EasyKSORD for users and system administrators to use and manage different KSORD systems in a novel and simple manner. EasyKSORD supports advanced queries, efficient data-graph-based search engines, multiform result presentations, and system logging and analysis. Through EasyKSORD, users can search relational databases easily and read search results conveniently, and system administrators can easily monitor and analyze the operations of KSORD and manage KSORD systems much better.

  7. Adaptive search in mobile peer-to-peer databases

    NASA Technical Reports Server (NTRS)

    Wolfson, Ouri (Inventor); Xu, Bo (Inventor)

    2010-01-01

    Information is stored in a plurality of mobile peers. The peers communicate in a peer to peer fashion, using a short-range wireless network. Occasionally, a peer initiates a search for information in the peer to peer network by issuing a query. Queries and pieces of information, called reports, are transmitted among peers that are within a transmission range. For each search additional peers are utilized, wherein these additional peers search and relay information on behalf of the originator of the search.

  8. Faster quantum searching with almost any diffusion operator

    NASA Astrophysics Data System (ADS)

    Tulsi, Avatar

    2015-05-01

    Grover's search algorithm drives a quantum system from an initial state |s> to a desired final state |t> by using selective phase inversions of these two states. Earlier, we studied a generalization of Grover's algorithm that relaxes the assumption of the efficient implementation of I_s, the selective phase inversion of the initial state, also known as a diffusion operator. This assumption is known to become a serious handicap in cases of physical interest. Our general search algorithm works with almost any diffusion operator D_s with the only restriction of having |s> as one of its eigenstates. The price that we pay for using any operator is an increase in the number of oracle queries by a factor of O(B), where B is a characteristic of the eigenspectrum of D_s and can be large in some situations. Here we show that by using a quantum Fourier transform, we can regain the optimal query complexity of Grover's algorithm without losing the freedom of using any diffusion operator for quantum searching. However, the total number of operators required by the algorithm is still O(B) times more than that of Grover's algorithm. So our algorithm offers an advantage only if the oracle operator is computationally more expensive than the diffusion operator, which is true in most search problems.

  9. Searching Databases without Query-Building Aids: Implications for Dyslexic Users

    ERIC Educational Resources Information Center

    Berget, Gerd; Sandnes, Frode Eika

    2015-01-01

    Introduction: Few studies document the information searching behaviour of users with cognitive impairments. This paper therefore addresses the effect of dyslexia on information searching in a database with no tolerance for spelling errors and no query-building aids. The purpose was to identify effective search interface design guidelines that…

  10. Medical Students' Personal Knowledge, Searching Proficiency, and Database Use in Problem Solving.

    ERIC Educational Resources Information Center

    Wildemuth, Barbara M.; And Others

    1995-01-01

    Discusses the relationship between personal knowledge in a domain and online searching proficiency in that domain, and the relationship between searching proficiency and database-assisted problem-solving performance based on a study of medical students. Search results, selection of terms, and efficiency were found to be related to problem-solving…

  11. (Sub)structure Searches in Databases Containing Generic Chemical Structure Representations.

    ERIC Educational Resources Information Center

    Schoch-Grubler, Ursula

    1990-01-01

    Reviews three database systems available for searching generic chemical structure representations: (1) Derwent's Chemical Code System; (2) IDC's Gremas System; and (3) Derwent's Markush DARC System. Various types of searches are described, features desirable to users are discussed, and comparison searches are described that measured recall and…

  12. Evaluating the effect of database inflation in proteogenomic search on sensitive and reliable peptide identification.

    PubMed

    Li, Honglan; Joh, Yoon Sung; Kim, Hyunwoo; Paek, Eunok; Lee, Sang-Won; Hwang, Kyu-Baek

    2016-12-22

    Proteogenomics is a promising approach for various tasks ranging from gene annotation to cancer research. Databases for proteogenomic searches are often constructed by adding peptide sequences inferred from genomic or transcriptomic evidence to reference protein sequences. Such inflation of databases has the potential to identify novel peptides. However, it also raises concerns about sensitive and reliable peptide identification. Spurious peptides included in target databases may result in an underestimated false discovery rate (FDR). On the other hand, inflation of decoy databases could decrease the sensitivity of peptide identification due to the increased number of high-scoring random hits. Although several studies have addressed these issues, widely applicable guidelines for sensitive and reliable proteogenomic search have hardly been available. To systematically evaluate the effect of database inflation in proteogenomic searches, we constructed a variety of real and simulated proteogenomic databases for yeast and human tandem mass spectrometry (MS/MS) data, respectively. Against these databases, we tested two popular database search tools with various approaches to search result validation: the target-decoy search strategy (with and without a refined scoring metric) and a mixture model-based method. The effect of separate filtering of known and novel peptides was also examined. The results from real and simulated proteogenomic searches confirmed that separate filtering increases the sensitivity and reliability of proteogenomic search. However, no one method consistently identified the largest (or the smallest) number of novel peptides from real proteogenomic searches. We propose to use a set of search result validation methods with separate filtering, for sensitive and reliable identification of peptides in proteogenomic search.
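
    The target-decoy strategy with separate filtering mentioned in this abstract can be illustrated with a minimal sketch. It assumes the simple estimate FDR = decoy hits / target hits at a score cutoff; the function names and the (score, is_decoy) record layout are hypothetical and are not taken from the paper or from any search tool.

```python
def estimate_fdr(psms, cutoff):
    """Estimate FDR as decoy hits / target hits at or above a score cutoff.

    psms is a list of (score, is_decoy) pairs (hypothetical layout).
    """
    targets = sum(1 for score, decoy in psms if score >= cutoff and not decoy)
    decoys = sum(1 for score, decoy in psms if score >= cutoff and decoy)
    if targets == 0:
        return 1.0 if decoys else 0.0
    return decoys / targets

def filter_at_fdr(psms, max_fdr):
    """Return target hits passing the loosest cutoff with FDR <= max_fdr."""
    accepted = []
    for cutoff in sorted({score for score, _ in psms}, reverse=True):
        if estimate_fdr(psms, cutoff) <= max_fdr:
            accepted = [(s, d) for s, d in psms if s >= cutoff and not d]
    return accepted

def separate_filtering(known, novel, max_fdr):
    """Apply the FDR filter to known and novel peptide hits independently."""
    return filter_at_fdr(known, max_fdr), filter_at_fdr(novel, max_fdr)

if __name__ == "__main__":
    known = [(10, False), (9, False), (8, False), (7, True), (6, False)]
    novel = [(9, False), (8, True), (7, False), (6, True)]
    known_ok, novel_ok = separate_filtering(known, novel, max_fdr=0.25)
    print(len(known_ok), len(novel_ok))
```

    Filtering the two classes independently keeps the decoy-rich novel class from borrowing confidence from the much larger set of known peptides.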

  13. Searching for the evidence: a practical guide to some online databases in chiropractic and osteopathy.

    PubMed

    Parkhill, Anne

    2004-11-01

    Chiropractic and Osteopathy are categorised within the family of Complementary and Alternative Medicine (CAM) by most indexers and database managers. CAM therapies can be difficult to search because relevant resources are spread over a number of databases. This paper aims to introduce basic searching skills for six databases which offer CAM literature. Six readily available databases which can be used by a busy clinician to remain informed about best practice were chosen. The databases were searched and compared using two clinical scenarios as sample searches. Evidence-based practice demands that practitioners maintain their information gathering skills, but no one source provides all the answers. We are lured by the thought that everything is available on the web easily and speedily, but may sacrifice quality for ease and speed of retrieval.

  14. Searching for religion and mental health studies required health, social science, and grey literature databases.

    PubMed

    Wright, Judy M; Cottrell, David J; Mir, Ghazala

    2014-07-01

    To determine the optimal databases to search for studies of faith-sensitive interventions for treating depression. We examined 23 health, social science, religious, and grey literature databases searched for an evidence synthesis. Databases were prioritized by yield of (1) search results, (2) potentially relevant references identified during screening, (3) included references contained in the synthesis, and (4) included references that were available in the database. We assessed the impact of databases beyond MEDLINE, EMBASE, and PsycINFO by their ability to supply studies identifying new themes and issues. We identified pragmatic workload factors that influence database selection. PsycINFO was the best performing database within all priority lists. ArabPsyNet, CINAHL, Dissertations and Theses, EMBASE, Global Health, Health Management Information Consortium, MEDLINE, PsycINFO, and Sociological Abstracts were essential for our searches to retrieve the included references. Citation tracking activities and the personal library of one of the research teams made significant contributions of unique, relevant references. Religion studies databases (Am Theo Lib Assoc, FRANCIS) did not provide unique, relevant references. Literature searches for reviews and evidence syntheses of religion and health studies should include social science, grey literature, non-Western databases, personal libraries, and citation tracking activities. Copyright © 2014 Elsevier Inc. All rights reserved.

  15. Searching the expressed sequence tag (EST) databases: panning for genes.

    PubMed

    Jongeneel, C V

    2000-02-01

    The genomes of living organisms contain many elements, including genes coding for proteins. The portions of the genes expressed as mature mRNA, collectively known as the transcriptome, represent only a small part of the genome. The expressed sequence tag (EST) databases contain an increasingly large part of the transcriptome of many species. For this reason, these databases are probably the most abundant source of new coding sequences available today. However, the raw data deposited in the EST databases are to a large extent unorganised, unannotated, redundant and of relatively low quality. This paper reviews some of the characteristics of the EST data, and the methods that can be used to find novel protein sequences within them. It also documents a collection of databases, software and web sites that can be useful to biologists interested in mining the EST databases over the Internet, or in establishing a local environment for such analyses.

  16. Content Evaluation of Textual CD-ROM and Web Databases. Database Searching Series.

    ERIC Educational Resources Information Center

    Jacso, Peter

    This book provides guidelines for evaluating a variety of database types, including abstracting and indexing, directory, full-text, and page-image databases available in online and/or CD-ROM formats. The book discusses the purpose and techniques of comparing and evaluating the most important characteristics of textual databases, such as their…

  18. Techniques for searching the CINAHL database using the EBSCO interface.

    PubMed

    Lawrence, Janna C

    2007-04-01

    The Cumulative Index to Nursing and Allied Health Literature (CINAHL) is a useful research tool for accessing articles of interest to nurses and health care professionals. More than 2,800 journals are indexed by CINAHL and can be searched easily using assigned subject headings. Detailed instructions about conducting, combining, and saving searches in CINAHL are provided in this article. Establishing an account at EBSCO further allows a nurse to save references and searches and to receive e-mail alerts when new articles on a topic of interest are published.

  19. Algorithms for database-dependent search of MS/MS data.

    PubMed

    Matthiesen, Rune

    2013-01-01

    The frequently used bottom-up strategy for identification of proteins and their associated modifications nowadays typically generates thousands of MS/MS spectra that are normally matched automatically against a protein sequence database. Search engines that take as input MS/MS spectra and a protein sequence database are referred to as database-dependent search engines. Many programs, both commercial and freely available, exist for database-dependent search of MS/MS spectra, and most of the programs have excellent user documentation. The aim here is therefore to outline the algorithm strategy behind different search engines rather than to provide software user manuals. The process of database-dependent search can be divided into search strategy, peptide scoring, protein scoring, and finally protein inference. Most efforts in the literature have been put into comparing results from different software rather than discussing the underlying algorithms. Such practical comparisons can be cluttered by suboptimal implementation, and the observed differences are frequently caused by software parameter settings that have not been set properly to allow an even comparison. In other words, an algorithmic idea can still be worth considering even if the software implementation has been demonstrated to be suboptimal. The aim in this chapter is therefore to split the algorithms for database-dependent searching of MS/MS data into the above steps so that the different algorithmic ideas become more transparent and comparable. Most search engines provide good implementations of the first three data analysis steps mentioned above, whereas the final step of protein inference is much less developed for most search engines and is in many cases performed by external software. The final part of this chapter illustrates how protein inference is built into the VEMS search engine and discusses a stand-alone program, SIR, for protein inference that can import a Mascot search result.

  20. Federated or cached searches: Providing expected performance from multiple invasive species databases

    NASA Astrophysics Data System (ADS)

    Graham, Jim; Jarnevich, Catherine S.; Simpson, Annie; Newman, Gregory J.; Stohlgren, Thomas J.

    2011-06-01

    Invasive species are a universal global problem, but the information to identify them, manage them, and prevent invasions is stored around the globe in a variety of formats. The Global Invasive Species Information Network is a consortium of organizations working toward providing seamless access to these disparate databases via the Internet. A distributed network of databases can be created using the Internet and a standard web service protocol. There are two options to provide this integration. First, federated searches are being proposed to allow users to search "deep" web documents such as databases for invasive species. A second method is to create a cache of data from the databases for searching. We compare these two methods, and show that federated searches will not provide the performance and flexibility required by users and that a central cache of the data is required to improve performance.
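
    The two integration options compared in this abstract can be contrasted in a small sketch. Everything here is an illustrative assumption: each "source" is an in-memory list of record titles standing in for a remote database, and the function names are hypothetical rather than part of any Global Invasive Species Information Network service.

```python
from collections import defaultdict

def federated_search(sources, term):
    """Query every source at request time and merge the results.

    In a real deployment each loop iteration would be a remote web
    service call, so response time tracks the slowest source.
    """
    hits = []
    for name, records in sources.items():
        hits.extend((name, rec) for rec in records if term in rec.lower().split())
    return hits

def build_cache(sources):
    """Harvest all sources once into a central word -> records index."""
    index = defaultdict(list)
    for name, records in sources.items():
        for rec in records:
            for word in set(rec.lower().split()):
                index[word].append((name, rec))
    return index

def cached_search(index, term):
    """Answer a query from the central cache without contacting any source."""
    return index.get(term, [])

if __name__ == "__main__":
    sources = {"a": ["Zebra mussel", "Kudzu vine"], "b": ["Quagga mussel"]}
    cache = build_cache(sources)
    print(federated_search(sources, "mussel"))
    print(cached_search(cache, "mussel"))
```

    Both calls return the same records; the difference is that the cached path does its cross-source work once at harvest time instead of on every query, at the cost of keeping the cache fresh.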

  1. Federated or cached searches: providing expected performance from multiple invasive species databases

    USGS Publications Warehouse

    Graham, Jim; Jarnevich, Catherine S.; Simpson, Annie; Newman, Gregory J.; Stohlgren, Thomas J.

    2011-01-01

    Invasive species are a universal global problem, but the information to identify them, manage them, and prevent invasions is stored around the globe in a variety of formats. The Global Invasive Species Information Network is a consortium of organizations working toward providing seamless access to these disparate databases via the Internet. A distributed network of databases can be created using the Internet and a standard web service protocol. There are two options to provide this integration. First, federated searches are being proposed to allow users to search “deep” web documents such as databases for invasive species. A second method is to create a cache of data from the databases for searching. We compare these two methods, and show that federated searches will not provide the performance and flexibility required by users and that a central cache of the data is required to improve performance.

  2. Use of Composite Protein Database including Search Result Sequences for Mass Spectrometric Analysis of Cell Secretome

    PubMed Central

    Shin, Jihye; Kim, Gamin; Kabir, Mohammad Humayun; Park, Seong Jun; Lee, Seoung Taek; Lee, Cheolju

    2015-01-01

    Mass spectrometric (MS) data of human cell secretomes are usually run through the conventional human database for identification. However, the search may result in false identifications due to contamination of the secretome with fetal bovine serum (FBS) proteins. To overcome this challenge, here we provide a composite protein database including human as well as 199 FBS protein sequences for MS data search of human cell secretomes. Searching against the human-FBS database returned more reliable results with fewer false-positive and false-negative identifications compared to using either a human only database or a human-bovine database. Furthermore, the improved results validated our strategy without complex experiments like SILAC. We expect our strategy to improve the accuracy of human secreted protein identification and to also add value for general use. PMID:25822838

  3. Using the Turning Research Into Practice (TRIP) database: how do clinicians really search?

    PubMed Central

    Meats, Emma; Brassey, Jon; Heneghan, Carl; Glasziou, Paul

    2007-01-01

    Objectives: Clinicians and patients are increasingly accessing information through Internet searches. This study aimed to examine clinicians' current search behavior when using the Turning Research Into Practice (TRIP) database to examine search engine use and the ways it might be improved. Methods: A Web log analysis was undertaken of the TRIP database—a meta-search engine covering 150 health resources including MEDLINE, The Cochrane Library, and a variety of guidelines. The connectors for terms used in searches were studied, and observations were made of 9 users' search behavior when working with the TRIP database. Results: Of 620,735 searches, most used a single term, and 12% (n = 75,947) used a Boolean operator: 11% (n = 69,006) used “AND” and 0.8% (n = 4,941) used “OR.” Of the elements of a well-structured clinical question (population, intervention, comparator, and outcome), the population was most commonly used, while fewer searches included the intervention. Comparator and outcome were rarely used. Participants in the observational study were interested in learning how to formulate better searches. Conclusions: Web log analysis showed most searches used a single term and no Boolean operators. Observational study revealed users were interested in conducting efficient searches but did not always know how. Therefore, either better training or better search interfaces are required to assist users and enable more effective searching. PMID:17443248

  4. STEPS: A Grid Search Methodology for Optimized Peptide Identification Filtering of MS/MS Database Search Results

    SciTech Connect

    Piehowski, Paul D.; Petyuk, Vladislav A.; Sandoval, John D.; Burnum, Kristin E.; Kiebel, Gary R.; Monroe, Matthew E.; Anderson, Gordon A.; Camp, David G.; Smith, Richard D.

    2013-03-01

    For bottom-up proteomics there are a wide variety of database searching algorithms in use for matching peptide sequences to tandem MS spectra. Likewise, there are numerous strategies being employed to produce a confident list of peptide identifications from the different search algorithm outputs. Here we introduce a grid search approach for determining optimal database filtering criteria in shotgun proteomics data analyses that is easily adaptable to any search. Systematic Trial and Error Parameter Selection - referred to as STEPS - utilizes user-defined parameter ranges to test a wide array of parameter combinations to arrive at an optimal "parameter set" for data filtering, thus maximizing confident identifications. The benefits of this approach in terms of numbers of true positive identifications are demonstrated using datasets derived from immunoaffinity-depleted blood serum and a bacterial cell lysate, two common proteomics sample types.
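
    The grid search idea described in this abstract can be sketched as follows. This is a minimal illustration, not the STEPS implementation: it assumes every filter parameter is a minimum threshold, uses decoy hits / target hits as the error estimate, and all names and record fields are hypothetical.

```python
from itertools import product

def grid_search_filters(ids, param_grid, max_fdr=0.01):
    """Try every parameter combination and keep the best passing one.

    ids: list of dicts holding one value per filter metric plus a
    boolean 'decoy' flag (hypothetical layout).
    param_grid: metric name -> list of candidate minimum thresholds.
    Returns (best_params, number_of_target_identifications).
    """
    best_params, best_count = None, -1
    names = list(param_grid)
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        # Keep identifications that meet every threshold in this combination.
        passed = [r for r in ids if all(r[n] >= v for n, v in params.items())]
        decoys = sum(1 for r in passed if r["decoy"])
        targets = len(passed) - decoys
        # Accept the combination only if its estimated FDR is low enough.
        if targets and decoys / targets <= max_fdr and targets > best_count:
            best_params, best_count = params, targets
    return best_params, best_count

if __name__ == "__main__":
    ids = [
        {"score": 3.0, "delta": 0.2, "decoy": False},
        {"score": 2.5, "delta": 0.05, "decoy": False},
        {"score": 2.0, "delta": 0.3, "decoy": True},
        {"score": 1.5, "delta": 0.25, "decoy": False},
    ]
    grid = {"score": [1.0, 2.2], "delta": [0.0, 0.1]}
    print(grid_search_filters(ids, grid, max_fdr=0.0))
```

    The number of combinations grows as the product of the range sizes, so user-defined ranges are best kept coarse at first and refined around the best region.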

  5. Increasing number of databases searched in systematic reviews and meta-analyses between 1994 and 2014.

    PubMed

    Lam, Michael T; McDiarmid, Mary

    2016-10-01

    The purpose of this study was to determine whether the number of bibliographic databases used to search the health sciences literature in individual systematic reviews (SRs) and meta-analyses (MAs) changed over a twenty-year period related to the official 1995 launch of the Cochrane Database of Systematic Reviews (CDSR). Ovid MEDLINE was searched using a modified version of a strategy developed by the Scottish Intercollegiate Guidelines Network to identify SRs and MAs. Records from 3 milestone years were searched: the year immediately preceding (1994) and 1 (2004) and 2 (2014) decades following the CDSR launch. Records were sorted with randomization software. Abstracts or full texts of the records were examined to identify database usage until 100 relevant records were identified from each of the 3 years. The mean and median number of bibliographic databases searched in 1994, 2004, and 2014 were 1.62 and 1, 3.34 and 3, and 3.73 and 4, respectively. Studies that searched only 1 database decreased over the 3 milestone years (60% in 1994, 28% in 2004, and 10% in 2014). The number of bibliographic databases searched in individual SRs and MAs increased from 1994 to 2014.

  6. Increasing number of databases searched in systematic reviews and meta-analyses between 1994 and 2014

    PubMed Central

    Lam, Michael T.; McDiarmid, Mary

    2016-01-01

    Objectives: The purpose of this study was to determine whether the number of bibliographic databases used to search the health sciences literature in individual systematic reviews (SRs) and meta-analyses (MAs) changed over a twenty-year period related to the official 1995 launch of the Cochrane Database of Systematic Reviews (CDSR). Methods: Ovid MEDLINE was searched using a modified version of a strategy developed by the Scottish Intercollegiate Guidelines Network to identify SRs and MAs. Records from 3 milestone years were searched: the year immediately preceding (1994) and 1 (2004) and 2 (2014) decades following the CDSR launch. Records were sorted with randomization software. Abstracts or full texts of the records were examined to identify database usage until 100 relevant records were identified from each of the 3 years. Results: The mean and median number of bibliographic databases searched in 1994, 2004, and 2014 were 1.62 and 1, 3.34 and 3, and 3.73 and 4, respectively. Studies that searched only 1 database decreased over the 3 milestone years (60% in 1994, 28% in 2004, and 10% in 2014). Conclusions: The number of bibliographic databases searched in individual SRs and MAs increased from 1994 to 2014. PMID:27822149

  7. Social Work Literature Searching: Current Issues with Databases and Online Search Engines

    ERIC Educational Resources Information Center

    McGinn, Tony; Taylor, Brian; McColgan, Mary; McQuilkan, Janice

    2016-01-01

    Objectives: To compare the performance of a range of search facilities; and to illustrate the execution of a comprehensive literature search for qualitative evidence in social work. Context: Developments in literature search methods and comparisons of search facilities help facilitate access to the best available evidence for social workers.…

  9. SledgeHMMER: a web server for batch searching the Pfam database.

    PubMed

    Chukkapalli, Giridhar; Guda, Chittibabu; Subramaniam, Shankar

    2004-07-01

    The SledgeHMMER web server is intended for genome-scale searching of the Pfam database without having to install this database and the HMMER software locally. The server implements a parallelized version of hmmpfam, the program used for searching the Pfam HMM database. Pfam search results have been calculated for the entire Swiss-Prot and TrEmbl database sequences (approximately 1.2 million) on 256 processors of IA64-based teragrid machines. The Pfam database can be searched in local, glocal or merged mode, using either gathering or E-value thresholds. Query sequences are first matched against the pre-calculated entries to retrieve results, and those without matches are processed through a new search process. Results are emailed in a space-delimited tabular format upon completion of the search. While most other Pfam-searching web servers set a limit of one sequence per query, this server processes batch sequences with no limit on the number of input sequences. The web server and downloadable data are accessible from http://SledgeHmmer.sdsc.edu.
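
    The lookup order described in this abstract (pre-calculated results first, a fresh search only for unmatched sequences) can be sketched as below. The function and parameter names are hypothetical, and `run_search` stands in for whatever search process the server actually launches.

```python
def batch_search(queries, precalculated, run_search):
    """Resolve a batch of query sequences, reusing stored results when possible.

    precalculated: dict mapping a sequence to its stored search result.
    run_search: callable invoked only for sequences with no stored result
    (a placeholder for the real search process).
    """
    results = {}
    for seq in queries:
        if seq in precalculated:
            results[seq] = precalculated[seq]   # served from stored entries
        else:
            results[seq] = run_search(seq)      # falls back to a new search
    return results

if __name__ == "__main__":
    stored = {"MKV": "PF00001"}
    print(batch_search(["MKV", "GGA"], stored, lambda seq: "searched:" + seq))
```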

  10. Dialog's Knowledge Index and BRS/After Dark: Database Searching on Personal Computers.

    ERIC Educational Resources Information Center

    Tenopir, Carol

    1983-01-01

    Describes two new bibliographic information services being marketed to microcomputer owners by DIALOG, Inc. and Bibliographic Retrieval Services to allow access to databases at low rates during evening hours. Subject focus, selection of a database, search strategies employed on each system are discussed, and the two services are compared. (EJS)

  11. A Practical Introduction to Non-Bibliographic Database Searching.

    ERIC Educational Resources Information Center

    Rocke, Hans J.; And Others

    This guide comprises four reports on the Laboratory Animal Data Bank (LADB), the National Institute of Health Environmental Protection Agency (NIH/EPA) Chemical Information System (CIS), nonbibliographic databases for the social sciences, and the Toxicology Data Bank (TDB) and Registry of Toxic Effects of Chemical Substances (RTECS). The first…

  12. Searching for Controlled Trials of Complementary and Alternative Medicine: A Comparison of 15 Databases

    PubMed Central

    Cogo, Elise; Sampson, Margaret; Ajiferuke, Isola; Manheimer, Eric; Campbell, Kaitryn; Daniel, Raymond; Moher, David

    2011-01-01

    This project aims to assess the utility of bibliographic databases beyond the three major ones (MEDLINE, EMBASE and Cochrane CENTRAL) for finding controlled trials of complementary and alternative medicine (CAM). Fifteen databases were searched to identify controlled clinical trials (CCTs) of CAM not also indexed in MEDLINE. Searches were conducted in May 2006 using the revised Cochrane highly sensitive search strategy (HSSS) and the PubMed CAM Subset. Yield of CAM trials per 100 records was determined, and databases were compared over a standardized period (2005). The Acudoc2 RCT, Acubriefs, Index to Chiropractic Literature (ICL) and Hom-Inform databases had the highest concentrations of non-MEDLINE records, with more than 100 non-MEDLINE records per 500. Other productive databases had ratios between 500 and 1500 records to 100 non-MEDLINE records—these were AMED, MANTIS, PsycINFO, CINAHL, Global Health and Alt HealthWatch. Five databases were found to be unproductive: AGRICOLA, CAIRSS, Datadiwan, Herb Research Foundation and IBIDS. Acudoc2 RCT yielded 100 CAM trials in the most recent 100 records screened. Acubriefs, AMED, Hom-Inform, MANTIS, PsycINFO and CINAHL had more than 25 CAM trials per 100 records screened. Global Health, ICL and Alt HealthWatch were below 25 in yield. There were 255 non-MEDLINE trials from eight databases in 2005, with only 10% indexed in more than one database. Yield varied greatly between databases; the most productive databases from both sampling methods were Acubriefs, Acudoc2 RCT, AMED and CINAHL. Low overlap between databases indicates comprehensive CAM literature searches will require multiple databases. PMID:19468052

  13. A search algorithm for quantum state engineering and metrology

    NASA Astrophysics Data System (ADS)

    Knott, P. A.

    2016-07-01

    In this paper we present a search algorithm that finds useful optical quantum states which can be created with current technology. We apply the algorithm to the field of quantum metrology with the goal of finding states that can measure a phase shift to a high precision. Our algorithm efficiently produces a number of novel solutions: we find experimentally ready schemes to produce states that show significant improvements over the state-of-the-art, and can measure with a precision that beats the shot noise limit by over a factor of 4. Furthermore, these states demonstrate a robustness to moderate/high photon losses, and we present a conceptually simple measurement scheme that saturates the Cramér-Rao bound.

  14. Extending the Role of the Corporate Library: Corporate Database Applications Using BRS/Search Software.

    ERIC Educational Resources Information Center

    Lammert, Diana

    1993-01-01

    Describes the McKenna Information Center's application of BRS/SEARCH, information retrieval software, as part of its services to Kennametal Inc., its parent company. Features and uses of the software, including commands, custom searching, menu-driven interfaces, preparing reports, and designing databases are covered. Nine examples of software…

  15. A curriculum database with boolean natural-language searching in HyperCard.

    PubMed Central

    Mann, D.; Goodrum, K.; DeWine, J. M.; McVicker, J.

    1992-01-01

    A curriculum database including both natural-language and keyword searching was developed to assist faculty in curriculum research and reform. HyperCard (with extensions) on the Apple Macintosh provides a flexible single-user or networked environment for entering, indexing, searching and retrieving content in detailed faculty notes for the instructional activities in a four-year predoctoral curriculum. PMID:1482977

  16. InfoTrac's SearchBank Databases: Business Information and More.

    ERIC Educational Resources Information Center

    Mehta, Usha; Goodman, Beth

    1997-01-01

    Describes the InfoTrac SearchBank based on experiences at the University of Nevada, Reno, libraries where the service is available through the online catalog. Highlights include remote access through the Internet; indexing and abstracting; full-text access to 460 journal titles; a powerful search engine; and business-oriented databases.…

  17. When is a search not a search? A comparison of searching the AMED complementary health database via EBSCOhost, OVID and DIALOG.

    PubMed

    Younger, Paula; Boddy, Kate

    2009-06-01

    The researchers involved in this study work at Exeter Health library and at the Complementary Medicine Unit, Peninsula School of Medicine and Dentistry (PCMD). Within this collaborative environment it is possible to access the electronic resources of three institutions. This includes access to AMED and other databases using different interfaces. The aim of this study was to investigate whether searching different interfaces to the AMED allied health and complementary medicine database produced the same results when using identical search terms. The following Internet-based AMED interfaces were searched: DIALOG DataStar; EBSCOhost and OVID SP_UI01.00.02. Search results from all three databases were saved in an EndNote database to facilitate analysis. A checklist was also compiled comparing interface features. In our initial search, DIALOG returned 29 hits, OVID 14 and EBSCOhost 8. If we assume that DIALOG returned 100% of potential hits, OVID initially returned only 48% of hits and EBSCOhost only 28%. In our search, a researcher using the EBSCOhost interface to carry out a simple search on AMED would miss over 70% of possible search hits. Subsequent EBSCOhost searches on different subjects failed to find between 21 and 86% of the hits retrieved using the same keywords via DIALOG DataStar. In two cases, the simple EBSCOhost search failed to find any of the results found via DIALOG DataStar. Depending on the interface, the number of hits retrieved from the same database with the same simple search can vary dramatically. Some simple searches fail to retrieve a substantial percentage of citations. This may result in an uninformed literature review, research funding application or treatment intervention. In addition to ensuring that keywords, spelling and medical subject headings (MeSH) accurately reflect the nature of the search, database users should include wildcards and truncation and adapt their search strategy substantially to retrieve the maximum number of appropriate

  18. New showers from parent body search across several video meteor databases

    NASA Astrophysics Data System (ADS)

    Šegon, Damir; Gural, Peter; Andreić, Željko; Skokić, Ivica; Korlević, Korado; Vida, Denis; Novoselnik, Filip

    2014-04-01

    This work was initiated by utilizing the latest complete set of both comets and NEOs downloaded from the JPL small-body database search engine. Rather than search for clustering within a single given database of meteor orbits, the method employed herein is to use all the known parent bodies with their individual orbital elements as the starting point, and find statistically significant associations across a variety of meteor databases. Fifteen new showers possibly related to a comet or a NEO were found.

  19. Conducting literature searches on Ayurveda in PubMed, Indian, and other databases.

    PubMed

    Narahari, Saravu R; Aggithaya, Madhur Guruprasad; Suraj, Kumbla R

    2010-11-01

    Literature searches for articles on Ayurveda provide special challenges, since many of the Indian journals in which such articles appear are not indexed by current medical databases such as PubMed and Cochrane Central Register of Controlled Trials. The aim of this study was to develop a comprehensive search strategy on Ayurveda topics and to map the existing databases containing Ayurveda journal publications. We have developed a literature search procedure that can recover the great majority of articles on any given topic associated with Ayurveda. Our system is formulated in an easily reproducible fashion that all researchers can use. Using the keywords related to Ayurveda and vitiligo, we searched 41 databases that may contain complementary and alternative medicine publications. Only 11 databases yielded results; PubMed contained 9 articles. Each of 14 other databases named in our search procedure averaged 23 articles. International Bibliographic Information of Dietary Supplements, for example, gave 22, of which 1 satisfied our eligibility criteria. "Annotated Bibliography of Indian Medicine" gave 47, of which 7 satisfied eligibility criteria. This article proposes guidelines enabling comprehensive searches to locate all types of Ayurvedic articles, not necessarily only randomized controlled trials.

  20. muBLASTP: database-indexed protein sequence search on multicore CPUs.

    PubMed

    Zhang, Jing; Misra, Sanchit; Wang, Hao; Feng, Wu-Chun

    2016-11-04

    The Basic Local Alignment Search Tool (BLAST) is a fundamental program in the life sciences that searches databases for sequences that are most similar to a query sequence. Currently, the BLAST algorithm utilizes a query-indexed approach. Although many approaches suggest that sequence search with a database index can achieve much higher throughput (e.g., BLAT, SSAHA, and CAFE), they cannot deliver the same level of sensitivity as the query-indexed BLAST, i.e., NCBI BLAST, or they can only support nucleotide sequence search, e.g., MegaBLAST. Because query indexing and database indexing have different challenges and characteristics, the existing techniques for query-indexed search cannot be used in database-indexed search. muBLASTP, a novel database-indexed BLAST for protein sequence search, delivers hits identical to those returned by NCBI BLAST. On Intel Haswell multicore CPUs, for a single query, the single-threaded muBLASTP achieves up to a 4.41-fold speedup for alignment stages, and up to a 1.75-fold end-to-end speedup over single-threaded NCBI BLAST. For a batch of queries, the multithreaded muBLASTP achieves up to a 5.7-fold speedup for alignment stages, and up to a 4.56-fold end-to-end speedup over multithreaded NCBI BLAST. With a newly designed index structure for the protein database and associated optimizations in the BLASTP algorithm, we re-factored the BLASTP algorithm for modern multicore processors, achieving much higher throughput with an acceptable memory footprint for the database index.
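
    To make the idea of a database index concrete, the following is a minimal sketch in Python, not the muBLASTP index structure itself. It assumes exact k-mer matching (real BLASTP seeding also accepts high-scoring neighboring words), and the sequence names and the functions `build_database_index` and `find_seeds` are hypothetical.

```python
from collections import defaultdict


def build_database_index(sequences, k=3):
    """Map every k-mer to the (sequence id, offset) positions where it occurs."""
    index = defaultdict(list)
    for seq_id, seq in sequences.items():
        for offset in range(len(seq) - k + 1):
            index[seq[offset:offset + k]].append((seq_id, offset))
    return index


def find_seeds(index, query, k=3):
    """Return exact k-mer seed hits as (seq_id, database_offset, query_offset)."""
    seeds = []
    for q_off in range(len(query) - k + 1):
        # .get() avoids creating empty entries in the defaultdict
        for seq_id, db_off in index.get(query[q_off:q_off + k], ()):
            seeds.append((seq_id, db_off, q_off))
    return seeds


# Hypothetical toy database: the index is built once and reused for every query.
database = {"protA": "MKVLAAGIV", "protB": "GGMKVLT"}
index = build_database_index(database)
seeds = find_seeds(index, "MKVL")
```

    The index is built once per database and amortized over many queries, which is the property that makes database indexing attractive for batches of queries.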

  1. Efficiency of 22 online databases in the search for physicochemical, toxicological and ecotoxicological information on chemicals.

    PubMed

    Guerbet, Michel; Guyodo, Gaetan

    2002-03-01

    The objective of this study was to evaluate the efficiency of 22 free online databases that could be used for an exhaustive search of physicochemical, toxicological and/or ecotoxicological information about various chemicals. Twenty-two databases with free access on the Internet were referenced. We then selected 27 major physicochemical, toxicological and ecotoxicological criteria and 14 compounds belonging to seven different chemical classes, which were used to interrogate all the databases. Two indices were successively calculated to evaluate the efficiency, with and without taking the databases' specialization into account. More than 50% of the 22 databases 'knew' all of the 14 chemicals, but the quantity of information provided varies widely from one database to another, and most are poorly documented. Two categories clearly emerge: specialized and non-specialized databases. The HSDB database is the most efficient general database and should be searched first, because it is well documented for most of the 27 criteria. However, some specialized databases (e.g. EXTOXNET and SOLVEDB) must be searched secondarily to find additional information.

  2. Shape based indexing for faster search of RNA family databases.

    PubMed

    Janssen, Stefan; Reeder, Jens; Giegerich, Robert

    2008-02-29

    Most non-coding RNA families exert their function by means of a conserved, common secondary structure. The Rfam database contains more than five hundred structurally annotated RNA families. Unfortunately, searching for new family members using covariance models (CMs) is very time consuming. Filtering approaches that use sequence conservation to reduce the number of CM searches are fast, but it is unknown how much sensitivity they sacrifice. We present a new filtering approach, which exploits the family-specific secondary structure and significantly reduces the number of CM searches. The filter eliminates approximately 85% of the queries and discards only 2.6% of true positives when evaluating Rfam against itself. First results also capture previously undetected non-coding RNAs in a recent human RNAz screen. The RNA shape index filter (RNAsifter) is based on the following rationale: An RNA family is characterised by structure, much more succinctly than by sequence content. Structures of individual family members, which naturally have different length and sequence composition, may exhibit structural variation in detail, but overall, they have a common shape in a more abstract sense. Given a fixed release of the Rfam database, we can compute these abstract shapes for all families. This is called a shape index. If a query sequence belongs to a certain family, it must be able to fold into the family shape with reasonable free energy. Therefore, rather than matching the query against all families in the database, we can first (and quickly) compute its feasible shape(s), and use the shape index to access only those families where a good match is possible due to a common shape with the query.
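
    The lookup step described above can be sketched as an inverted index from abstract shape to families. This is only an illustration under stated assumptions: the family names and shape strings are hypothetical, and the query's feasible shapes are assumed to have been computed beforehand by an external folding tool, which is not shown.

```python
from collections import defaultdict


def build_shape_index(family_shapes):
    """Invert {family: abstract shape} into {shape: set of families}."""
    index = defaultdict(set)
    for family, shape in family_shapes.items():
        index[shape].add(family)
    return index


def candidate_families(shape_index, feasible_shapes):
    """Return only the families whose shape the query can feasibly fold into."""
    candidates = set()
    for shape in feasible_shapes:
        candidates |= shape_index.get(shape, set())
    return candidates


# Hypothetical families and shapes, for illustration only.
family_shapes = {
    "cloverleaf_family": "[[][][]]",
    "hairpin_family": "[]",
    "two_stem_family": "[][]",
}
shape_index = build_shape_index(family_shapes)

# Assumed output of a folding step: shapes reachable with reasonable free energy.
query_shapes = ["[]", "[[][]]"]
to_search = candidate_families(shape_index, query_shapes)
```

    Only the families in `to_search` would then be passed on to the expensive covariance-model search.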

  3. Searching for quantum optimal controls under severe constraints

    SciTech Connect

    Riviello, Gregory; Tibbetts, Katharine Moore; Brif, Constantin; Long, Ruixing; Wu, Re-Bing; Ho, Tak-San; Rabitz, Herschel

    2015-04-06

    The success of quantum optimal control for both experimental and theoretical objectives is connected to the topology of the corresponding control landscapes, which are free from local traps if three conditions are met: (1) the quantum system is controllable, (2) the Jacobian of the map from the control field to the evolution operator is of full rank, and (3) there are no constraints on the control field. This paper investigates how the violation of assumption (3) affects gradient searches for globally optimal control fields. The satisfaction of assumptions (1) and (2) ensures that the control landscape lacks fundamental traps, but certain control constraints can still prevent successful optimization of the objective. Using optimal control simulations, we show that the most severe field constraints are those that limit essential control resources, such as the number of control variables, the control duration, and the field strength. Proper management of these resources is an issue of great practical importance for optimization in the laboratory. For each resource, we show that constraints exceeding quantifiable limits can introduce artificial traps to the control landscape and prevent gradient searches from reaching a globally optimal solution. These results demonstrate that careful choice of relevant control parameters helps to eliminate artificial traps and facilitate successful optimization.

  4. Searching for quantum optimal controls under severe constraints

    DOE PAGES

    Riviello, Gregory; Tibbetts, Katharine Moore; Brif, Constantin; ...

    2015-04-06

    The success of quantum optimal control for both experimental and theoretical objectives is connected to the topology of the corresponding control landscapes, which are free from local traps if three conditions are met: (1) the quantum system is controllable, (2) the Jacobian of the map from the control field to the evolution operator is of full rank, and (3) there are no constraints on the control field. This paper investigates how the violation of assumption (3) affects gradient searches for globally optimal control fields. The satisfaction of assumptions (1) and (2) ensures that the control landscape lacks fundamental traps, but certain control constraints can still prevent successful optimization of the objective. Using optimal control simulations, we show that the most severe field constraints are those that limit essential control resources, such as the number of control variables, the control duration, and the field strength. Proper management of these resources is an issue of great practical importance for optimization in the laboratory. For each resource, we show that constraints exceeding quantifiable limits can introduce artificial traps to the control landscape and prevent gradient searches from reaching a globally optimal solution. These results demonstrate that careful choice of relevant control parameters helps to eliminate artificial traps and facilitate successful optimization.

  5. A formulation of a matrix sparsity approach for the quantum ordered search algorithm

    NASA Astrophysics Data System (ADS)

    Parmar, Jupinder; Rahman, Saarim; Thiara, Jaskaran

    One specific subset of quantum algorithms is Grover's Ordered Search Problem (OSP), the quantum counterpart of the classical binary search algorithm, which utilizes oracle functions to produce a specified value within an ordered database. Classically, the optimal algorithm is known to have a log2 N complexity; however, Grover's algorithm has been found to have an optimal complexity between the lower bound of (ln N - 1)/π ≈ 0.221 log2 N and the upper bound of 0.433 log2 N. We sought to lower the known upper bound of the OSP. Following Farhi et al. [MITCTP 2815 (1999), arXiv:quant-ph/9901059], we see that the OSP can be resolved into a translationally invariant algorithm to create quantum query algorithm constraints. With these constraints, one can find Laurent polynomials for various numbers of queries k and database sizes N, thus finding larger recursive sets to solve the OSP and effectively reducing the upper bound. These polynomials are found to be convex functions, allowing one to make use of convex optimization to find an improvement on the known bounds. According to Childs et al. [Phys. Rev. A 75 (2007) 032335], semidefinite programming, a subset of convex optimization, can solve the particular problem represented by the constraints. We implemented a program following their formulation of a semidefinite program (SDP) and found that it takes an immense amount of storage and time to compute. To combat this setback, we then formulated an approach to improve results of the SDP using matrix sparsity. Through the development of this approach, along with an implementation of a rudimentary solver, we demonstrate how matrix sparsity reduces the amount of time and storage required to compute the SDP, which suggests that further improvements toward the theorized lower bound are likely.
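
    The bounds quoted above are easy to check numerically. The sketch below only evaluates the formulas as stated in the abstract; the function name `osp_query_bounds` is hypothetical. The coefficient 0.221 follows from ln N = ln 2 · log2 N, so (ln N)/π = (ln 2/π) log2 N.

```python
import math


def osp_query_bounds(n):
    """Evaluate the query counts quoted in the abstract for a database of size n."""
    log2n = math.log2(n)
    return {
        "classical": log2n,                             # binary search
        "quantum_lower": (math.log(n) - 1) / math.pi,   # (ln N - 1)/pi
        "quantum_upper": 0.433 * log2n,                 # known upper bound
    }


# Leading coefficient of the lower bound when written in terms of log2 N.
lower_coefficient = math.log(2) / math.pi  # about 0.2206, quoted as 0.221

bounds = osp_query_bounds(2 ** 20)
```

    For N = 2^20 this gives 20 classical queries, a quantum lower bound of roughly 4.1 queries, and a quantum upper bound of roughly 8.7 queries, which shows the gap that the semidefinite-programming approach aims to narrow.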

  6. A searching and reporting system for relational databases using a graph-based metadata representation.

    PubMed

    Hewitt, Robin; Gobbi, Alberto; Lee, Man-Ling

    2005-01-01

    Relational databases are the current standard for storing and retrieving data in the pharmaceutical and biotech industries. However, retrieving data from a relational database requires specialized knowledge of the database schema and of the SQL query language. At Anadys, we have developed an easy-to-use system for searching and reporting data in a relational database to support our drug discovery project teams. This system is fast and flexible and allows users to access all data without having to write SQL queries. This paper presents the hierarchical, graph-based metadata representation and SQL-construction methods that, together, are the basis of this system's capabilities.
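
    One way such a system can spare users from writing SQL is to treat the schema as a graph and derive the joins automatically. The following is a minimal sketch of that general idea, not the Anadys implementation: the tables, columns, and join conditions are hypothetical, and a flat graph with breadth-first search stands in for the paper's hierarchical metadata.

```python
from collections import deque

# Hypothetical schema graph: each edge is a foreign-key join between two tables.
JOINS = {
    ("compound", "assay_result"): "compound.id = assay_result.compound_id",
    ("assay_result", "assay"): "assay_result.assay_id = assay.id",
    ("assay", "project"): "assay.project_id = project.id",
}


def neighbors(table):
    """Yield (adjacent table, join condition) for every edge touching `table`."""
    for (a, b), condition in JOINS.items():
        if a == table:
            yield b, condition
        elif b == table:
            yield a, condition


def join_path(start, goal):
    """Breadth-first search for the shortest chain of joins from start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        table, path = queue.popleft()
        if table == goal:
            return path
        for nxt, condition in neighbors(table):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(nxt, condition)]))
    return None


def build_sql(columns, start, goal):
    """Assemble a SELECT statement whose JOIN clauses follow the discovered path."""
    path = join_path(start, goal)
    if path is None:
        raise ValueError(f"no join path from {start} to {goal}")
    sql = f"SELECT {', '.join(columns)} FROM {start}"
    for table, condition in path:
        sql += f" JOIN {table} ON {condition}"
    return sql


query = build_sql(["compound.name", "assay.title"], "compound", "assay")
```

    The user only names the columns of interest; the intermediate `assay_result` table is added by the path search.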

  7. No suitable precise or optimized epidemiologic search filters were available for bibliographic databases.

    PubMed

    Waffenschmidt, Siw; Hermanns, Tatjana; Gerber-Grote, Andreas; Mostardt, Sarah

    2017-02-01

    To determine a suitable approach to a systematic search for epidemiologic publications in bibliographic databases. For this purpose, suitable sensitive, precise, and optimized filters were to be selected for MEDLINE searches. In addition, the relevance of bibliographic databases was determined. Epidemiologic systematic reviews (SRs) retrieved in a systematic search and company dossiers were screened to identify epidemiologic publications (primary studies and SRs) published since 2007. These publications were used to generate a test and validation set. Furthermore, each SR's search strategy was reviewed, and epidemiologic filters were extracted. The search syntaxes were validated using the relative recall method. The test set comprises 729 relevant epidemiologic publications, of which 566 were MEDLINE-indexed. About 27 epidemiologic filters were extracted. One suitable sensitive filter was identified (Larney et al. 2013: 95.94% sensitivity). Precision was presumably underestimated so that no precise or optimized filters can be recommended. About 77.64% of the publications were found in MEDLINE. There is currently no suitable approach to conducting efficient systematic searches for epidemiologic publications in bibliographic databases. The filter by Larney et al. (2013) can be used for sensitive MEDLINE searches. No robust conclusions can be drawn on precise or optimized filters. Additional search approaches should be considered. Copyright © 2016 The Author(s). Published by Elsevier Inc. All rights reserved.

  8. Practical Quantum Private Database Queries Based on Passive Round-Robin Differential Phase-shift Quantum Key Distribution

    NASA Astrophysics Data System (ADS)

    Li, Jian; Yang, Yu-Guang; Chen, Xiu-Bo; Zhou, Yi-Hua; Shi, Wei-Min

    2016-08-01

    A novel quantum private database query protocol is proposed, based on passive round-robin differential phase-shift quantum key distribution. Compared with previous quantum private database query protocols, the present protocol has the following unique merits: (i) the user Alice can obtain one and only one key bit so that both the efficiency and security of the present protocol can be ensured, and (ii) it does not require changing the length difference of the two arms in a Mach-Zehnder interferometer; it simply chooses two pulses passively to interfere, so it is much simpler and more practical. The present protocol is also proved to be secure in terms of the user security and database security.

  9. Practical Quantum Private Database Queries Based on Passive Round-Robin Differential Phase-shift Quantum Key Distribution

    PubMed Central

    Li, Jian; Yang, Yu-Guang; Chen, Xiu-Bo; Zhou, Yi-Hua; Shi, Wei-Min

    2016-01-01

    A novel quantum private database query protocol is proposed, based on passive round-robin differential phase-shift quantum key distribution. Compared with previous quantum private database query protocols, the present protocol has the following unique merits: (i) the user Alice can obtain one and only one key bit so that both the efficiency and security of the present protocol can be ensured, and (ii) it does not require changing the length difference of the two arms in a Mach-Zehnder interferometer; it simply chooses two pulses passively to interfere, so it is much simpler and more practical. The present protocol is also proved to be secure in terms of the user security and database security. PMID:27539654

  10. Practical Quantum Private Database Queries Based on Passive Round-Robin Differential Phase-shift Quantum Key Distribution.

    PubMed

    Li, Jian; Yang, Yu-Guang; Chen, Xiu-Bo; Zhou, Yi-Hua; Shi, Wei-Min

    2016-08-19

    A novel quantum private database query protocol is proposed, based on passive round-robin differential phase-shift quantum key distribution. Compared with previous quantum private database query protocols, the present protocol has the following unique merits: (i) the user Alice can obtain one and only one key bit so that both the efficiency and security of the present protocol can be ensured, and (ii) it does not require changing the length difference of the two arms in a Mach-Zehnder interferometer; it simply chooses two pulses passively to interfere, so it is much simpler and more practical. The present protocol is also proved to be secure in terms of the user security and database security.

  11. MIDAS: a database-searching algorithm for metabolite identification in metabolomics.

    PubMed

    Wang, Yingfeng; Kora, Guruprasad; Bowen, Benjamin P; Pan, Chongle

    2014-10-07

    A database searching approach can be used for metabolite identification in metabolomics by matching measured tandem mass spectra (MS/MS) against the predicted fragments of metabolites in a database. Here, we present the open-source MIDAS algorithm (Metabolite Identification via Database Searching). To evaluate a metabolite-spectrum match (MSM), MIDAS first enumerates possible fragments from a metabolite by systematic bond dissociation, then calculates the plausibility of the fragments based on their fragmentation pathways, and finally scores the MSM to assess how well the experimental MS/MS spectrum from collision-induced dissociation (CID) is explained by the metabolite's predicted CID MS/MS spectrum. MIDAS was designed to search high-resolution tandem mass spectra acquired on time-of-flight or Orbitrap mass spectrometers against a metabolite database in an automated and high-throughput manner. The accuracy of metabolite identification by MIDAS was benchmarked using four sets of standard tandem mass spectra from MassBank. On average, for 77% of original spectra and 84% of composite spectra, MIDAS correctly ranked the true compounds as the first MSMs out of all MetaCyc metabolites as decoys. MIDAS correctly identified 46% more original spectra and 59% more composite spectra at the first MSMs than an existing database-searching algorithm, MetFrag. MIDAS was showcased by searching a published real-world measurement of a metabolome from Synechococcus sp. PCC 7002 against the MetaCyc metabolite database. MIDAS identified many metabolites missed in the previous study. MIDAS identifications should be considered only as candidate metabolites, which need to be confirmed using standard compounds. To facilitate manual validation, MIDAS provides annotated spectra for MSMs and labels observed mass spectral peaks with predicted fragments. The database searching and manual validation can be performed online at http://midas.omicsbio.org.

  12. Global search tool for the Advanced Photon Source Integrated Relational Model of Installed Systems (IRMIS) database.

    SciTech Connect

    Quock, D. E. R.; Cianciarulo, M. B.; APS Engineering Support Division; Purdue Univ.

    2007-01-01

    The Integrated Relational Model of Installed Systems (IRMIS) is a relational database tool that has been implemented at the Advanced Photon Source to maintain an updated account of approximately 600 control system software applications, 400,000 process variables, and 30,000 control system hardware components. To effectively display this large amount of control system information to operators and engineers, IRMIS was initially built with nine Web-based viewers: Applications Organizing Index, IOC, PLC, Component Type, Installed Components, Network, Controls Spares, Process Variables, and Cables. However, since each viewer is designed to provide details from only one major category of the control system, the necessity for a one-stop global search tool for the entire database became apparent. The user requirements for extremely fast database search time and ease of navigation through search results led to the choice of Asynchronous JavaScript and XML (AJAX) technology in the implementation of the IRMIS global search tool. Unique features of the global search tool include a two-tier level of displayed search results, and a database data integrity validation and reporting mechanism.

  13. CLIP: similarity searching of 3D databases using clique detection.

    PubMed

    Rhodes, Nicholas; Willett, Peter; Calvet, Alain; Dunbar, James B; Humblet, Christine

    2003-01-01

    This paper describes a program for 3D similarity searching, called CLIP (for Candidate Ligand Identification Program), that uses the Bron-Kerbosch clique detection algorithm to find those structures in a file that have large structures in common with a target structure. Structures are characterized by the geometric arrangement of pharmacophore points and the similarity between two structures calculated using modifications of the Simpson and Tanimoto association coefficients. This modification takes into account the fact that a distance tolerance is required to ensure that pairs of interatomic distances can be regarded as equivalent during the clique-construction stage of the matching algorithm. Experiments with HIV assay data demonstrate the effectiveness and the efficiency of this approach to virtual screening.
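
    The matching scheme described above can be sketched as clique detection on a correspondence graph. This is an illustration under stated assumptions, not the CLIP code: the pharmacophore points, the 0.5 distance tolerance, and all function names are hypothetical, the Bron-Kerbosch variant shown is the basic one without pivoting, and the plain Tanimoto coefficient is used rather than the modified coefficients of the paper.

```python
import itertools
import math


def correspondence_graph(mol_a, mol_b, tol=0.5):
    """Nodes pair same-type points; edges join pairs whose distances agree within tol."""
    nodes = [(i, j)
             for i, (type_a, _) in enumerate(mol_a)
             for j, (type_b, _) in enumerate(mol_b)
             if type_a == type_b]
    adj = {node: set() for node in nodes}
    for (i, j), (k, l) in itertools.combinations(nodes, 2):
        if i != k and j != l:
            dist_a = math.dist(mol_a[i][1], mol_a[k][1])
            dist_b = math.dist(mol_b[j][1], mol_b[l][1])
            if abs(dist_a - dist_b) <= tol:
                adj[(i, j)].add((k, l))
                adj[(k, l)].add((i, j))
    return adj


def bron_kerbosch(r, p, x, adj, cliques):
    """Basic Bron-Kerbosch recursion: append every maximal clique to `cliques`."""
    if not p and not x:
        cliques.append(set(r))
        return
    for v in list(p):
        bron_kerbosch(r | {v}, p & adj[v], x & adj[v], adj, cliques)
        p = p - {v}
        x = x | {v}


def tanimoto(n_a, n_b, n_common):
    """Unmodified Tanimoto coefficient on point counts."""
    return n_common / (n_a + n_b - n_common)


# Hypothetical pharmacophore points: (type, (x, y, z)).
target = [("donor", (0, 0, 0)), ("acceptor", (3, 0, 0)), ("ring", (0, 4, 0))]
candidate = [("donor", (1, 1, 0)), ("acceptor", (4, 1, 0)),
             ("ring", (1, 5, 0)), ("donor", (9, 9, 9))]

adj = correspondence_graph(target, candidate)
cliques = []
bron_kerbosch(set(), set(adj), set(), adj, cliques)
largest = max(cliques, key=len)
similarity = tanimoto(len(target), len(candidate), len(largest))
```

    Here the first three candidate points are a translated copy of the target, so the largest clique matches all three target points and the similarity is 3 / (3 + 4 - 3) = 0.75.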

  14. Analysis of Searches by End-Users of Science and Engineering CD-ROM Databases in an Academic Library.

    ERIC Educational Resources Information Center

    Culbertson, Michael

    1992-01-01

    This study analyzed CD-ROM searches in five science and engineering databases in an academic library. Results indicated that users were usually able to obtain results and print records but that few used more sophisticated techniques to refine their searches. It was concluded that instruction in CD-ROM database searching should be a high priority.…

  15. Database search for safety information on cosmetic ingredients.

    PubMed

    Pauwels, Marleen; Rogiers, Vera

    2007-12-01

    Ethical considerations with respect to experimental animal use and regulatory testing are under heavy discussion worldwide and are, in certain cases, taken up in legislative measures. The most explicit example is the European cosmetic legislation, establishing a testing ban on finished cosmetic products since 11 September 2004 and enforcing that the safety of a cosmetic product is assessed by taking into consideration "the general toxicological profile of the ingredients, their chemical structure and their level of exposure" (OJ L151, 32-37, 23 June 1993; OJ L066, 26-35, 11 March 2003). Therefore the availability of referenced and reliable information on cosmetic ingredients becomes a dire necessity. Given the high-speed progress of World Wide Web services and the concurrent drastic increase in free access to information, identification of relevant data sources and evaluation of the scientific value and quality of the retrieved data are crucial. Based upon our own practical experience, a survey has been put together of freely and commercially available data sources with their individual description, field of application, benefits and drawbacks. It should be mentioned that the search strategies described are equally useful as a starting point for any quest for safety data on chemicals or chemical-related substances in general.

  16. The LAILAPS search engine: a feature model for relevance ranking in life science databases.

    PubMed

    Lange, Matthias; Spies, Karl; Colmsee, Christian; Flemming, Steffen; Klapperstück, Matthias; Scholz, Uwe

    2010-03-25

    Efficient and effective information retrieval in the life sciences is one of the most pressing challenges in bioinformatics. The incredible growth of life science databases into a vast network of interconnected information systems is both a big challenge and a great opportunity for life science research. The knowledge found on the Web, in particular in life science databases, is a valuable major resource. In order to bring it to the scientist's desktop, it is essential to have well-performing search engines. Here, neither the response time nor the number of results is what matters most. The most crucial factor for millions of query results is the relevance ranking. In this paper, we present a feature model for relevance ranking in life science databases and its implementation in the LAILAPS search engine. Motivated by observing user behavior during the inspection of search engine results, we condensed a set of 9 relevance-discriminating features. These features are intuitively used by scientists, who briefly screen database entries for potential relevance. The features are both sufficient to estimate the potential relevance and efficiently quantifiable. The derivation of a relevance prediction function that computes the relevance from these features constitutes a regression problem. To solve this problem, we used artificial neural networks that have been trained with a reference set of relevant database entries for 19 protein queries. These concepts, supported by a flexible text index and a simple data import format, are implemented in the LAILAPS search engine. It can easily be used both as a search engine for comprehensive integrated life science databases and for small in-house project databases. LAILAPS is publicly available for SWISSPROT data at http://lailaps.ipk-gatersleben.de.

  17. Development and Validation of Search Filters to Identify Articles on Family Medicine in Online Medical Databases.

    PubMed

    Pols, David H J; Bramer, Wichor M; Bindels, Patrick J E; van de Laar, Floris A; Bohnen, Arthur M

    2015-01-01

    Physicians and researchers in the field of family medicine often need to find relevant articles in online medical databases for a variety of reasons. Because a search filter may help improve the efficiency and quality of such searches, we aimed to develop and validate search filters to identify research studies of relevance to family medicine. Using a new and objective method for search filter development, we developed and validated 2 search filters for family medicine. The sensitive filter had a sensitivity of 96.8% and a specificity of 74.9%. The specific filter had a specificity of 97.4% and a sensitivity of 90.3%. Our new filters should aid literature searches in the family medicine field. The sensitive filter may help researchers conducting systematic reviews, whereas the specific filter may help family physicians find answers to clinical questions at the point of care when time is limited. © 2015 Annals of Family Medicine, Inc.
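
    For readers unfamiliar with how such filter figures are derived, the sketch below computes sensitivity, specificity, and precision from sets of record identifiers. The record sets are made up for illustration and are not the study's data; the function name `filter_performance` is hypothetical.

```python
def filter_performance(retrieved, relevant, all_records):
    """Compare the records a filter retrieves with a gold-standard relevant set."""
    tp = len(retrieved & relevant)                # relevant and retrieved
    fn = len(relevant - retrieved)                # relevant but missed
    fp = len(retrieved - relevant)                # retrieved but irrelevant
    tn = len(all_records - retrieved - relevant)  # correctly left out
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }


# Hypothetical reference set: 100 records, of which the first 20 are relevant.
all_records = set(range(100))
relevant = set(range(20))
retrieved = set(range(18)) | set(range(20, 40))

performance = filter_performance(retrieved, relevant, all_records)
```

    In this toy case the filter finds 18 of 20 relevant records (sensitivity 0.90) and correctly excludes 60 of 80 irrelevant ones (specificity 0.75), illustrating the trade-off between a sensitive and a specific filter.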

  18. Development and Validation of Search Filters to Identify Articles on Family Medicine in Online Medical Databases

    PubMed Central

    Pols, David H.J.; Bramer, Wichor M.; Bindels, Patrick J.E.; van de Laar, Floris A.; Bohnen, Arthur M.

    2015-01-01

    Physicians and researchers in the field of family medicine often need to find relevant articles in online medical databases for a variety of reasons. Because a search filter may help improve the efficiency and quality of such searches, we aimed to develop and validate search filters to identify research studies of relevance to family medicine. Using a new and objective method for search filter development, we developed and validated 2 search filters for family medicine. The sensitive filter had a sensitivity of 96.8% and a specificity of 74.9%. The specific filter had a specificity of 97.4% and a sensitivity of 90.3%. Our new filters should aid literature searches in the family medicine field. The sensitive filter may help researchers conducting systematic reviews, whereas the specific filter may help family physicians find answers to clinical questions at the point of care when time is limited. PMID:26195683

  19. Retrieval accuracy, statistical significance and compositional similarity in protein sequence database searches.

    PubMed

    Yu, Yi-Kuo; Gertz, E Michael; Agarwala, Richa; Schäffer, Alejandro A; Altschul, Stephen F

    2006-01-01

    Protein sequence database search programs may be evaluated both for their retrieval accuracy--the ability to separate meaningful from chance similarities--and for the accuracy of their statistical assessments of reported alignments. However, methods for improving statistical accuracy can degrade retrieval accuracy by discarding compositional evidence of sequence relatedness. This evidence may be preserved by combining essentially independent measures of alignment and compositional similarity into a unified measure of sequence similarity. A version of the BLAST protein database search program, modified to employ this new measure, outperforms the baseline program in both retrieval and statistical accuracy on ASTRAL, a SCOP-based test set.

  20. Using homology relations within a database markedly boosts protein sequence similarity search.

    PubMed

    Tong, Jing; Sadreyev, Ruslan I; Pei, Jimin; Kinch, Lisa N; Grishin, Nick V

    2015-06-02

    Inference of homology from protein sequences provides an essential tool for analyzing protein structure, function, and evolution. Current sequence-based homology search methods are still unable to detect many similarities evident from protein spatial structures. In computer science a search engine can be improved by considering networks of known relationships within the search database. Here, we apply this idea to protein-sequence-based homology search and show that it dramatically enhances the search accuracy. Our new method, COMPADRE (COmparison of Multiple Protein sequence Alignments using Database RElationships) assesses the relationship between the query sequence and a hit in the database by considering the similarity between the query and hit's known homologs. This approach increases detection quality, boosting the precision rate from 18% to 83% at half-coverage of all database homologs. The increased precision rate allows detection of a large fraction of protein structural relationships, thus providing structure and function predictions for previously uncharacterized proteins. Our results suggest that this general approach is applicable to a wide variety of methods for detection of biological similarities. The web server is available at prodata.swmed.edu/compadre.

  1. Using homology relations within a database markedly boosts protein sequence similarity search

    PubMed Central

    Tong, Jing; Sadreyev, Ruslan I.; Pei, Jimin; Kinch, Lisa N.; Grishin, Nick V.

    2015-01-01

    Inference of homology from protein sequences provides an essential tool for analyzing protein structure, function, and evolution. Current sequence-based homology search methods are still unable to detect many similarities evident from protein spatial structures. In computer science a search engine can be improved by considering networks of known relationships within the search database. Here, we apply this idea to protein-sequence–based homology search and show that it dramatically enhances the search accuracy. Our new method, COMPADRE (COmparison of Multiple Protein sequence Alignments using Database RElationships) assesses the relationship between the query sequence and a hit in the database by considering the similarity between the query and hit’s known homologs. This approach increases detection quality, boosting the precision rate from 18% to 83% at half-coverage of all database homologs. The increased precision rate allows detection of a large fraction of protein structural relationships, thus providing structure and function predictions for previously uncharacterized proteins. Our results suggest that this general approach is applicable to a wide variety of methods for detection of biological similarities. The web server is available at prodata.swmed.edu/compadre. PMID:26038555

  2. An alphabetic code based atomic level molecular similarity search in databases.

    PubMed

    Saranya, Nallusamy; Selvaraj, Samuel

    2012-01-01

    Atomic level molecular similarity and diversity studies have gained considerable importance through their wide application in Bioinformatics and Chemo-informatics for drug design. The availability of large volumes of data on chemical compounds requires new methodologies for efficient and effective searching of its archives in less time with optimal computational power. We describe an alphabetic algorithm for similarity searching based on atom-atom bonding preference for ligands. We represented 170 cyclin-dependent kinase 2 inhibitors using strings of pre-defined alphabets for searching using known protein sequence alignment tools. Thus, a common pattern was extracted using this set of compounds for database searching to retrieve similar active compounds. Area under the receiver operating characteristic (ROC) curve was used for the discrimination of similar and dissimilar compounds in the databases. An average retrieval rate of about 60% is obtained in cross-validation using the home-grown dataset and the directory of useful decoys (DUD, formally known as the ZINC database) data. This will help in the effective retrieval of similar compounds using database search.
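
    The area under the ROC curve mentioned above can be computed directly from similarity scores and active/decoy labels. The following is a minimal sketch using the pairwise-ranking definition of AUC; the scores and labels are invented for illustration and are not taken from the study.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (active, decoy) pairs ranked correctly; ties count 0.5."""
    actives = [s for s, y in zip(scores, labels) if y]
    decoys = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if a > d else 0.5 if a == d else 0.0
               for a in actives for d in decoys)
    return wins / (len(actives) * len(decoys))


# Hypothetical similarity scores; label 1 marks a known active, 0 a decoy.
scores = [0.9, 0.8, 0.4, 0.35, 0.1]
labels = [1, 1, 0, 1, 0]
auc = roc_auc(scores, labels)
```

    An AUC of 1.0 means every active outranks every decoy, while 0.5 corresponds to random ordering; in this toy case five of the six active/decoy pairs are ordered correctly.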

  3. Quantum computers and unstructured search: finding and counting items with an arbitrarily entangled initial state

    NASA Astrophysics Data System (ADS)

    Carlini, A.; Hosoya, A.

    2001-02-01

    Grover's quantum algorithm for an unstructured search problem and the COUNT algorithm by Brassard et al. are generalized to the case when the initial state is arbitrarily and maximally entangled. This ansatz might be relevant with quantum subroutines, when the computational qubits and the environment are coupled, and in general when the control over the quantum system is partial.

  4. Vehicle-triggered video compression/decompression for fast and efficient searching in large video databases

    NASA Astrophysics Data System (ADS)

    Bulan, Orhan; Bernal, Edgar A.; Loce, Robert P.; Wu, Wencheng

    2013-03-01

    Video cameras are widely deployed along city streets, interstate highways, traffic lights, stop signs and toll booths by entities that perform traffic monitoring and law enforcement. The videos captured by these cameras are typically compressed and stored in large databases. Performing a rapid search for a specific vehicle within a large database of compressed videos is often required and can be a time-critical life or death situation. In this paper, we propose video compression and decompression algorithms that enable fast and efficient vehicle or, more generally, event searches in large video databases. The proposed algorithm selects reference frames (i.e., I-frames) based on a vehicle having been detected at a specified position within the scene being monitored while compressing a video sequence. A search for a specific vehicle in the compressed video stream is performed across the reference frames only, which does not require decompression of the full video sequence as in traditional search algorithms. Our experimental results on videos captured in a local road show that the proposed algorithm significantly reduces the search space (thus reducing time and computational resources) in vehicle search tasks within compressed video streams, particularly those captured in light traffic volume conditions.

  5. Searching via walking: How to find a marked clique of a complete graph using quantum walks

    SciTech Connect

    Hillery, Mark; Reitzner, Daniel; Buzek, Vladimir

    2010-06-15

    We show how a quantum walk can be used to find a marked edge or a marked complete subgraph of a complete graph. We employ a version of a quantum walk, the scattering walk, which lends itself to experimental implementation. The edges are marked by adding elements to them that impart a specific phase shift to the particle as it enters or leaves the edge. If the complete graph has N vertices and the subgraph has K vertices, the particle becomes localized on the subgraph in O(N/K) steps. This leads to a quantum search that is quadratically faster than a corresponding classical search. We show how to implement the quantum walk using a quantum circuit and a quantum oracle, which allows us to specify the resources needed for a quantitative comparison of the efficiency of classical and quantum searches--the number of oracle calls.

  6. Quantum search algorithm tailored to clause-satisfaction problems

    NASA Astrophysics Data System (ADS)

    Tulsi, Avatar

    2015-05-01

    Many important computer science problems can be reduced to the clause-satisfaction problem. We are given n Boolean variables x_k and m clauses c_j, where each clause is a function of the values of some x_k. We want to find an assignment i of the x_k for which all m clauses are satisfied. Let f_j(i) be a binary function that is 1 if the jth clause is satisfied by the assignment i, and 0 otherwise. Then the solution is the assignment r for which f(i = r) = 1, where f(i) is the AND function of all f_j(i). In quantum computing, Grover's algorithm can be used to find r. A crucial component of this algorithm is the selective phase inversion I_r of the solution state encoding r. I_r is implemented by computing f(i) for all i in superposition, which requires computing the AND of all m binary functions f_j(i). Hence there must be coupling between the computation circuits for each f_j(i). In this paper, we present an alternative quantum search algorithm which relaxes the requirement of such couplings. Hence it offers implementation advantages for clause-satisfaction problems.
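
    As a rough illustration of the standard (coupled-oracle) Grover search that this abstract takes as its starting point, the sketch below simulates the amplitudes classically for a tiny instance. It does not implement the paper's alternative algorithm. The function name, the clause list, and the 4-variable instance are assumptions made for the example.

```python
import math

# Hypothetical 4-variable instance; each clause maps a bit list x to True/False.
# The only satisfying assignment is x = [1, 0, 1, 1], i.e. index r = 13.
EXAMPLE_CLAUSES = [
    lambda x: x[0] == 1,
    lambda x: x[1] == 0,
    lambda x: x[1] == 1 or x[2] == 1,
    lambda x: x[3] == x[0],
]

def grover_search(n_vars, clauses):
    """Classically simulate standard Grover search over 2**n_vars assignments.

    Returns (most likely index, total probability of measuring a solution).
    """
    size = 2 ** n_vars

    def f(i):
        # f(i) is the AND of all clause functions f_j(i).
        bits = [(i >> k) & 1 for k in range(n_vars)]
        return all(clause(bits) for clause in clauses)

    marked = [i for i in range(size) if f(i)]
    if not marked:
        raise ValueError("no satisfying assignment exists")

    # Start in the uniform superposition.
    amps = [1.0 / math.sqrt(size)] * size
    iterations = int(math.pi / 4 * math.sqrt(size / len(marked)))
    for _ in range(iterations):
        # Selective phase inversion of the solution states.
        for r in marked:
            amps[r] = -amps[r]
        # Diffusion step: reflect every amplitude about the mean.
        mean = sum(amps) / size
        amps = [2 * mean - a for a in amps]

    best = max(range(size), key=lambda i: amps[i] ** 2)
    success = sum(amps[r] ** 2 for r in marked)
    return best, success
```

    Calling grover_search(4, EXAMPLE_CLAUSES) runs three Grover iterations on 16 basis states. The oracle here evaluates every clause together, which is exactly the coupling the abstract says its alternative algorithm relaxes.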

  7. Adiabatic Quantum Search Scheme With Atoms In a Cavity Driven by Lasers

    SciTech Connect

    Daems, D.; Guerin, S.

    2007-10-26

    We propose an implementation of the quantum search algorithm of a marked item in an unsorted list of N items by adiabatic passage in a cavity-laser-atom system. We use an ensemble of N identical three-level atoms trapped in a single-mode cavity and driven by two lasers. In each atom, the same level represents a database entry. One of the atoms is marked by having an energy gap between its two ground states. Appropriate time delays between the two laser pulses allow one to populate the marked state starting from an initial entangled state within a decoherence-free adiabatic subspace. The time to achieve such a process is shown to exhibit the √N Grover speedup.

  8. A Bayesian network approach to the database search problem in criminal proceedings

    PubMed Central

    2012-01-01

    Background The ‘database search problem’, that is, the strengthening of a case - in terms of probative value - against an individual who is found as a result of a database search, has been approached during the last two decades with substantial mathematical analyses, accompanied by lively debate and centrally opposing conclusions. This represents a challenging obstacle in teaching but also hinders a balanced and coherent discussion of the topic within the wider scientific and legal community. This paper revisits and tracks the associated mathematical analyses in terms of Bayesian networks. Their derivation and discussion for capturing probabilistic arguments that explain the database search problem are outlined in detail. The resulting Bayesian networks offer a distinct view on the main debated issues, along with further clarity. Methods As a general framework for representing and analyzing formal arguments in probabilistic reasoning about uncertain target propositions (that is, whether or not a given individual is the source of a crime stain), this paper relies on graphical probability models, in particular, Bayesian networks. This graphical probability modeling approach is used to capture, within a single model, a series of key variables, such as the number of individuals in a database, the size of the population of potential crime stain sources, and the rarity of the corresponding analytical characteristics in a relevant population. Results This paper demonstrates the feasibility of deriving Bayesian network structures for analyzing, representing, and tracking the database search problem. The output of the proposed models can be shown to agree with existing but exclusively formulaic approaches. Conclusions The proposed Bayesian networks allow one to capture and analyze the currently most well-supported but reputedly counter-intuitive and difficult solution to the database search problem in a way that goes beyond the traditional, purely formulaic expressions
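
    A minimal numerical sketch of the kind of formulaic result the abstract refers to is given below. It is not the paper's Bayesian network. It assumes a uniform prior over all potential sources, exactly one match in the database, every other database member excluded, no laboratory error, and independent chance matches; the function and parameter names are illustrative. Under those assumptions the posterior probability that the matching individual is the source is 1 / (1 + (N - n) * gamma), for population size N, database size n, and random match probability gamma.

```python
def posterior_source_probability(population, database_size, match_prob):
    """Posterior probability that the single matching individual is the source.

    Assumes (illustrative simplifications, not taken from the paper):
    - a uniform prior over `population` potential sources,
    - exactly one match among `database_size` searched profiles,
    - all other database members are excluded by the search,
    - each untested non-source matches independently with `match_prob`.
    """
    if not 1 <= database_size <= population:
        raise ValueError("database_size must lie between 1 and population")
    if not 0.0 <= match_prob <= 1.0:
        raise ValueError("match_prob must be a probability")
    untested = population - database_size  # people the search could not exclude
    return 1.0 / (1.0 + untested * match_prob)
```

    With database_size=1 the function reduces to the case of a suspect found without a search. A larger database excludes more alternative sources, so the posterior rises, which is the counter-intuitive "strengthening" the abstract mentions.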

  9. SW#db: GPU-Accelerated Exact Sequence Similarity Database Search

    PubMed Central

    Korpar, Matija; Šošić, Martin; Blažeka, Dino; Šikić, Mile

    2015-01-01

    In recent years we have witnessed growth in sequencing yield and in the number of samples sequenced and, as a result, growth of publicly maintained sequence databases. This increase in data has placed high demands on protein similarity search algorithms, which face two opposing goals: keeping running times acceptable while maintaining a high-enough level of sensitivity. The most time-consuming step of similarity search is the local alignment between query and database sequences. This step is usually performed using exact local alignment algorithms such as Smith-Waterman. Due to its quadratic time complexity, alignment of a query to the whole database is usually too slow. Therefore, the majority of protein similarity search methods apply heuristics, prior to doing the exact local alignment, to reduce the number of possible candidate sequences in the database. However, there is still a need for the alignment of a query sequence to a reduced database. In this paper we present the SW#db tool and a library for fast exact similarity search. Although its running times, as a standalone tool, are comparable to the running times of BLAST, it is primarily intended to be used for the exact local alignment phase, in which the database of sequences has already been reduced. It uses both GPU and CPU parallelization and was 4–5 times faster than SSEARCH, 6–25 times faster than CUDASW++ and more than 20 times faster than SSW at the time of writing, using multiple queries on Swiss-prot and Uniref90 databases. PMID:26719890
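
    To make the quadratic cost of exact local alignment concrete, here is a minimal CPU-only sketch of Smith-Waterman scoring. It is not SW#db's implementation: it uses a linear gap penalty and illustrative scoring values (match +2, mismatch -1, gap -1), and returns only the best local score rather than the alignment itself.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between two sequences.

    Runs in O(len(query) * len(target)) time and keeps only two rows of the
    dynamic-programming matrix, so memory is O(len(target)).
    Scoring values are illustrative defaults, not those of any specific tool.
    """
    prev = [0] * (len(target) + 1)
    best = 0
    for q in query:
        curr = [0]  # first column is always 0 for local alignment
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            up = prev[j] + gap        # gap in the target
            left = curr[j - 1] + gap  # gap in the query
            score = max(0, diag, up, left)  # 0 restarts the local alignment
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best
```

    Every cell of the query-by-target matrix is visited once, which is why aligning one query against an entire database is slow and why tools typically reduce the candidate set first.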

  10. An Interactive Iterative Method for Electronic Searching of Large Literature Databases

    ERIC Educational Resources Information Center

    Hernandez, Marco A.

    2013-01-01

    PubMed® is an on-line literature database hosted by the U.S. National Library of Medicine. Containing over 21 million citations for biomedical literature--both abstracts and full text--in the areas of the life sciences, behavioral studies, chemistry, and bioengineering, PubMed® represents an important tool for researchers. PubMed® searches return…

  11. Support Vector Machines for Improved Peptide Identification from Tandem Mass Spectrometry Database Search

    SciTech Connect

    Webb-Robertson, Bobbie-Jo M.

    2009-05-06

    Accurate identification of peptides is a current challenge in mass spectrometry (MS) based proteomics. The standard approach uses a search routine to compare tandem mass spectra to a database of peptides associated with the target organism. These database search routines yield multiple metrics associated with the quality of the mapping of the experimental spectrum to the theoretical spectrum of a peptide. The structure of these results makes separating correct from false identifications difficult and has created a false identification problem. Statistical confidence scores are an approach to battle this false positive problem that has led to significant improvements in peptide identification. We have shown that machine learning, specifically support vector machine (SVM), is an effective approach to separating true peptide identifications from false ones. The SVM-based peptide statistical scoring method transforms a peptide into a vector representation based on database search metrics to train and validate the SVM. In practice, following the database search routine, a peptide is denoted in its vector representation and the SVM generates a single statistical score that is then used to classify presence or absence in the sample.

  12. Building the Infrastructure of Resource Sharing: Union Catalogs, Distributed Search, and Cross-Database Linkage.

    ERIC Educational Resources Information Center

    Lynch, Clifford A.

    1997-01-01

    Union catalogs and distributed search systems are two ways users can locate materials in print and electronic formats. This article examines the advantages and limitations of both approaches and argues that they should be considered complementary rather than competitive. Discusses technologies creating linkage between catalogs and databases and…

  13. Sports Information Online: Searching the SPORT Database and Tips for Finding Sports Medicine Information Online.

    ERIC Educational Resources Information Center

    Janke, Richard V.; And Others

    1988-01-01

    The first article describes SPORT, a database providing international coverage of athletics and physical education, and compares it to other online services in terms of coverage, thesauri, possible search strategies, and actual usage. The second article reviews available online information on sports medicine. (CLB)

  15. Planning for End-User Database Searching: Drexel and the Mac: A User-Consistent Interface.

    ERIC Educational Resources Information Center

    LaBorie, Tim; Donnelly, Leslie

    Drexel University instituted a microcomputing program in 1984 which required all freshmen to own Apple Macintosh microcomputers. All students were taught database searching on the BRS (Bibliographic Retrieval Services) system as part of the freshman humanities curriculum, and the university library was chosen as the site to house continuing…

  16. Discovering More Chemical Concepts from 3D Chemical Information Searches of Crystal Structure Databases

    ERIC Educational Resources Information Center

    Rzepa, Henry S.

    2016-01-01

    Three new examples are presented illustrating three-dimensional chemical information searches of the Cambridge structure database (CSD) from which basic core concepts in organic and inorganic chemistry emerge. These include connecting the regiochemistry of aromatic electrophilic substitution with the geometrical properties of hydrogen bonding…

  17. Successful Keyword Searching: Initiating Research on Popular Topics Using Electronic Databases.

    ERIC Educational Resources Information Center

    MacDonald, Randall M.; MacDonald, Susan Priest

    Students are using electronic resources more than ever before to locate information for assignments. Without the proper search terms, results are incomplete, and students are frustrated. Using the keywords, key people, organizations, and Web sites provided in this book and compiled from the most commonly used databases, students will be able to…

  18. Toward a public analysis database for LHC new physics searches using MadAnalysis 5

    NASA Astrophysics Data System (ADS)

    Dumont, B.; Fuks, B.; Kraml, S.; Bein, S.; Chalons, G.; Conte, E.; Kulkarni, S.; Sengupta, D.; Wymant, C.

    2015-02-01

    We present the implementation, in the MadAnalysis 5 framework, of several ATLAS and CMS searches for supersymmetry in data recorded during the first run of the LHC. We provide extensive details on the validation of our implementations and propose to create a public analysis database within this framework.

  1. Parallel database search and prime factorization with magnonic holographic memory devices

    NASA Astrophysics Data System (ADS)

    Khitun, Alexander

    2015-12-01

    In this work, we describe the capabilities of Magnonic Holographic Memory (MHM) for parallel database search and prime factorization. MHM is a type of holographic device that utilizes spin waves for data transfer and processing. Its operation is based on the correlation between the phases and the amplitudes of the input spin waves and the output inductive voltage. The input of MHM is provided by a phased array of spin-wave-generating elements, which allows phase patterns of arbitrary form to be produced. This makes it possible to code logic states into the phases of propagating waves and exploit wave superposition for parallel data processing. We present the results of numerical modeling illustrating parallel database search and prime factorization. The results of numerical simulations on the database search are in agreement with the available experimental data. The use of classical wave interference may result in a significant speedup over conventional digital logic circuits in special-task data processing (e.g., √n in database search). Potentially, magnonic holographic devices can be implemented as complementary logic units to digital processors. Physical limitations and technological constraints of the spin wave approach are also discussed.

  2. Parallel database search and prime factorization with magnonic holographic memory devices

    SciTech Connect

    Khitun, Alexander

    2015-12-28

    In this work, we describe the capabilities of Magnonic Holographic Memory (MHM) for parallel database search and prime factorization. MHM is a type of holographic device that utilizes spin waves for data transfer and processing. Its operation is based on the correlation between the phases and the amplitudes of the input spin waves and the output inductive voltage. The input of MHM is provided by a phased array of spin-wave-generating elements, which allows phase patterns of arbitrary form to be produced. This makes it possible to code logic states into the phases of propagating waves and exploit wave superposition for parallel data processing. We present the results of numerical modeling illustrating parallel database search and prime factorization. The results of numerical simulations on the database search are in agreement with the available experimental data. The use of classical wave interference may result in a significant speedup over conventional digital logic circuits in special-task data processing (e.g., √n in database search). Potentially, magnonic holographic devices can be implemented as complementary logic units to digital processors. Physical limitations and technological constraints of the spin wave approach are also discussed.

  3. Studying Gene Expression: Database Searches and Promoter Fusions to Investigate Transcriptional Regulation in Bacteria

    PubMed Central

    Martinez-Vaz, Betsy M.; Makarevitch, Irina; Stensland, Shane

    2010-01-01

    A laboratory project was designed to illustrate how to search biological databases and utilize the information provided by these resources to investigate transcriptional regulation in Escherichia coli. The students searched several databases (NCBI Genomes, RegulonDB and EcoCyc) to learn about gene function, regulation, and the organization of transcriptional units. A fluorometer and GFP promoter fusions were used to obtain fluorescence data and measure changes in transcriptional activity. The class designed and performed experiments to investigate the regulation of genes necessary for biosynthesis of amino acids and how expression is affected by environmental signals and transcriptional regulators. Assessment data showed that this activity enhanced students’ knowledge of databases, reporter genes and transcriptional regulation. PMID:23653697

  4. Systematic Reviews and Meta-Analyses of Traditional Chinese Medicine Must Search Chinese Databases to Reduce Language Bias

    PubMed Central

    Wu, Xin-Yin; Tang, Jin-Ling; Mao, Chen; Yuan, Jin-Qiu; Qin, Ying; Chung, Vincent C. H.

    2013-01-01

    Systematic reviews (SRs) that fail to search non-English databases may miss relevant studies and cause selection bias. The bias may be particularly severe in SRs of traditional Chinese medicine (TCM) as most randomized controlled trials (RCTs) in TCM are published and accessible only in Chinese. In this study we investigated how often Chinese databases were not searched in SRs of TCM, how many trials were missed, and whether a bias may occur if Chinese databases were not searched. We searched 5 databases in English and 3 in Chinese for RCTs of Chinese herbal medicine for coronary artery disease and found that 96.64% (115/119) of eligible studies could be identified only from Chinese databases. In a random sample of 80 Cochrane reviews on TCM, we found that Chinese databases were searched in only 43 (53.75%), in which almost all the included studies were identified from Chinese databases. We also compared SRs of the same topic and found that they may draw a different conclusion if Chinese databases were not searched. In conclusion, an overwhelmingly high percentage of eligible trials on TCM could only be identified in Chinese databases. Reviewers in TCM are advised to search Chinese databases to reduce potential selection bias. PMID:24223063

  5. Systematic reviews and meta-analyses of traditional Chinese medicine must search Chinese databases to reduce language bias.

    PubMed

    Wu, Xin-Yin; Tang, Jin-Ling; Mao, Chen; Yuan, Jin-Qiu; Qin, Ying; Chung, Vincent C H

    2013-01-01

    Systematic reviews (SRs) that fail to search non-English databases may miss relevant studies and cause selection bias. The bias may be particularly severe in SRs of traditional Chinese medicine (TCM) as most randomized controlled trials (RCTs) in TCM are published and accessible only in Chinese. In this study we investigated how often Chinese databases were not searched in SRs of TCM, how many trials were missed, and whether a bias may occur if Chinese databases were not searched. We searched 5 databases in English and 3 in Chinese for RCTs of Chinese herbal medicine for coronary artery disease and found that 96.64% (115/119) of eligible studies could be identified only from Chinese databases. In a random sample of 80 Cochrane reviews on TCM, we found that Chinese databases were searched in only 43 (53.75%), in which almost all the included studies were identified from Chinese databases. We also compared SRs of the same topic and found that they may draw a different conclusion if Chinese databases were not searched. In conclusion, an overwhelmingly high percentage of eligible trials on TCM could only be identified in Chinese databases. Reviewers in TCM are advised to search Chinese databases to reduce potential selection bias.

  6. Searching fee and non-fee toxicology information resources: an overview of selected databases.

    PubMed

    Wright, L L

    2001-01-12

    Toxicology profiles organize information by broad subjects, the first of which affirms identity of the agent studied. Studies here show two non-fee databases (ChemFinder and ChemIDplus) verify the identity of compounds with high efficiency (63% and 73% respectively) with the fee-based Chemical Abstracts Registry file serving well to fill data gaps (100%). Continued searching proceeds using knowledge of structure, scope and content to select databases. Valuable sources for information are factual databases that collect data and facts in special subject areas organized in formats available for analysis or use. Some sources representative of factual files are RTECS, CCRIS, HSDB, GENE-TOX and IRIS. Numerous factual databases offer a wealth of reliable information; however, exhaustive searches probe information published in journal articles and/or technical reports with records residing in bibliographic databases such as BIOSIS, EMBASE, MEDLINE, TOXLINE and Web of Science. Listed with descriptions are numerous factual and bibliographic databases supplied by 11 producers. Given the multitude of options and resources, it is often necessary to seek service desk assistance. Questions were posed by telephone and e-mail to service desks at DIALOG, ISI, MEDLARS, Micromedex and STN International. Results of the survey are reported.

  7. Searching biomedical databases on complementary medicine: the use of controlled vocabulary among authors, indexers and investigators.

    PubMed

    Murphy, Linda S; Reinsch, Sibylle; Najm, Wadie I; Dickerson, Vivian M; Seffinger, Michael A; Adams, Alan; Mishra, Shiraz I

    2003-07-07

    The optimal retrieval of a literature search in biomedicine depends on the appropriate use of Medical Subject Headings (MeSH), descriptors and keywords among authors and indexers. We hypothesized that authors, investigators and indexers in four biomedical databases are not consistent in their use of terminology in Complementary and Alternative Medicine (CAM). Based on a research question addressing the validity of spinal palpation for the diagnosis of neuromuscular dysfunction, we developed four search concepts with their respective controlled vocabulary and key terms. We calculated the frequency of MeSH, descriptors, and keywords used by authors in titles and abstracts in comparison to standard practices in semantic and analytic indexing in MEDLINE, MANTIS, CINAHL, and Web of Science. Multiple searches resulted in the final selection of 38 relevant studies that were indexed at least in one of the four selected databases. Of the four search concepts, validity showed the greatest inconsistency in terminology among authors, indexers and investigators. The use of spinal terms showed the greatest consistency. Of the 22 neuromuscular dysfunction terms provided by the investigators, 11 were not contained in the controlled vocabulary and six were never used by authors or indexers. Most authors did not seem familiar with the controlled vocabulary for validity in the area of neuromuscular dysfunction. Recently, standard glossaries have been developed to assist in the research development of manual medicine. Searching biomedical databases for CAM is challenging due to inconsistent use of controlled vocabulary and indexing procedures in different databases. A standard terminology should be used by investigators in conducting their search strategies and authors when writing titles, abstracts and submitting keywords for publications.

  8. Searching biomedical databases on complementary medicine: the use of controlled vocabulary among authors, indexers and investigators

    PubMed Central

    Murphy, Linda S; Reinsch, Sibylle; Najm, Wadie I; Dickerson, Vivian M; Seffinger, Michael A; Adams, Alan; Mishra, Shiraz I

    2003-01-01

    Background The optimal retrieval of a literature search in biomedicine depends on the appropriate use of Medical Subject Headings (MeSH), descriptors and keywords among authors and indexers. We hypothesized that authors, investigators and indexers in four biomedical databases are not consistent in their use of terminology in Complementary and Alternative Medicine (CAM). Methods Based on a research question addressing the validity of spinal palpation for the diagnosis of neuromuscular dysfunction, we developed four search concepts with their respective controlled vocabulary and key terms. We calculated the frequency of MeSH, descriptors, and keywords used by authors in titles and abstracts in comparison to standard practices in semantic and analytic indexing in MEDLINE, MANTIS, CINAHL, and Web of Science. Results Multiple searches resulted in the final selection of 38 relevant studies that were indexed at least in one of the four selected databases. Of the four search concepts, validity showed the greatest inconsistency in terminology among authors, indexers and investigators. The use of spinal terms showed the greatest consistency. Of the 22 neuromuscular dysfunction terms provided by the investigators, 11 were not contained in the controlled vocabulary and six were never used by authors or indexers. Most authors did not seem familiar with the controlled vocabulary for validity in the area of neuromuscular dysfunction. Recently, standard glossaries have been developed to assist in the research development of manual medicine. Conclusions Searching biomedical databases for CAM is challenging due to inconsistent use of controlled vocabulary and indexing procedures in different databases. A standard terminology should be used by investigators in conducting their search strategies and authors when writing titles, abstracts and submitting keywords for publications. PMID:12846931

  9. Doubling the success of quantum walk search using internal-state measurements

    NASA Astrophysics Data System (ADS)

    Prūsis, Krišjānis; Vihrovs, Jevgēnijs; Wong, Thomas G.

    2016-11-01

    In typical discrete-time quantum walk algorithms, one measures the position of the walker while ignoring its internal spin/coin state. Rather than neglecting the information in this internal state, we show that additionally measuring it doubles the success probability of many quantum spatial search algorithms. For example, this allows Grover's unstructured search problem to be solved with certainty, rather than with probability 1/2 if only the walker's position is measured, so the additional measurement yields a search algorithm that is twice as fast as without it, on average. Thus the internal state of discrete-time quantum walks holds valuable information that can be utilized to improve algorithms. Furthermore, we determine conditions for which spatial search problems on regular graphs are amenable to this doubling of the success probability, and this involves diagrammatically analyzing search using degenerate perturbation theory and deriving a useful formula for how the quantum walk acts in its reduced subspace.

  10. MSblender: a probabilistic approach for integrating peptide identifications from multiple database search engines

    PubMed Central

    Kwon, Taejoon; Choi, Hyungwon; Vogel, Christine; Nesvizhskii, Alexey I.; Marcotte, Edward M.

    2011-01-01

    Shotgun proteomics using mass spectrometry is a powerful method for protein identification but suffers limited sensitivity in complex samples. Integrating peptide identifications from multiple database search engines is a promising strategy to increase the number of peptide identifications and reduce the volume of unassigned tandem mass spectra. Existing methods pool statistical significance scores such as p-values or posterior probabilities of peptide-spectrum matches (PSMs) from multiple search engines after high scoring peptides have been assigned to spectra, but these methods lack reliable control of identification error rates as data are integrated from different search engines. We developed a statistically coherent method for integrative analysis, termed MSblender. MSblender converts raw search scores from search engines into a probability score for all possible PSMs and properly accounts for the correlation between search scores. The method reliably estimates false discovery rates and identifies more PSMs than any single search engine at the same false discovery rate. Increased identifications increment spectral counts for all detected proteins and allow quantification of proteins that would not have been quantified by individual search engines. We also demonstrate that enhanced quantification contributes to improved sensitivity in differential expression analyses. PMID:21488652
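
    The link between per-PSM probability scores and a false discovery rate can be sketched generically as follows. This is a common textbook-style estimate, not MSblender's actual model: it assumes each PSM already has a well-calibrated posterior probability of being correct, and the function name and threshold are illustrative.

```python
def estimated_fdr(posteriors, threshold):
    """Estimate the FDR among PSMs accepted at a posterior-probability cutoff.

    posteriors: iterable of probabilities that each PSM is correct
                (assumed calibrated; this is an illustrative assumption).
    threshold:  minimum posterior probability required to accept a PSM.
    Returns (number accepted, estimated false discovery rate).
    """
    accepted = [p for p in posteriors if p >= threshold]
    if not accepted:
        return 0, 0.0
    # Each accepted PSM contributes (1 - p) expected false identifications.
    expected_false = sum(1.0 - p for p in accepted)
    return len(accepted), expected_false / len(accepted)
```

    Lowering the threshold accepts more PSMs at the price of a higher estimated error rate, which is why comparisons between search strategies are made at a fixed false discovery rate, as in the abstract.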

  11. MSblender: A probabilistic approach for integrating peptide identifications from multiple database search engines.

    PubMed

    Kwon, Taejoon; Choi, Hyungwon; Vogel, Christine; Nesvizhskii, Alexey I; Marcotte, Edward M

    2011-07-01

    Shotgun proteomics using mass spectrometry is a powerful method for protein identification but suffers limited sensitivity in complex samples. Integrating peptide identifications from multiple database search engines is a promising strategy to increase the number of peptide identifications and reduce the volume of unassigned tandem mass spectra. Existing methods pool statistical significance scores such as p-values or posterior probabilities of peptide-spectrum matches (PSMs) from multiple search engines after high scoring peptides have been assigned to spectra, but these methods lack reliable control of identification error rates as data are integrated from different search engines. We developed a statistically coherent method for integrative analysis, termed MSblender. MSblender converts raw search scores from search engines into a probability score for every possible PSM and properly accounts for the correlation between search scores. The method reliably estimates false discovery rates and identifies more PSMs than any single search engine at the same false discovery rate. Increased identifications increment spectral counts for most proteins and allow quantification of proteins that would not have been quantified by individual search engines. We also demonstrate that enhanced quantification contributes to improve sensitivity in differential expression analyses.

  12. Searching via walking: How to find a marked clique of a complete graph using quantum walks

    NASA Astrophysics Data System (ADS)

    Hillery, Mark; Reitzner, Daniel; Bužek, Vladimír

    2010-06-01

    We show how a quantum walk can be used to find a marked edge or a marked complete subgraph of a complete graph. We employ a version of a quantum walk, the scattering walk, which lends itself to experimental implementation. The edges are marked by adding elements to them that impart a specific phase shift to the particle as it enters or leaves the edge. If the complete graph has N vertices and the subgraph has K vertices, the particle becomes localized on the subgraph in O(N/K) steps. This leads to a quantum search that is quadratically faster than a corresponding classical search. We show how to implement the quantum walk using a quantum circuit and a quantum oracle, which allows us to specify the resources needed for a quantitative comparison of the efficiency of classical and quantum searches—the number of oracle calls.

  13. Multimedia explorer: image database, image proxy-server and search-engine.

    PubMed Central

    Frankewitsch, T.; Prokosch, U.

    1999-01-01

    Multimedia plays a major role in medicine. Databases containing images, movies or other types of multimedia objects are increasing in number, especially on the WWW. However, no good retrieval mechanism or search engine currently exists to efficiently track down such multimedia sources in the vast amount of information provided by the WWW. Secondly, the tools for searching databases are usually not adapted to the properties of images. HTML pages do not allow complex searches. Therefore establishing a more comfortable retrieval involves the use of a higher programming level like JAVA. With this platform independent language it is possible to create extensions to commonly used web browsers. These applets offer a graphical user interface for high level navigation. We implemented a database using JAVA objects as the primary storage container which are then stored by a JAVA controlled ORACLE8 database. Navigation depends on a structured vocabulary enhanced by a semantic network. With this approach multimedia objects can be encapsulated within a logical module for quick data retrieval. PMID:10566463

  14. A Novel Concept for the Search and Retrieval of the Derwent Markush Resource Database.

    PubMed

    Barth, Andreas; Stengel, Thomas; Litterst, Edwin; Kraut, Hans; Matuszczyk, Henry; Ailer, Franz; Hajkowski, Steve

    2016-05-23

    The representation of and search for generic chemical structures (Markush) remains a continuing challenge. Several research groups have addressed this problem, and over time a limited number of practical solutions have been proposed. Today there are two large commercial providers of Markush databases: Chemical Abstracts Service (CAS) and Thomson Reuters. The Thomson Reuters "Derwent" Markush database is currently offered via the online services Questel and STN and as a data feed for in-house use. The aim of this paper is to briefly review the existing Markush systems (databases plus search engines) and to describe our new approach for the implementation of the Derwent Markush Resource on STN. Our new approach demonstrates the integration of the Derwent Markush Resource database into the existing chemistry-focused STN platform without loss of detail. This provides compatibility with other structure and Markush databases on STN and at the same time makes it possible to deploy the specific features and functions of the Derwent approach. It is shown that the different Markush languages developed by CAS and Derwent can be combined into a single general Markush description. In this concept the generic nodes are grouped together in a unique hierarchy where all chemical elements and fragments can be integrated. As a consequence, both systems are searchable using a single structure query. Moreover, the presented concept could serve as a promising starting point for a common generalized description of Markush structures.

  15. Improved Search of Principal Component Analysis Databases for Spectro-polarimetric Inversion

    NASA Astrophysics Data System (ADS)

    Casini, R.; Asensio Ramos, A.; Lites, B. W.; López Ariste, A.

    2013-08-01

    We describe a simple technique for the acceleration of spectro-polarimetric inversions based on principal component analysis (PCA) of Stokes profiles. This technique involves the indexing of the database models based on the sign of the projections (PCA coefficients) of the first few relevant orders of principal components of the four Stokes parameters. In this way, each model in the database can be attributed a distinctive binary number of $2^{4n}$ bits, where n is the number of PCA orders used for the indexing. Each of these binary numbers (indices) identifies a group of "compatible" models for the inversion of a given set of observed Stokes profiles sharing the same index. The complete set of the binary numbers so constructed evidently determines a partition of the database. The search of the database for the PCA inversion of spectro-polarimetric data can profit greatly from this indexing. In practical cases it becomes possible to approach the ideal acceleration factor of $2^{4n}$ as compared to the systematic search of a non-indexed database for a traditional PCA inversion. This indexing method relies on the existence of a physical meaning in the sign of the PCA coefficients of a model. For this reason, the presence of model ambiguities and of spectro-polarimetric noise in the observations limits in practice the number n of relevant PCA orders that can be used for the indexing.
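
    The sign-based indexing described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the data layout (four coefficient lists per model, one per Stokes parameter), and the convention that a zero coefficient counts as positive are all assumptions. With n orders and four Stokes parameters there are 4n sign bits, so at most 2^(4n) distinct indices.

```python
from collections import defaultdict


def sign_index(coeffs_by_stokes, n_orders):
    """Pack the signs of the first n_orders PCA coefficients of I, Q, U, V
    into one integer (assumed convention: non-negative -> bit 1)."""
    index = 0
    for coeffs in coeffs_by_stokes:        # four sequences: I, Q, U, V
        for c in coeffs[:n_orders]:
            index = (index << 1) | (1 if c >= 0 else 0)
    return index


def build_partition(models, n_orders):
    """Group model ids by their sign index; the groups partition the database."""
    partition = defaultdict(list)
    for model_id, coeffs in models.items():
        partition[sign_index(coeffs, n_orders)].append(model_id)
    return partition


def candidates(partition, observed_coeffs, n_orders):
    """Return only the models that share the sign index of the observation."""
    return partition.get(sign_index(observed_coeffs, n_orders), [])


# Hypothetical toy database: three models, two PCA orders per Stokes parameter.
models = {
    "m1": [[0.5, -0.2], [0.1, 0.3], [-0.7, 0.4], [0.2, 0.9]],
    "m2": [[0.9, 0.1], [0.2, -0.3], [-0.1, -0.4], [0.3, 0.2]],
    "m3": [[-0.5, 0.2], [0.1, 0.3], [0.7, 0.4], [0.2, 0.9]],
}
partition = build_partition(models, n_orders=1)
observed = [[0.3, 0.0], [0.8, 0.0], [-0.2, 0.0], [0.6, 0.0]]
shortlist = candidates(partition, observed, n_orders=1)
```

    Only the shortlisted models would then be compared against the observation in full, which is where the speed-up over a systematic search comes from in this sketch.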

  16. IMPROVED SEARCH OF PRINCIPAL COMPONENT ANALYSIS DATABASES FOR SPECTRO-POLARIMETRIC INVERSION

    SciTech Connect

    Casini, R.; Lites, B. W.; Ramos, A. Asensio

    2013-08-20

    We describe a simple technique for the acceleration of spectro-polarimetric inversions based on principal component analysis (PCA) of Stokes profiles. This technique involves the indexing of the database models based on the sign of the projections (PCA coefficients) of the first few relevant orders of principal components of the four Stokes parameters. In this way, each model in the database can be attributed a distinctive binary number of $2^{4n}$ bits, where n is the number of PCA orders used for the indexing. Each of these binary numbers (indices) identifies a group of "compatible" models for the inversion of a given set of observed Stokes profiles sharing the same index. The complete set of the binary numbers so constructed evidently determines a partition of the database. The search of the database for the PCA inversion of spectro-polarimetric data can profit greatly from this indexing. In practical cases it becomes possible to approach the ideal acceleration factor of $2^{4n}$ as compared to the systematic search of a non-indexed database for a traditional PCA inversion. This indexing method relies on the existence of a physical meaning in the sign of the PCA coefficients of a model. For this reason, the presence of model ambiguities and of spectro-polarimetric noise in the observations limits in practice the number n of relevant PCA orders that can be used for the indexing.

  17. Speeding up tandem mass spectrometry-based database searching by longest common prefix

    PubMed Central

    2010-01-01

    Background: Tandem mass spectrometry-based database searching has become an important technology for peptide and protein identification. One of the key challenges in database searching is the remarkable increase in computational demand, brought about by the expansion of protein databases, semi- or non-specific enzymatic digestion, post-translational modifications and other factors. Some software tools choose peptide indexing to accelerate processing. However, peptide indexing requires a large amount of time and space for construction, especially for the non-specific digestion. Additionally, it is not flexible to use. Results: We developed an algorithm based on the longest common prefix (ABLCP) to efficiently organize a protein sequence database. The longest common prefix is a data structure that is always coupled to the suffix array. It eliminates redundant candidate peptides in databases and reduces the corresponding peptide-spectrum matching times, thereby decreasing the identification time. This algorithm is based on the property of the longest common prefix. Even enzymatic digestion poses a challenge to this property, but some adjustments can be made to this algorithm to ensure that no candidate peptides are omitted. Compared with peptide indexing, ABLCP requires much less time and space for construction and is subject to fewer restrictions. Conclusions: The ABLCP algorithm can help to improve data analysis efficiency. A software tool implementing this algorithm is available at http://pfind.ict.ac.cn/pfind2dot5/index.htm PMID:21108792
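
    To illustrate how a longest-common-prefix (LCP) array paired with a suffix array can skip redundant candidates, here is a minimal sketch. It is not the ABLCP implementation: it uses a naive suffix-array construction, ignores enzymatic digestion rules, and simply enumerates each distinct fixed-length substring once. All function names and the toy sequence are assumptions.

```python
def suffix_array(text):
    """Naive suffix array: start positions sorted by the suffix they begin."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text, sa):
    """lcp[r] = length of the common prefix of the suffixes ranked r-1 and r."""
    lcp = [0] * len(sa)
    for r in range(1, len(sa)):
        a, b = text[sa[r - 1]:], text[sa[r]:]
        k = 0
        while k < len(a) and k < len(b) and a[k] == b[k]:
            k += 1
        lcp[r] = k
    return lcp


def distinct_substrings(text, length):
    """Emit every substring of the given length exactly once."""
    sa = suffix_array(text)
    lcp = lcp_array(text, sa)
    out = []
    for r, start in enumerate(sa):
        if start + length > len(text):
            continue                      # suffix too short to yield a candidate
        if lcp[r] >= length:
            continue                      # same candidate as the previous suffix
        out.append(text[start:start + length])
    return out


# Hypothetical toy sequence with a repeated region.
sequence = "MKPEPTIDEKPEPTIDER"
unique_candidates = distinct_substrings(sequence, 5)
```

    Because suffixes that share a prefix are adjacent in the sorted order, a single comparison against the LCP value is enough to detect a repeated candidate, so each one needs to be scored against spectra only once.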

  18. Effects of dissipation on an adiabatic quantum search algorithm

    NASA Astrophysics Data System (ADS)

    de Vega, Inés; Bañuls, Mari Carmen; Pérez, A.

    2010-12-01

    According to recent studies (Amin et al 2008 Phys. Rev. Lett. 100 060503), the effect of a thermal bath may improve the performance of a quantum adiabatic search algorithm. In this paper, we compare the effects of such a thermal environment on the algorithm performance with those of a structured environment similar to the one encountered in systems coupled to an electromagnetic field that exists within a photonic crystal. Whereas for all the parameter regimes explored here, the algorithm performance is worsened by contact with a thermal environment, the picture appears to be different when one considers a structured environment. In this case we show that by tuning the environment parameters to certain regimes, the algorithm performance can actually be improved with respect to the closed system case. Additionally, the relevance of considering the dissipation rates as complex quantities is discussed in both cases. More specifically, we find that the imaginary part of the rates cannot be neglected with the usual argument that it simply amounts to an energy shift and in fact influences crucially the system dynamics.

  19. General Quantum Meet-in-the-Middle Search Algorithm Based on Target Solution of Fixed Weight

    NASA Astrophysics Data System (ADS)

    Fu, Xiang-Qun; Bao, Wan-Su; Wang, Xiang; Shi, Jian-Hong

    2016-10-01

    Similar to the classical meet-in-the-middle algorithm, the storage and computation complexity are the key factors that decide the efficiency of the quantum meet-in-the-middle algorithm. Aiming at the target vector of fixed weight, based on the quantum meet-in-the-middle algorithm, the algorithm for searching all n-product vectors with the same weight is presented, whose complexity is better than the exhaustive search algorithm. And the algorithm can reduce the storage complexity of the quantum meet-in-the-middle search algorithm. Then based on the algorithm and the knapsack vector of the Chor-Rivest public-key crypto of fixed weight d, we present a general quantum meet-in-the-middle search algorithm based on the target solution of fixed weight, whose computational complexity is $\sum_{j=0}^{d} \left( O\left(\sqrt{C_{n-k+1}^{d-j}}\right) + O\left(C_k^j \log C_k^j\right) \right)$ with $\sum_{i=0}^{d} C_k^i$ memory cost. And the optimal value of k is given. Compared to the quantum meet-in-the-middle search algorithm for knapsack problem and the quantum algorithm for searching a target solution of fixed weight, the computational complexity of the algorithm is lower. And its storage complexity is smaller than the quantum meet-in-the-middle-algorithm. Supported by the National Basic Research Program of China under Grant No. 2013CB338002 and the National Natural Science Foundation of China under Grant No. 61502526

  20. Tandem Mass Spectrum Sequencing: An Alternative to Database Search Engines in Shotgun Proteomics.

    PubMed

    Muth, Thilo; Rapp, Erdmann; Berven, Frode S; Barsnes, Harald; Vaudel, Marc

    2016-01-01

    Protein identification via database searches has become the gold standard in mass spectrometry based shotgun proteomics. However, as the quality of tandem mass spectra improves, direct mass spectrum sequencing gains interest as a database-independent alternative. In this chapter, the general principle of this so-called de novo sequencing is introduced along with pitfalls and challenges of the technique. The main tools available are presented with a focus on user friendly open source software which can be directly applied in everyday proteomic workflows.

  1. A checklist to assess database-hosting platforms for designing and running searches for systematic reviews.

    PubMed

    Bethel, Alison; Rogers, Morwenna

    2014-03-01

    Systematic reviews require literature searches that are precise, sensitive and often complex. Database-hosting platforms need to facilitate this type of searching in order to minimise errors and the risk of bias in the results. The main objective of the study was to create a generic checklist of criteria to assess the ability of host platforms to cope with complex searching, for example, for systematic reviews, and to test the checklist against three host platforms (EBSCOhost, OvidSP and ProQuest). The checklist was developed as usual review work was carried out and through discussion between the two authors. Attributes on the checklist were designated as 'desirable' or 'essential'. The authors tested the checklist independently against three host platforms and graded their performance from 1 (insufficient) to 3 (performs well). Fifty-five desirable or essential attributes were identified for the checklist. None of the platforms performed well for all of the attributes on the checklist. Not all database-hosting platforms are designed for complex searching. Librarians and other decision-makers who work in health research settings need to be aware of the different limitations of host platforms for complex searching when they are making purchasing decisions or training others. © 2014 The authors. Health Information and Libraries Journal © 2014 Health Libraries Group.

  2. Practical private database queries based on a quantum-key-distribution protocol

    SciTech Connect

    Jakobi, Markus; Simon, Christoph; Gisin, Nicolas; Bancal, Jean-Daniel; Branciard, Cyril; Walenta, Nino; Zbinden, Hugo

    2011-02-15

    Private queries allow a user, Alice, to learn an element of a database held by a provider, Bob, without revealing which element she is interested in, while limiting her information about the other elements. We propose to implement private queries based on a quantum-key-distribution protocol, with changes only in the classical postprocessing of the key. This approach makes our scheme both easy to implement and loss tolerant. While unconditionally secure private queries are known to be impossible, we argue that an interesting degree of security can be achieved by relying on fundamental physical principles instead of unverifiable security assumptions in order to protect both the user and the database. We think that the scope exists for such practical private queries to become another remarkable application of quantum information in the footsteps of quantum key distribution.

  3. Searching molecular structure databases with tandem mass spectra using CSI:FingerID

    PubMed Central

    Dührkop, Kai; Shen, Huibin; Meusel, Marvin; Rousu, Juho; Böcker, Sebastian

    2015-01-01

    Metabolites provide a direct functional signature of cellular state. Untargeted metabolomics experiments usually rely on tandem MS to identify the thousands of compounds in a biological sample. Today, the vast majority of metabolites remain unknown. We present a method for searching molecular structure databases using tandem MS data of small molecules. Our method computes a fragmentation tree that best explains the fragmentation spectrum of an unknown molecule. We use the fragmentation tree to predict the molecular structure fingerprint of the unknown compound using machine learning. This fingerprint is then used to search a molecular structure database such as PubChem. Our method is shown to improve on the competing methods for computational metabolite identification by a considerable margin. PMID:26392543

  4. Discovery of novel mesangial cell proliferation inhibitors using a three-dimensional database searching method.

    PubMed

    Kurogi, Y; Miyata, K; Okamura, T; Hashimoto, K; Tsutsumi, K; Nasu, M; Moriyasu, M

    2001-07-05

    A three-dimensional pharmacophore model of mesangial cell (MC) proliferation inhibitors was generated from a training set of 4-(diethoxyphosphoryl)methyl-N-(3-phenyl-[1,2,4]thiadiazol-5-yl)benzamide, 2, and its derivatives using the Catalyst/HIPHOP software program. On the basis of the in vitro MC proliferation inhibitory activity, a pharmacophore model was generated as seven features consisting of two hydrophobic regions, two hydrophobic aromatic regions, and three hydrogen bond acceptors. Using this model as a three-dimensional query to search the Maybridge database, 41 structurally novel compounds were identified. The evaluation of MC proliferation inhibitory activity using available samples from the 41 identified compounds exhibited over 50% inhibitory activity at the 100 nM range. Interestingly, the compounds newly identified by the 3D database searching method exhibited reduced inhibition of normal proximal tubular epithelial cell proliferation compared to a training set of compounds.

  5. Tempest: Accelerated MS/MS Database Search Software for Heterogeneous Computing Platforms.

    PubMed

    Adamo, Mark E; Gerber, Scott A

    2016-09-07

    MS/MS database search algorithms derive a set of candidate peptide sequences from in silico digest of a protein sequence database, and compute theoretical fragmentation patterns to match these candidates against observed MS/MS spectra. The original Tempest publication described these operations mapped to a CPU-GPU model, in which the CPU (central processing unit) generates peptide candidates that are asynchronously sent to a discrete GPU (graphics processing unit) to be scored against experimental spectra in parallel. The current version of Tempest expands this model, incorporating OpenCL to offer seamless parallelization across multicore CPUs, GPUs, integrated graphics chips, and general-purpose coprocessors. Three protocols describe how to configure and run a Tempest search, including discussion of how to leverage Tempest's unique feature set to produce optimal results. © 2016 by John Wiley & Sons, Inc.

  6. BioSCAN: a network sharable computational resource for searching biosequence databases.

    PubMed

    Singh, R K; Hoffman, D L; Tell, S G; White, C T

    1996-06-01

    We describe a network sharable, interactive computational tool for rapid and sensitive search and analysis of biomolecular sequence databases such as GenBank, GenPept, Protein Identification Resource, and SWISS-PROT. The resource is accessible via the World Wide Web using popular client software such as Mosaic and Netscape. The client software is freely available on a number of computing platforms including Macintosh, IBM-PC, and Unix workstations.

  7. Quantum search on the two-dimensional lattice using the staggered model with Hamiltonians

    NASA Astrophysics Data System (ADS)

    Portugal, R.; Fernandes, T. D.

    2017-04-01

    Quantum search on the two-dimensional lattice with one marked vertex and cyclic boundary conditions is an important problem in the context of quantum algorithms with an interesting unfolding. It avails to test the ability of quantum walk models to provide efficient algorithms from the theoretical side and means to implement quantum walks in laboratories from the practical side. In this paper, we rigorously prove that the recently proposed staggered quantum walk model provides an efficient quantum search on the two-dimensional lattice, if the reflection operators associated with the graph tessellations are used as Hamiltonians, which is an important theoretical result for validating the staggered model with Hamiltonians. Numerical results show that on the two-dimensional lattice staggered models without Hamiltonians are not as efficient as the one described in this paper and are, in fact, as slow as classical random-walk-based algorithms.

  8. Improved classification of mass spectrometry database search results using newer machine learning approaches.

    PubMed

    Ulintz, Peter J; Zhu, Ji; Qin, Zhaohui S; Andrews, Philip C

    2006-03-01

    Manual analysis of mass spectrometry data is a current bottleneck in high throughput proteomics. In particular, the need to manually validate the results of mass spectrometry database searching algorithms can be prohibitively time-consuming. Development of software tools that attempt to quantify the confidence in the assignment of a protein or peptide identity to a mass spectrum is an area of active interest. We sought to extend work in this area by investigating the potential of recent machine learning algorithms to improve the accuracy of these approaches and as a flexible framework for accommodating new data features. Specifically we demonstrated the ability of boosting and random forest approaches to improve the discrimination of true hits from false positive identifications in the results of mass spectrometry database search engines compared with thresholding and other machine learning approaches. We accommodated additional attributes obtainable from database search results, including a factor addressing proton mobility. Performance was evaluated using publicly available electrospray data and a new collection of MALDI data generated from purified human reference proteins.

  9. A hybrid approach for addressing ring flexibility in 3D database searching.

    PubMed

    Sadowski, J

    1997-01-01

    A hybrid approach for flexible 3D database searching is presented that addresses the problem of ring flexibility. It combines the explicit storage of up to 25 multiple conformations of rings, with up to eight atoms, generated by the 3D structure generator CORINA with the power of a torsional fitting technique implemented in the 3D database system UNITY. A comparison with the original UNITY approach, using a database with about 130,000 entries and five different pharmacophore queries, was performed. The hybrid approach scored, on an average, 10-20% more hits than the reference run. Moreover, specific problems with unrealistic hit geometries produced by the original approach can be excluded. In addition, the influence of the maximum number of ring conformations per molecule was investigated. An optimal number of 10 conformations per molecule is recommended.

  10. Fast multiresolution search algorithm for optimal retrieval in large multimedia databases

    NASA Astrophysics Data System (ADS)

    Song, Byung C.; Kim, Myung J.; Ra, Jong Beom

    1999-12-01

    Most of the content-based image retrieval systems require a distance computation for each candidate image in the database. As a brute-force approach, the exhaustive search can be employed for this computation. However, this exhaustive search is time-consuming and limits the usefulness of such systems. Thus, there is a growing demand for a fast algorithm which provides the same retrieval results as the exhaustive search. In this paper, we propose a fast search algorithm based on a multi-resolution data structure. The proposed algorithm computes the lower bound of distance at each level and compares it with the latest minimum distance, starting from the low-resolution level. Once it is larger than the latest minimum distance, we can exclude the candidates without calculating the full-resolution distance. By doing this, we can dramatically reduce the total computational complexity. It is noticeable that the proposed fast algorithm provides not only the same retrieval results as the exhaustive search, but also a faster searching ability than existing fast algorithms. For additional performance improvement, we can easily combine the proposed algorithm with existing tree-based algorithms. The algorithm can also be used for the fast matching of various features such as luminance histograms, edge images, and local binary partition textures.
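
    The pruning idea in this abstract can be sketched for one concrete case. The sketch below assumes L1 distance between feature vectors (for example histograms) whose length is a power of two, and builds coarser levels by summing adjacent bins; since |(a1+a2) - (b1+b2)| <= |a1-b1| + |a2-b2|, the distance at a coarser level never exceeds the distance at a finer one. The distance measure, the pyramid construction, and all names are assumptions, not details taken from the paper.

```python
def coarsen(vec):
    """Sum adjacent pairs of bins, halving the resolution."""
    return [vec[i] + vec[i + 1] for i in range(0, len(vec), 2)]


def pyramid(vec):
    """Levels ordered from coarsest (one bin) to full resolution."""
    levels = [list(vec)]
    while len(levels[-1]) > 1:
        levels.append(coarsen(levels[-1]))
    return levels[::-1]


def l1(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def nearest(query, database):
    """Return (best id, best distance, number of full-resolution distances)."""
    q_levels = pyramid(query)
    best_id, best_dist, full_evals = None, float("inf"), 0
    for item_id, vec in database.items():
        levels = pyramid(vec)             # a real system would precompute these
        # Coarse distances are lower bounds, so a bound that already reaches
        # the current minimum rules the candidate out.
        if any(l1(q, d) >= best_dist
               for q, d in zip(q_levels[:-1], levels[:-1])):
            continue
        full_evals += 1
        dist = l1(q_levels[-1], levels[-1])
        if dist < best_dist:
            best_id, best_dist = item_id, dist
    return best_id, best_dist, full_evals


# Hypothetical toy database of 4-bin feature vectors.
database = {"a": [1, 2, 3, 5], "b": [10, 10, 10, 10], "c": [1, 2, 3, 4]}
result = nearest([1, 2, 3, 4], database)
```

    In this toy run the candidate "b" is rejected at the coarsest level, so only two full-resolution distances are computed, while the returned minimum matches what an exhaustive search would find.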

  11. A grammar based methodology for structural motif finding in ncRNA database search.

    PubMed

    Quest, Daniel; Tapprich, William; Ali, Hesham

    2007-01-01

    In recent years, sequence database searching has been conducted through local alignment heuristics, pattern-matching, and comparison of short statistically significant patterns. While these approaches have unlocked many clues as to sequence relationships, they are limited in that they do not provide context-sensitive searching capabilities (e.g. considering pseudoknots, protein binding positions, and complementary base pairs). Stochastic grammars (hidden Markov models HMMs and stochastic context-free grammars SCFG) do allow for flexibility in terms of local context, but the context comes at the cost of increased computational complexity. In this paper we introduce a new grammar based method for searching for RNA motifs that exist within a conserved RNA structure. Our method constrains computational complexity by using a chain of topology elements. Through the use of a case study we present the algorithmic approach and benchmark our approach against traditional methods.

  12. Pharmacophore modeling and three-dimensional database searching for drug design using catalyst.

    PubMed

    Kurogi, Y; Güner, O F

    2001-07-01

    Perceiving a pharmacophore is the first essential step towards understanding the interaction between a receptor and a ligand. Once a pharmacophore is established, a beneficial use of it is 3D database searching to retrieve novel compounds that would match the pharmacophore, without necessarily duplicating the topological features of known active compounds (hence remain independent of existing patents). As the 3D searching technology has evolved over the years, it has been effectively used for lead optimization, combinatorial library focusing, as well as virtual high-throughput screening. Clearly established as one of the successful computational tools in rational drug design, we present in this review article a brief history of the evolution of this technology and detailed algorithms of Catalyst, the latest 3D searching software to be released. We also provide brief summary of published successes with this technology, including two recent patent applications.

  13. Remote Access MicroMeSH: Evaluation of a Microcomputer System for Searching the MEDLINE Database

    PubMed Central

    Lowe, Henry J.; Barnett, G. Octo; Scott, Jon; Mallon, Laurie; Blewett, Dyan Ryan

    1989-01-01

    Remote Access MicroMeSH (RAMM) is a powerful but easy to use microcomputer system for searching the MEDLINE database. RAMM incorporates MicroMeSH, a microcomputer implementation of the National Library of Medicine's (NLM) Medical Subject Headings (MeSH) vocabulary. RAMM facilitates the creation of highly specific MEDLINE search queries. Our goals in creating RAMM were to provide a system that could be used to search the medical literature and to teach the basic skills required to use MeSH and MEDLINE. During the past two years RAMM has been used by clinicians, library professionals, researchers and students at Harvard Medical School and at selected academic sites in the U.S. and Canada. In February of 1989 we began an effort to formally evaluate RAMM. This paper describes the preliminary results of that evaluation.

  14. Rapid identification of anonymous subjects in large criminal databases: problems and solutions in IAFIS III/FBI subject searches

    NASA Astrophysics Data System (ADS)

    Kutzleb, C. D.

    1997-02-01

    The high incidence of recidivism (repeat offenders) in the criminal population makes the use of the IAFIS III/FBI criminal database an important tool in law enforcement. The problems and solutions employed by IAFIS III/FBI criminal subject searches are discussed for the following topics: (1) subject search selectivity and reliability; (2) the difficulty and limitations of identifying subjects whose anonymity may be a prime objective; (3) database size, search workload, and search response time; (4) techniques and advantages of normalizing the variability in an individual's name and identifying features into identifiable and discrete categories; and (5) the use of database demographics to estimate the likelihood of a match between a search subject and database subjects.

  15. Certain integrable system on a space associated with a quantum search algorithm

    SciTech Connect

    Uwano, Y.; Hino, H.; Ishiwatari, Y.

    2007-04-15

    On thinking up a Grover-type quantum search algorithm for an ordered tuple of multiqubit states, a gradient system associated with the negative von Neumann entropy is studied on the space of regular relative configurations of multiqubit states (SR$^2$CMQ). The SR$^2$CMQ emerges, through a geometric procedure, from the space of ordered tuples of multiqubit states for the quantum search. The aim of this paper is to give a brief report on the integrability of the gradient dynamical system together with quantum information geometry of the underlying space, SR$^2$CMQ, of that system.

  16. The effect of wild card designations and rare alleles in forensic DNA database searches.

    PubMed

    Tvedebrink, Torben; Bright, Jo-Anne; Buckleton, John S; Curran, James M; Morling, Niels

    2015-05-01

    Forensic DNA databases are powerful tools used for the identification of persons of interest in criminal investigations. Typically, they consist of two parts: (1) a database containing DNA profiles of known individuals and (2) a database of DNA profiles associated with crime scenes. The risk of adventitious or chance matches between crimes and innocent people increases as the number of profiles within a database grows and more data is shared between various forensic DNA databases, e.g. from different jurisdictions. The DNA profiles obtained from crime scenes are often partial because crime samples may be compromised in quantity or quality. When an individual's profile cannot be resolved from a DNA mixture, ambiguity is introduced. A wild card, F, may be used in place of an allele that has dropped out or when an ambiguous profile is resolved from a DNA mixture. Variant alleles that do not correspond to any marker in the allelic ladder or appear above or below the extent of the allelic ladder range are assigned the allele designation R for rare allele. R alleles are position specific with respect to the observed/unambiguous allele. The F and R designations are made when the exact genotype has not been determined. The F and R designation are treated as wild cards for searching, which results in increased chance of adventitious matches. We investigated the probability of adventitious matches given these two types of wild cards. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
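
    A rough sketch of how a wild card widens a database search is given below. It models only the F designation as "matches any allele"; the position-specific R designation is left out because the abstract does not give enough detail to implement it. The matching rule (unordered allele pairs, comparison on shared loci only), the function names, and the profile data are all illustrative assumptions.

```python
WILD = "F"  # wild card designation described in the abstract


def allele_match(x, y):
    """Two alleles agree if they are equal or either one is the wild card."""
    return x == WILD or y == WILD or x == y


def locus_match(g1, g2):
    """Compare two genotypes (unordered allele pairs) at one locus."""
    (a1, a2), (b1, b2) = g1, g2
    return ((allele_match(a1, b1) and allele_match(a2, b2))
            or (allele_match(a1, b2) and allele_match(a2, b1)))


def profile_match(p1, p2):
    """Profiles match if every locus typed in both profiles matches."""
    shared = p1.keys() & p2.keys()
    return bool(shared) and all(locus_match(p1[l], p2[l]) for l in shared)


def search(database, query):
    return [pid for pid, prof in database.items() if profile_match(prof, query)]


# Made-up profiles; allele values are illustrative only.
database = {
    "p1": {"D3S1358": ("15", "16"), "vWA": ("18", "17")},
    "p2": {"D3S1358": ("14", "16"), "vWA": ("17", "18")},
    "p3": {"D3S1358": ("17", "15"), "vWA": ("17", "18")},
}
full_query = {"D3S1358": ("15", "16"), "vWA": ("17", "18")}
partial_query = {"D3S1358": ("15", "F"), "vWA": ("17", "18")}
hits_full = search(database, full_query)
hits_partial = search(database, partial_query)
```

    With the fully resolved query only one profile is returned, while the query containing a wild card returns two, which is the kind of increase in chance matches the abstract refers to.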

  17. CUDASW++: optimizing Smith-Waterman sequence database searches for CUDA-enabled graphics processing units.

    PubMed

    Liu, Yongchao; Maskell, Douglas L; Schmidt, Bertil

    2009-05-06

    The Smith-Waterman algorithm is one of the most widely used tools for searching biological sequence databases due to its high sensitivity. Unfortunately, the Smith-Waterman algorithm is computationally demanding, which is further compounded by the exponential growth of sequence databases. The recent emergence of many-core architectures, and their associated programming interfaces, provides an opportunity to accelerate sequence database searches using commonly available and inexpensive hardware. Our CUDASW++ implementation (benchmarked on a single-GPU NVIDIA GeForce GTX 280 graphics card and a dual-GPU GeForce GTX 295 graphics card) provides a significant performance improvement compared to other publicly available implementations, such as SWPS3, CBESW, SW-CUDA, and NCBI-BLAST. CUDASW++ supports query sequences of length up to 59K and for query sequences ranging in length from 144 to 5,478 in Swiss-Prot release 56.6, the single-GPU version achieves an average performance of 9.509 GCUPS with a lowest performance of 9.039 GCUPS and a highest performance of 9.660 GCUPS, and the dual-GPU version achieves an average performance of 14.484 GCUPS with a lowest performance of 10.660 GCUPS and a highest performance of 16.087 GCUPS. CUDASW++ is publicly available open-source software. It provides a significant performance improvement for Smith-Waterman-based protein sequence database searches by fully exploiting the compute capability of commonly used CUDA-enabled low-cost GPUs.
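
    For readers unfamiliar with the recurrence that GPU implementations accelerate, here is a plain, unoptimized sketch of Smith-Waterman local alignment scoring. The scoring scheme (fixed match/mismatch values and a linear gap penalty) is an assumption chosen for brevity; it is not the CUDASW++ code, and protein searches would normally use a substitution matrix.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between two sequences.

    Uses the classic dynamic-programming recurrence with a linear gap
    penalty and keeps only two rows of the matrix in memory.
    """
    prev = [0] * (len(target) + 1)
    best = 0
    for q in query:
        curr = [0]
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            up = prev[j] + gap        # gap in the target
            left = curr[j - 1] + gap  # gap in the query
            cell = max(0, diag, up, left)  # 0 restarts the local alignment
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best

print(smith_waterman_score("TTACGTTT", "GGACGGG"))  # 6: the shared "ACG"
```

    Each inner-loop iteration is one "cell update"; the GCUPS figures quoted above count billions of such updates per second.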

  18. Colil: a database and search service for citation contexts in the life sciences domain.

    PubMed

    Fujiwara, Toyofumi; Yamamoto, Yasunori

    2015-01-01

    To promote research activities in a particular research area, it is important to efficiently identify current research trends, advances, and issues in that area. Although review papers in the research area can suffice for this purpose in general, researchers are not necessarily able to obtain these papers from research aspects of their interests at the time they are required. Therefore, the utilization of the citation contexts of papers in a research area has been considered as another approach. However, there are few search services to retrieve citation contexts in the life sciences domain; furthermore, efficiently obtaining citation contexts is becoming difficult due to the large volume and rapid growth of life sciences papers. Here, we introduce the Colil (Comments on Literature in Literature) database to store citation contexts in the life sciences domain. By using the Resource Description Framework (RDF) and a newly compiled vocabulary, we built the Colil database and made it available through the SPARQL endpoint. In addition, we developed a web-based search service called Colil that searches for a cited paper in the Colil database and then returns a list of citation contexts for it along with papers relevant to it based on co-citations. The citation contexts in the Colil database were extracted from full-text papers of the PubMed Central Open Access Subset (PMC-OAS), which includes 545,147 papers indexed in PubMed. These papers are distributed across 3,171 journals and cite 5,136,741 unique papers that correspond to approximately 25 % of total PubMed entries. By utilizing Colil, researchers can easily refer to a set of citation contexts and relevant papers based on co-citations for a target paper. Colil helps researchers to comprehend life sciences papers in a research area more efficiently and makes their biological research more efficient.

  19. Quantum exhaustive key search with simplified-DES as a case study.

    PubMed

    Almazrooie, Mishal; Samsudin, Azman; Abdullah, Rosni; Mutter, Kussay N

    2016-01-01

    To evaluate the security of a symmetric cryptosystem against any quantum attack, the symmetric algorithm must be first implemented on a quantum platform. In this study, a quantum implementation of a classical block cipher is presented. A quantum circuit for a classical block cipher of a polynomial size of quantum gates is proposed. The entire work has been tested on a quantum mechanics simulator called libquantum. First, the functionality of the proposed quantum cipher is verified and the experimental results are compared with those of the original classical version. Then, quantum attacks are conducted by using Grover's algorithm to recover the secret key. The proposed quantum cipher is used as a black box for the quantum search. The quantum oracle is then queried over the produced ciphertext to mark the quantum state, which consists of plaintext and key qubits. The experimental results show that for a key of n-bit size and key space of N such that [Formula: see text], the key can be recovered in [Formula: see text] computational steps.
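
    The key-recovery step can be pictured with a small classical simulation of Grover's algorithm. This is only a sketch under stated assumptions: the oracle below simply marks one key index instead of evaluating a cipher circuit, the 10-bit key size is that of simplified DES, the key value is made up, and the iteration count uses the standard floor(pi/4 * sqrt(N)) rule for a single marked item. It does not use libquantum.

```python
import math

def grover_search(n_bits, secret_key):
    """Simulate Grover's search for one marked key among 2**n_bits keys.

    Returns (most likely key, its measurement probability, iterations used).
    """
    n_states = 2 ** n_bits
    amps = [1 / math.sqrt(n_states)] * n_states  # uniform superposition
    iterations = math.floor(math.pi / 4 * math.sqrt(n_states))
    for _ in range(iterations):
        amps[secret_key] = -amps[secret_key]     # oracle: phase-flip the marked key
        mean = sum(amps) / n_states
        amps = [2 * mean - a for a in amps]      # diffusion: inversion about the mean
    best = max(range(n_states), key=lambda k: amps[k] ** 2)
    return best, amps[best] ** 2, iterations

key, probability, steps = grover_search(10, 0b1010000010)  # hypothetical key
print(key == 0b1010000010, steps, round(probability, 3))
```

    For 1,024 candidate keys the simulation needs 25 oracle calls, compared with an average of about 512 trials for a classical exhaustive search.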

  20. The Use of Research Electronic Data Capture (REDCap) Software to Create a Database of Librarian-Mediated Literature Searches

    PubMed Central

    LYON, JENNIFER A.; GARCIA-MILIAN, ROLANDO; NORTON, HANNAH F.; TENNANT, MICHELE R.

    2015-01-01

    Expert-mediated literature searching, a keystone service in biomedical librarianship, would benefit significantly from regular methodical review. This paper describes the novel use of Research Electronic Data Capture (REDCap) software to create a database of literature searches conducted at a large academic health sciences library. An archive of paper search requests was entered into REDCap, and librarians now prospectively enter records for current searches. Having search data readily available allows librarians to reuse search strategies and track their workload. In aggregate, this data can help guide practice and determine priorities by identifying users’ needs, tracking librarian effort, and focusing librarians’ continuing education. PMID:25023012

  1. Dialysis Search Filters for PubMed, Ovid MEDLINE, and Embase Databases

    PubMed Central

    Iansavichus, Arthur V.; Haynes, R. Brian; Lee, Christopher W.C.; Wilczynski, Nancy L.; McKibbon, Ann; Shariff, Salimah Z.; Blake, Peter G.; Lindsay, Robert M.

    2012-01-01

    Background and objectives: Physicians frequently search bibliographic databases, such as MEDLINE via PubMed, for best evidence for patient care. The objective of this study was to develop and test search filters to help physicians efficiently retrieve literature related to dialysis (hemodialysis or peritoneal dialysis) from all other articles indexed in PubMed, Ovid MEDLINE, and Embase. Design, setting, participants, & measurements: A diagnostic test assessment framework was used to develop and test robust dialysis filters. The reference standard was a manual review of the full texts of 22,992 articles from 39 journals to determine whether each article contained dialysis information. Next, 1,623,728 unique search filters were developed, and their ability to retrieve relevant articles was evaluated. Results: The high-performance dialysis filters consisted of up to 65 search terms in combination. These terms included the words “dialy” (truncated), “uremic,” “catheters,” and “renal transplant wait list.” These filters reached peak sensitivities of 98.6% and specificities of 98.5%. The filters’ performance remained robust in an independent validation subset of articles. Conclusions: These empirically derived and validated high-performance search filters should enable physicians to effectively retrieve dialysis information from PubMed, Ovid MEDLINE, and Embase. PMID:22917701

  2. Dialysis search filters for PubMed, Ovid MEDLINE, and Embase databases.

    PubMed

    Iansavichus, Arthur V; Haynes, R Brian; Lee, Christopher W C; Wilczynski, Nancy L; McKibbon, Ann; Shariff, Salimah Z; Blake, Peter G; Lindsay, Robert M; Garg, Amit X

    2012-10-01

    Physicians frequently search bibliographic databases, such as MEDLINE via PubMed, for best evidence for patient care. The objective of this study was to develop and test search filters to help physicians efficiently retrieve literature related to dialysis (hemodialysis or peritoneal dialysis) from all other articles indexed in PubMed, Ovid MEDLINE, and Embase. A diagnostic test assessment framework was used to develop and test robust dialysis filters. The reference standard was a manual review of the full texts of 22,992 articles from 39 journals to determine whether each article contained dialysis information. Next, 1,623,728 unique search filters were developed, and their ability to retrieve relevant articles was evaluated. The high-performance dialysis filters consisted of up to 65 search terms in combination. These terms included the words "dialy" (truncated), "uremic," "catheters," and "renal transplant wait list." These filters reached peak sensitivities of 98.6% and specificities of 98.5%. The filters' performance remained robust in an independent validation subset of articles. These empirically derived and validated high-performance search filters should enable physicians to effectively retrieve dialysis information from PubMed, Ovid MEDLINE, and Embase.
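
    The sensitivity and specificity figures follow from comparing a filter's retrievals with the manual reference standard. A short sketch of that arithmetic is given below; the function names and the counts are hypothetical and are not taken from the study.

```python
def sensitivity(true_pos, false_neg):
    """Share of relevant (dialysis) articles that the filter retrieves."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Share of non-relevant articles that the filter correctly excludes."""
    return true_neg / (true_neg + false_pos)

# Hypothetical counts: 1,000 relevant and 1,000 non-relevant articles.
print(f"sensitivity = {sensitivity(986, 14):.1%}")
print(f"specificity = {specificity(985, 15):.1%}")
```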

  3. Searching for patterns in remote sensing image databases using neural networks

    NASA Technical Reports Server (NTRS)

    Paola, Justin D.; Schowengerdt, Robert A.

    1995-01-01

    We have investigated a method, based on a successful neural network multispectral image classification system, of searching for single patterns in remote sensing databases. While defining the pattern to search for and the feature to be used for that search (spectral, spatial, temporal, etc.) is challenging, a more difficult task is selecting competing patterns to train against the desired pattern. Schemes for competing pattern selection, including random selection and human interpreted selection, are discussed in the context of an example detection of dense urban areas in Landsat Thematic Mapper imagery. When applying the search to multiple images, a simple normalization method can alleviate the problem of inconsistent image calibration. Another potential problem, that of highly compressed data, was found to have a minimal effect on the ability to detect the desired pattern. The neural network algorithm has been implemented using the PVM (Parallel Virtual Machine) library and nearly-optimal speedups have been obtained that help alleviate the long process of searching through imagery.

  4. Exploring Multidisciplinary Data Sets through Database Driven Search Capabilities and Map-Based Web Services

    NASA Astrophysics Data System (ADS)

    O'Hara, S.; Ferrini, V.; Arko, R.; Carbotte, S. M.; Leung, A.; Bonczkowski, J.; Goodwillie, A.; Ryan, W. B.; Melkonian, A. K.

    2008-12-01

    Relational databases containing geospatially referenced data enable the construction of robust data access pathways that can be customized to suit the needs of a diverse user community. Web-based search capabilities driven by radio buttons and pull-down menus can be generated on-the-fly leveraging the power of the relational database and providing specialists a means of discovering specific data and data sets. While these data access pathways are sufficient for many scientists, map-based data exploration can also be an effective means of data discovery and integration by allowing users to rapidly assess the spatial co-registration of several data types. We present a summary of data access tools currently provided by the Marine Geoscience Data System (www.marine-geo.org) that are intended to serve a diverse community of users and promote data integration. Basic search capabilities allow users to discover data based on data type, device type, geographic region, research program, expedition parameters, personnel and references. In addition, web services are used to create database driven map interfaces that provide live access to metadata and data files.

  5. EDULISS: a small-molecule database with data-mining and pharmacophore searching capabilities.

    PubMed

    Hsin, Kun-Yi; Morgan, Hugh P; Shave, Steven R; Hinton, Andrew C; Taylor, Paul; Walkinshaw, Malcolm D

    2011-01-01

    We present the relational database EDULISS (EDinburgh University Ligand Selection System), which stores structural, physicochemical and pharmacophoric properties of small molecules. The database comprises a collection of over 4 million commercially available compounds from 28 different suppliers. A user-friendly web-based interface for EDULISS (available at http://eduliss.bch.ed.ac.uk/) has been established providing a number of data-mining possibilities. For each compound a single 3D conformer is stored along with over 1600 calculated descriptor values (molecular properties). A very efficient method for unique compound recognition, especially for a large scale database, is demonstrated by making use of small subgroups of the descriptors. Many of the shape and distance descriptors are held as pre-calculated bit strings permitting fast and efficient similarity and pharmacophore searches which can be used to identify families of related compounds for biological testing. Two ligand searching applications are given to demonstrate how EDULISS can be used to extract families of molecules with selected structural and biophysical features.
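
    Pre-calculated bit strings make similarity searching cheap because two compounds can be compared with a few bitwise operations. The sketch below uses the Tanimoto coefficient, which is a common choice for fingerprint comparison but is an assumption here: the abstract does not say which similarity measure EDULISS applies, and the bit patterns are invented.

```python
def popcount(bits):
    """Number of set bits in a non-negative integer."""
    return bin(bits).count("1")

def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints stored as Python integers."""
    union = popcount(fp_a | fp_b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return popcount(fp_a & fp_b) / union

# Hypothetical 8-bit fingerprints; real descriptors would be much longer.
query = 0b10110010
candidate = 0b10100011
print(tanimoto(query, candidate))  # 3 shared bits out of 5 set in either: 0.6
```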

  6. Data Analysis Provenance: Use Case for Exoplanet Search in CoRoT Database

    NASA Astrophysics Data System (ADS)

    de Souza, L.; Salete Marcon Gomes Vaz, M.; Emílio, M.; Ferreira da Rocha, J. C.; Janot Pacheco, E.; Carlos Boufleur, R.

    2012-09-01

    CoRoT (COnvection Rotation and Planetary Transits) is a mission led by the French national space agency CNES, in collaboration with Austria, Spain, Germany, Belgium and Brazil. The mission is dedicated primarily to exoplanet search and stellar seismology. Following the CoRoT data policy, the CoRoT light curve database became public one year after the data were delivered to the CoRoT Co-Is. The CoRoT archive contains thousands of light curves in FITS format. Several exoplanet search algorithms require detrending algorithms to remove both stellar and instrumental signals, improving the chance of detecting a transit. Different detrending and transit detection algorithms can be applied to the same database. Tracking the origin of the information and how the data were derived at each level of the data analysis process is essential to allow sharing, reuse, reprocessing and further analysis. This work aims at applying a formalized and codified knowledge model by means of a domain ontology. The model enriches the data analysis with semantics and standardization, and it holds the provenance information in the database for a posteriori retrieval by humans or software agents.

  7. EDULISS: a small-molecule database with data-mining and pharmacophore searching capabilities

    PubMed Central

    Hsin, Kun-Yi; Morgan, Hugh P.; Shave, Steven R.; Hinton, Andrew C.; Taylor, Paul; Walkinshaw, Malcolm D.

    2011-01-01

    We present the relational database EDULISS (EDinburgh University Ligand Selection System), which stores structural, physicochemical and pharmacophoric properties of small molecules. The database comprises a collection of over 4 million commercially available compounds from 28 different suppliers. A user-friendly web-based interface for EDULISS (available at http://eduliss.bch.ed.ac.uk/) has been established providing a number of data-mining possibilities. For each compound a single 3D conformer is stored along with over 1600 calculated descriptor values (molecular properties). A very efficient method for unique compound recognition, especially for a large scale database, is demonstrated by making use of small subgroups of the descriptors. Many of the shape and distance descriptors are held as pre-calculated bit strings permitting fast and efficient similarity and pharmacophore searches which can be used to identify families of related compounds for biological testing. Two ligand searching applications are given to demonstrate how EDULISS can be used to extract families of molecules with selected structural and biophysical features. PMID:21051336

  8. mirPub: a database for searching microRNA publications.

    PubMed

    Vergoulis, Thanasis; Kanellos, Ilias; Kostoulas, Nikos; Georgakilas, Georgios; Sellis, Timos; Hatzigeorgiou, Artemis; Dalamagas, Theodore

    2015-05-01

    Identifying, amongst millions of publications available in MEDLINE, those that are relevant to specific microRNAs (miRNAs) of interest based on keyword search faces major obstacles. References to miRNA names in the literature often deviate from standard nomenclature for various reasons, since even the official nomenclature evolves. For instance, a single miRNA name may identify two completely different molecules or two different names may refer to the same molecule. mirPub is a database with a powerful and intuitive interface, which facilitates searching for miRNA literature, addressing the aforementioned issues. To provide effective search services, mirPub applies text mining techniques on MEDLINE, integrates data from several curated databases and exploits data from its user community following a crowdsourcing approach. Other key features include an interactive visualization service that illustrates intuitively the evolution of miRNA data, tag clouds summarizing the relevance of publications to particular diseases, cell types or tissues and access to TarBase 6.0 data to oversee genes related to miRNA publications. mirPub is freely available at http://www.microrna.gr/mirpub/. Contact: vergoulis@imis.athena-innovation.gr or dalamag@imis.athena-innovation.gr. Supplementary data are available at Bioinformatics online.

  9. A neotropical Miocene pollen database employing image-based search and semantic modeling

    PubMed Central

    Han, Jing Ginger; Cao, Hongfei; Barb, Adrian; Punyasena, Surangi W.; Jaramillo, Carlos; Shyu, Chi-Ren

    2014-01-01

    • Premise of the study: Digital microscopic pollen images are being generated with increasing speed and volume, producing opportunities to develop new computational methods that increase the consistency and efficiency of pollen analysis and provide the palynological community a computational framework for information sharing and knowledge transfer. • Methods: Mathematical methods were used to assign trait semantics (abstract morphological representations) of the images of neotropical Miocene pollen and spores. Advanced database-indexing structures were built to compare and retrieve similar images based on their visual content. A Web-based system was developed to provide novel tools for automatic trait semantic annotation and image retrieval by trait semantics and visual content. • Results: Mathematical models that map visual features to trait semantics can be used to annotate images with morphology semantics and to search image databases with improved reliability and productivity. Images can also be searched by visual content, providing users with customized emphases on traits such as color, shape, and texture. • Discussion: Content- and semantic-based image searches provide a powerful computational platform for pollen and spore identification. The infrastructure outlined provides a framework for building a community-wide palynological resource, streamlining the process of manual identification, analysis, and species discovery. PMID:25202648

  10. Gapped Spectral Dictionaries and Their Applications for Database Searches of Tandem Mass Spectra

    PubMed Central

    Jeong, Kyowon; Kim, Sangtae; Bandeira, Nuno; Pevzner, Pavel A.

    2011-01-01

    Generating all plausible de novo interpretations of a peptide tandem mass (MS/MS) spectrum (Spectral Dictionary) and quickly matching them against the database represent a recently emerged alternative approach to peptide identification. However, the sizes of the Spectral Dictionaries quickly grow with the peptide length making their generation impractical for long peptides. We introduce Gapped Spectral Dictionaries (all plausible de novo interpretations with gaps) that can be easily generated for any peptide length thus addressing the limitation of the Spectral Dictionary approach. We show that Gapped Spectral Dictionaries are small thus opening a possibility of using them to speed-up MS/MS searches. Our MS-GappedDictionary algorithm (based on Gapped Spectral Dictionaries) enables proteogenomics applications (such as searches in the six-frame translation of the human genome) that are prohibitively time consuming with existing approaches. MS-GappedDictionary generates gapped peptides that occupy a niche between accurate but short peptide sequence tags and long but inaccurate full length peptide reconstructions. We show that, contrary to conventional wisdom, some high-quality spectra do not have good peptide sequence tags and introduce gapped tags that have advantages over the conventional peptide sequence tags in MS/MS database searches. PMID:21444829

  11. Gapped spectral dictionaries and their applications for database searches of tandem mass spectra.

    PubMed

    Jeong, Kyowon; Kim, Sangtae; Bandeira, Nuno; Pevzner, Pavel A

    2011-06-01

    Generating all plausible de novo interpretations of a peptide tandem mass (MS/MS) spectrum (Spectral Dictionary) and quickly matching them against the database represent a recently emerged alternative approach to peptide identification. However, the sizes of the Spectral Dictionaries quickly grow with the peptide length making their generation impractical for long peptides. We introduce Gapped Spectral Dictionaries (all plausible de novo interpretations with gaps) that can be easily generated for any peptide length thus addressing the limitation of the Spectral Dictionary approach. We show that Gapped Spectral Dictionaries are small thus opening a possibility of using them to speed-up MS/MS searches. Our MS-Gapped-Dictionary algorithm (based on Gapped Spectral Dictionaries) enables proteogenomics applications (such as searches in the six-frame translation of the human genome) that are prohibitively time consuming with existing approaches. MS-Gapped-Dictionary generates gapped peptides that occupy a niche between accurate but short peptide sequence tags and long but inaccurate full length peptide reconstructions. We show that, contrary to conventional wisdom, some high-quality spectra do not have good peptide sequence tags and introduce gapped tags that have advantages over the conventional peptide sequence tags in MS/MS database searches.

  12. Associative memory model for searching an image database by image snippet

    NASA Astrophysics Data System (ADS)

    Khan, Javed I.; Yun, David Y.

    1994-09-01

    This paper presents an associative memory model called multidimensional holographic associative computing (MHAC), which can potentially be used to perform feature-based image database queries using an image snippet. MHAC has the unique capability to selectively focus on specific segments of a query frame during associative retrieval. As a result, this model can perform a search on the basis of featural significance described by a subset of the snippet pixels. This capability is critical for visual query in image databases because quite often the cognitive index features in the snippet are statistically weak. Unlike conventional artificial associative memories, MHAC uses a two-level representation and incorporates additional meta-knowledge about the reliability status of the segments of information it receives and forwards. In this paper we present an analysis of the focus characteristics of MHAC.

  13. Faster Smith-Waterman database searches with inter-sequence SIMD parallelisation.

    PubMed

    Rognes, Torbjørn

    2011-06-01

    The Smith-Waterman algorithm for local sequence alignment is more sensitive than heuristic methods for database searching, but also more time-consuming. The fastest approach to parallelisation with SIMD technology has previously been described by Farrar in 2007. The aim of this study was to explore whether further speed could be gained by other approaches to parallelisation. A faster approach and implementation is described and benchmarked. In the new tool SWIPE, residues from sixteen different database sequences are compared in parallel to one query residue. Using a 375 residue query sequence a speed of 106 billion cell updates per second (GCUPS) was achieved on a dual Intel Xeon X5650 six-core processor system, which is over six times more rapid than software based on Farrar's 'striped' approach. SWIPE was about 2.5 times faster when the programs used only a single thread. For shorter queries, the increase in speed was larger. SWIPE was about twice as fast as BLAST when using the BLOSUM50 score matrix, while BLAST was about twice as fast as SWIPE for the BLOSUM62 matrix. The software is designed for 64 bit Linux on processors with SSSE3. Source code is available from http://dna.uio.no/swipe/ under the GNU Affero General Public License. Efficient parallelisation using SIMD on standard hardware makes it possible to run Smith-Waterman database searches more than six times faster than before. The approach described here could significantly widen the potential application of Smith-Waterman searches. Other applications that require optimal local alignment scores could also benefit from improved performance.
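
    The GCUPS figure can be turned into an expected search time with simple arithmetic: one cell update is performed for every pair of query residue and database residue. The sketch below assumes a hypothetical database of 100 million residues; only the 375-residue query length and the 106 GCUPS rate come from the abstract.

```python
def cell_updates(query_length, database_residues):
    """Dynamic-programming cells filled when aligning one query to a database."""
    return query_length * database_residues

def search_seconds(query_length, database_residues, gcups):
    """Estimated search time at a given rate in billions of cell updates per second."""
    return cell_updates(query_length, database_residues) / (gcups * 1e9)

# Hypothetical database size; query length and speed as reported for SWIPE.
seconds = search_seconds(375, 100_000_000, 106)
print(f"{seconds:.2f} s")  # about 0.35 s
```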

  14. Faster Smith-Waterman database searches with inter-sequence SIMD parallelisation

    PubMed Central

    2011-01-01

    Background: The Smith-Waterman algorithm for local sequence alignment is more sensitive than heuristic methods for database searching, but also more time-consuming. The fastest approach to parallelisation with SIMD technology has previously been described by Farrar in 2007. The aim of this study was to explore whether further speed could be gained by other approaches to parallelisation. Results: A faster approach and implementation is described and benchmarked. In the new tool SWIPE, residues from sixteen different database sequences are compared in parallel to one query residue. Using a 375 residue query sequence a speed of 106 billion cell updates per second (GCUPS) was achieved on a dual Intel Xeon X5650 six-core processor system, which is over six times more rapid than software based on Farrar's 'striped' approach. SWIPE was about 2.5 times faster when the programs used only a single thread. For shorter queries, the increase in speed was larger. SWIPE was about twice as fast as BLAST when using the BLOSUM50 score matrix, while BLAST was about twice as fast as SWIPE for the BLOSUM62 matrix. The software is designed for 64 bit Linux on processors with SSSE3. Source code is available from http://dna.uio.no/swipe/ under the GNU Affero General Public License. Conclusions: Efficient parallelisation using SIMD on standard hardware makes it possible to run Smith-Waterman database searches more than six times faster than before. The approach described here could significantly widen the potential application of Smith-Waterman searches. Other applications that require optimal local alignment scores could also benefit from improved performance. PMID:21631914

  15. rasbhari: Optimizing Spaced Seeds for Database Searching, Read Mapping and Alignment-Free Sequence Comparison

    PubMed Central

    Hahn, Lars; Leimeister, Chris-André; Morgenstern, Burkhard

    2016-01-01

    Many algorithms for sequence analysis rely on word matching or word statistics. Often, these approaches can be improved if binary patterns representing match and don’t-care positions are used as a filter, such that only those positions of words are considered that correspond to the match positions of the patterns. The performance of these approaches, however, depends on the underlying patterns. Herein, we show that the overlap complexity of a pattern set that was introduced by Ilie and Ilie is closely related to the variance of the number of matches between two evolutionarily related sequences with respect to this pattern set. We propose a modified hill-climbing algorithm to optimize pattern sets for database searching, read mapping and alignment-free sequence comparison of nucleic-acid sequences; our implementation of this algorithm is called rasbhari. Depending on the application at hand, rasbhari can either minimize the overlap complexity of pattern sets, maximize their sensitivity in database searching or minimize the variance of the number of pattern-based matches in alignment-free sequence comparison. We show that, for database searching, rasbhari generates pattern sets with slightly higher sensitivity than existing approaches. In our Spaced Words approach to alignment-free sequence comparison, pattern sets calculated with rasbhari led to more accurate estimates of phylogenetic distances than the randomly generated pattern sets that we previously used. Finally, we used rasbhari to generate patterns for short read classification with CLARK-S. Here too, the sensitivity of the results could be improved, compared to the default patterns of the program. We integrated rasbhari into Spaced Words; the source code of rasbhari is freely available at http://rasbhari.gobics.de/ PMID:27760124
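
    As a concrete picture of match and don't-care positions, the sketch below extracts spaced words from a sequence using a binary pattern in which "1" marks a match position and "0" a don't-care position. The function name, the pattern and the sequences are illustrative assumptions; this is not the rasbhari optimization itself, only the filtering step that such patterns are used for.

```python
def spaced_words(sequence, pattern):
    """Return the spaced word at every window of the sequence.

    Only characters under '1' positions of the pattern are kept, so a
    mismatch under a '0' (don't-care) position does not change the word.
    """
    keep = [i for i, flag in enumerate(pattern) if flag == "1"]
    windows = range(len(sequence) - len(pattern) + 1)
    return ["".join(sequence[start + i] for i in keep) for start in windows]

print(spaced_words("ACGTAC", "1101"))  # ['ACT', 'CGA', 'GTC']

# The third position is a don't-care, so these two windows still match.
print(spaced_words("ACGT", "1101") == spaced_words("ACCT", "1101"))  # True
```

    With the contiguous pattern "1111" the two four-letter windows above would not match, which is why the choice of pattern set affects sensitivity.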

  16. Average probability that a "cold hit" in a DNA database search results in an erroneous attribution.

    PubMed

    Song, Yun S; Patil, Anand; Murphy, Erin E; Slatkin, Montgomery

    2009-01-01

    We consider a hypothetical series of cases in which the DNA profile of a crime-scene sample is found to match a known profile in a DNA database (i.e., a "cold hit"), resulting in the identification of a suspect based only on genetic evidence. We show that the average probability that there is another person in the population whose profile matches the crime-scene sample but who is not in the database is approximately 2(N - d)p(A), where N is the number of individuals in the population, d is the number of profiles in the database, and p(A) is the average match probability (AMP) for the population. The AMP is estimated by computing the average of the probabilities that two individuals in the population have the same profile. We show further that if a priori each individual in the population is equally likely to have left the crime-scene sample, then the average probability that the database search attributes the crime-scene sample to a wrong person is (N - d)p(A).
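
    The two approximations quoted above are easy to evaluate numerically. In the sketch below the population size, database size and average match probability are hypothetical inputs chosen only to show the arithmetic; the function names are likewise illustrative.

```python
def prob_unlisted_match(population, database_size, avg_match_prob):
    """Approximate 2(N - d)p(A): someone outside the database also matches."""
    return 2 * (population - database_size) * avg_match_prob

def prob_wrong_attribution(population, database_size, avg_match_prob):
    """Approximate (N - d)p(A): the cold hit points to the wrong person,
    assuming every individual is a priori equally likely to be the source."""
    return (population - database_size) * avg_match_prob

# Hypothetical values: N = 1,000,000, d = 100,000, p(A) = 1e-9.
N, d, p_A = 1_000_000, 100_000, 1e-9
print(prob_unlisted_match(N, d, p_A))
print(prob_wrong_attribution(N, d, p_A))
```

    Both quantities shrink as the database covers more of the population, since the factor (N - d) counts the individuals who are not in the database.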

  17. Allie: a database and a search service of abbreviations and long forms

    PubMed Central

    Yamamoto, Yasunori; Yamaguchi, Atsuko; Bono, Hidemasa; Takagi, Toshihisa

    2011-01-01

    Many abbreviations are used in the literature especially in the life sciences, and polysemous abbreviations appear frequently, making it difficult to read and understand scientific papers that are outside of a reader’s expertise. Thus, we have developed Allie, a database and a search service of abbreviations and their long forms (a.k.a. full forms or definitions). Allie searches for abbreviations and their corresponding long forms in a database that we have generated based on all titles and abstracts in MEDLINE. When a user query matches an abbreviation, Allie returns all potential long forms of the query along with their bibliographic data (i.e. title and publication year). In addition, for each candidate, co-occurring abbreviations and a research field in which it frequently appears in the MEDLINE data are displayed. This function helps users learn about the context in which an abbreviation appears. To deal with synonymous long forms, we use a dictionary called GENA that contains domain-specific terms such as gene, protein or disease names along with their synonymic information. Conceptually identical domain-specific terms are regarded as one term, and then conceptually identical abbreviation-long form pairs are grouped taking into account their appearance in MEDLINE. To keep up with new abbreviations that are continuously introduced, Allie has an automatic update system. In addition, the database of abbreviations and their long forms with their corresponding PubMed IDs is constructed and updated weekly. Database URL: The Allie service is available at http://allie.dbcls.jp/. PMID:21498548

  18. Allie: a database and a search service of abbreviations and long forms.

    PubMed

    Yamamoto, Yasunori; Yamaguchi, Atsuko; Bono, Hidemasa; Takagi, Toshihisa

    2011-01-01

    Many abbreviations are used in the literature especially in the life sciences, and polysemous abbreviations appear frequently, making it difficult to read and understand scientific papers that are outside of a reader's expertise. Thus, we have developed Allie, a database and a search service of abbreviations and their long forms (a.k.a. full forms or definitions). Allie searches for abbreviations and their corresponding long forms in a database that we have generated based on all titles and abstracts in MEDLINE. When a user query matches an abbreviation, Allie returns all potential long forms of the query along with their bibliographic data (i.e. title and publication year). In addition, for each candidate, co-occurring abbreviations and a research field in which it frequently appears in the MEDLINE data are displayed. This function helps users learn about the context in which an abbreviation appears. To deal with synonymous long forms, we use a dictionary called GENA that contains domain-specific terms such as gene, protein or disease names along with their synonymic information. Conceptually identical domain-specific terms are regarded as one term, and then conceptually identical abbreviation-long form pairs are grouped taking into account their appearance in MEDLINE. To keep up with new abbreviations that are continuously introduced, Allie has an automatic update system. In addition, the database of abbreviations and their long forms with their corresponding PubMed IDs is constructed and updated weekly. Database URL: The Allie service is available at http://allie.dbcls.jp/.

  19. SAM: String-based sequence search algorithm for mitochondrial DNA database queries

    PubMed Central

    Röck, Alexander; Irwin, Jodi; Dür, Arne; Parsons, Thomas; Parson, Walther

    2011-01-01

    The analysis of the haploid mitochondrial (mt) genome has numerous applications in forensic and population genetics, as well as in disease studies. Although mtDNA haplotypes are usually determined by sequencing, they are rarely reported as a nucleotide string. Traditionally they are presented in a difference-coded position-based format relative to the corrected version of the first sequenced mtDNA. This convention requires recommendations for standardized sequence alignment that is known to vary between scientific disciplines, even between laboratories. As a consequence, database searches that are vital for the interpretation of mtDNA data can suffer from biased results when query and database haplotypes are annotated differently. In the forensic context that would usually lead to underestimation of the absolute and relative frequencies. To address this issue we introduce SAM, a string-based search algorithm that converts query and database sequences to position-free nucleotide strings and thus eliminates the possibility that identical sequences will be missed in a database query. The mere application of a BLAST algorithm would not be a sufficient remedy as it uses a heuristic approach and does not address properties specific to mtDNA, such as phylogenetically stable but also rapidly evolving insertion and deletion events. The software presented here provides additional flexibility to incorporate phylogenetic data, site-specific mutation rates, and other biologically relevant information that would refine the interpretation of mitochondrial DNA data. The manuscript is accompanied by freeware and example data sets that can be used to evaluate the new software (http://stringvalidation.org). PMID:21056022
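
    The idea of comparing position-free strings rather than difference-coded annotations can be sketched as follows. This is a simplified illustration, not the SAM software: the function name, the toy reference sequence and the restricted notation (substitutions such as "3G", insertions such as "4.1C", deletions such as "7del") are assumptions made for the example.

```python
import re


def to_nucleotide_string(reference, differences):
    """Expand difference-coded variants into a plain nucleotide string.

    reference   -- reference sequence; position 1 is reference[0]
    differences -- e.g. "3G" (substitution), "4.1C" (insertion after
                   position 4), "7del" (deletion); simplified notation
    """
    bases = list(reference)
    insertions = {}  # position -> {insertion index -> inserted base}
    for diff in differences:
        m = re.fullmatch(r"(\d+)(?:\.(\d+))?(del|[ACGT])", diff)
        if m is None:
            raise ValueError(f"unrecognised difference: {diff!r}")
        pos, ins_index, change = int(m.group(1)), m.group(2), m.group(3)
        if not 1 <= pos <= len(bases):
            raise ValueError(f"position out of range: {diff!r}")
        if ins_index is not None:
            if change == "del":
                raise ValueError(f"cannot delete an insertion: {diff!r}")
            insertions.setdefault(pos, {})[int(ins_index)] = change
        elif change == "del":
            bases[pos - 1] = ""      # drop the reference base
        else:
            bases[pos - 1] = change  # substitute the reference base
    out = []
    for pos, base in enumerate(bases, start=1):
        out.append(base)
        for idx in sorted(insertions.get(pos, {})):
            out.append(insertions[pos][idx])
    return "".join(out)


# Two different annotations of the same extra C in a C-run (toy reference).
reference = "ACCCT"
print(to_nucleotide_string(reference, ["4.1C"]))
print(to_nucleotide_string(reference, ["1.1C"]))
```

    Both annotations expand to the same string, so an exact string comparison finds the match even though the difference-coded forms disagree.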

  20. Computational methodologies for compound database searching that utilize experimental protein-ligand interaction information.

    PubMed

    Tan, Lu; Batista, Jose; Bajorath, Jürgen

    2010-09-01

    Ligand- and target structure-based methods are widely used in virtual screening, but there is currently no methodology available that fully integrates these different approaches. Herein, we provide an overview of various attempts that have been made to combine ligand- and structure-based computational screening methods. We then review different types of approaches that utilize protein-ligand interaction information for database screening and filtering. Interaction-based approaches make use of a variety of methodological concepts including pharmacophore modeling and direct or indirect encoding of protein-ligand interactions in fingerprint formats. These interaction-based methods have been successfully applied to tackle different tasks related to virtual screening including postprocessing of docking poses, prioritization of binding modes, selectivity analysis, or similarity searching. Furthermore, we discuss the recently developed interacting fragment approach that indirectly incorporates 3D interaction information into 2D similarity searching and bridges between ligand- and structure-based methods.

  1. Spatial Search by Quantum Walk is Optimal for Almost all Graphs

    NASA Astrophysics Data System (ADS)

    Chakraborty, Shantanav; Novo, Leonardo; Ambainis, Andris; Omar, Yasser

    2016-03-01

    The problem of finding a marked node in a graph can be solved by the spatial search algorithm based on continuous-time quantum walks (CTQW). However, this algorithm is known to run in optimal time only for a handful of graphs. In this work, we prove that for Erdős-Rényi random graphs, i.e., graphs of n vertices where each edge exists with probability p, search by CTQW is almost surely optimal as long as p ≥ log^(3/2)(n)/n. Consequently, we show that quantum spatial search is in fact optimal for almost all graphs, meaning that the fraction of graphs of n vertices for which this optimality holds tends to one in the asymptotic limit. We obtain this result by proving that search is optimal on graphs where the ratio between the second largest and the largest eigenvalue is bounded by a constant smaller than 1. Finally, we show that we can extend our results on search to establish high fidelity quantum communication between two arbitrary nodes of a random network of interacting qubits, namely, to perform quantum state transfer, as well as entanglement generation. Our work shows that quantum information tasks typically designed for structured systems retain performance in very disordered structures.
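
    To get a feel for the edge-probability condition quoted above, the threshold can be tabulated for a few graph sizes. This is a minimal sketch: the function names are hypothetical, and the use of the natural logarithm is an assumption, since the abstract does not state the base.

```python
import math


def er_search_threshold(n):
    """Edge probability log^(3/2)(n) / n (natural log assumed)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    return math.log(n) ** 1.5 / n


def satisfies_condition(n, p):
    """True if an Erdos-Renyi graph G(n, p) meets the quoted condition."""
    return p >= er_search_threshold(n)


for n in (10**3, 10**4, 10**6):
    print(f"n = {n:>8}: threshold p = {er_search_threshold(n):.3g}")
```

    The threshold falls quickly with n, which is consistent with the claim that the condition holds for almost all graphs in the asymptotic limit.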

  2. Spatial search by quantum walk is optimal for almost all graphs

    NASA Astrophysics Data System (ADS)

    Chakraborty, Shantanav; Novo, Leonardo; Ambainis, Andris; Omar, Yasser

    The problem of finding a marked node in a graph can be solved by the spatial search algorithm based on continuous-time quantum walks (CTQW). However, this algorithm is known to run in optimal time only for a handful of graphs. In this work, we prove that for Erdős-Rényi random graphs, i.e. graphs of n vertices where each edge exists with probability p, search by CTQW is almost surely optimal as long as p ≥ log^(3/2)(n)/n. Consequently, we show that quantum spatial search is in fact optimal for almost all graphs, meaning that the fraction of graphs of n vertices for which this optimality holds tends to one in the asymptotic limit. We obtain this result by proving that search is optimal on graphs where the ratio between the second largest and the largest eigenvalue is bounded by a constant smaller than 1. Finally, we show that we can extend our results on search to establish high fidelity quantum communication between two arbitrary nodes of a random network of interacting qubits, namely to perform quantum state transfer, as well as entanglement generation. Our work shows that quantum information tasks typically designed for structured systems retain performance in very disordered structures.

  3. What value is the CINAHL database when searching for systematic reviews of qualitative studies?

    PubMed

    Wright, Kath; Golder, Su; Lewis-Light, Kate

    2015-06-26

    The Cumulative Index to Nursing and Allied Health Literature (CINAHL) is generally thought to be a good source to search when conducting a review of qualitative evidence. Case studies have suggested that using CINAHL could be essential for reviews of qualitative studies covering topics in the nursing field, but it is unclear whether this can be extended more generally to reviews of qualitative studies in other topic areas. We carried out a retrospective analysis of a sample of systematic reviews of qualitative studies to investigate CINAHL's potential contribution to identifying the evidence. In particular, we planned to identify the percentage of included studies available in CINAHL and the percentage of the included studies unique to the CINAHL database. After screening 58 qualitative systematic reviews identified from the Database of Abstracts of Reviews of Effects (DARE), we created a sample set of 43 reviews covering a range of topics including patient experience of both illnesses and interventions. For all 43 reviews in our sample, we found that some of the included studies were available in CINAHL. For nine of these reviews (21 %), all the studies that had been included in the final synthesis were available in the CINAHL database, so it could have been possible to identify all the included studies using just this one database, while for an additional 21 reviews (49 %), 80 % or more of the included studies were available in CINAHL. Consequently, for a total of 30 reviews, or 70 % of our sample, 80 % or more of the studies could be identified using CINAHL alone. Eleven reviews, where we were able to recheck all the databases used by the original review authors, had included a study that was uniquely identified from the CINAHL database. The median percentage of unique studies was 9.09 %, with a range from 5.0 % to 33.0 % [corrected]. Assuming a rigorous search strategy was used and the records sought were accurately indexed, we could…

  4. Addressing Statistical Biases in Nucleotide-Derived Protein Databases for Proteogenomic Search Strategies

    PubMed Central

    2012-01-01

    Proteogenomics has the potential to advance genome annotation through high quality peptide identifications derived from mass spectrometry experiments, which demonstrate a given gene or isoform is expressed and translated at the protein level. This can advance our understanding of genome function, discovering novel genes and gene structure that have not yet been identified or validated. Because of the high-throughput shotgun nature of most proteomics experiments, it is essential to carefully control for false positives and prevent any potential misannotation. A number of statistical procedures to deal with this are in wide use in proteomics, calculating false discovery rate (FDR) and posterior error probability (PEP) values for groups and individual peptide spectrum matches (PSMs). These methods control for multiple testing and exploit decoy databases to estimate statistical significance. Here, we show that database choice has a major effect on these confidence estimates leading to significant differences in the number of PSMs reported. We note that standard target:decoy approaches using six-frame translations of nucleotide sequences, such as assembled transcriptome data, apparently underestimate the confidence assigned to the PSMs. The source of this error stems from the inflated and unusual nature of the six-frame database, where for every target sequence there exist five “incorrect” targets that are unlikely to code for protein. The attendant FDR and PEP estimates lead to fewer accepted PSMs at fixed thresholds, and we show that this effect is a product of the database and statistical modeling and not the search engine. A variety of approaches to limit database size and remove noncoding target sequences are examined and discussed in terms of the altered statistical estimates generated and PSMs reported. These results are of importance to groups carrying out proteogenomics, aiming to maximize the validation and discovery of gene structure in sequenced genomes.

  5. Addressing statistical biases in nucleotide-derived protein databases for proteogenomic search strategies.

    PubMed

    Blakeley, Paul; Overton, Ian M; Hubbard, Simon J

    2012-11-02

    Proteogenomics has the potential to advance genome annotation through high quality peptide identifications derived from mass spectrometry experiments, which demonstrate a given gene or isoform is expressed and translated at the protein level. This can advance our understanding of genome function, discovering novel genes and gene structure that have not yet been identified or validated. Because of the high-throughput shotgun nature of most proteomics experiments, it is essential to carefully control for false positives and prevent any potential misannotation. A number of statistical procedures to deal with this are in wide use in proteomics, calculating false discovery rate (FDR) and posterior error probability (PEP) values for groups and individual peptide spectrum matches (PSMs). These methods control for multiple testing and exploit decoy databases to estimate statistical significance. Here, we show that database choice has a major effect on these confidence estimates leading to significant differences in the number of PSMs reported. We note that standard target:decoy approaches using six-frame translations of nucleotide sequences, such as assembled transcriptome data, apparently underestimate the confidence assigned to the PSMs. The source of this error stems from the inflated and unusual nature of the six-frame database, where for every target sequence there exist five "incorrect" targets that are unlikely to code for protein. The attendant FDR and PEP estimates lead to fewer accepted PSMs at fixed thresholds, and we show that this effect is a product of the database and statistical modeling and not the search engine. A variety of approaches to limit database size and remove noncoding target sequences are examined and discussed in terms of the altered statistical estimates generated and PSMs reported. These results are of importance to groups carrying out proteogenomics, aiming to maximize the validation and discovery of gene structure in sequenced genomes.
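
    For readers unfamiliar with target:decoy estimation, the sketch below shows one common, simple form of the estimate: the number of decoy PSMs above a score threshold divided by the number of target PSMs above it. It is not necessarily the estimator used by the authors, and the function name and toy scores are invented for illustration.

```python
def target_decoy_fdr(psms, threshold):
    """Estimate the FDR among target PSMs scoring at or above threshold.

    psms -- list of (score, is_decoy) pairs; higher scores are better
    """
    targets = sum(1 for score, is_decoy in psms
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms
                 if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0  # nothing accepted, so nothing to be wrong about
    return min(1.0, decoys / targets)


# Toy PSM list (made-up scores).
psms = [(9.1, False), (8.7, False), (8.2, True), (7.9, False),
        (7.5, False), (6.0, True), (5.8, False), (5.1, True)]

for threshold in (7.0, 5.0):
    print(f"threshold {threshold}: estimated FDR = "
          f"{target_decoy_fdr(psms, threshold):.2f}")
```

    Because the estimate depends on how many decoy and unlikely target sequences compete for each spectrum, inflating the database (as with six-frame translation) changes the number of PSMs accepted at a fixed FDR threshold.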

  6. MS-GF+ makes progress towards a universal database search tool for proteomics.

    PubMed

    Kim, Sangtae; Pevzner, Pavel A

    2014-10-31

    Mass spectrometry (MS) instruments and experimental protocols are rapidly advancing, but the software tools to analyse tandem mass spectra are lagging behind. We present a database search tool MS-GF+ that is sensitive (it identifies more peptides than most other database search tools) and universal (it works well for diverse types of spectra, different configurations of MS instruments and different experimental protocols). We benchmark MS-GF+ using diverse spectral data sets: (i) spectra of varying fragmentation methods; (ii) spectra of multiple enzyme digests; (iii) spectra of phosphorylated peptides; and (iv) spectra of peptides with unusual fragmentation propensities produced by a novel alpha-lytic protease. For all these data sets, MS-GF+ significantly increases the number of identified peptides compared with commonly used methods for peptide identifications. We emphasize that although MS-GF+ is not specifically designed for any particular experimental set-up, it improves on the performance of tools specifically designed for these applications (for example, specialized tools for phosphoproteomics).

  7. The search for quantum critical scaling in a classical system

    NASA Astrophysics Data System (ADS)

    Lamsal, Jagat; Gaddy, John; Petrovic, Marcus; Montfrooij, Wouter; Vojta, Thomas

    2009-04-01

    Order-disorder phase transitions in magnetic metals that occur at zero temperature have been studied in great detail. Theorists have advanced scenarios for these quantum critical systems in which the unusual response can be seen to evolve from a competition between ordering and disordering tendencies, driven by quantum fluctuations. Unfortunately, there is a potential disconnect between the real systems that are being studied experimentally, and the idealized systems that theoretical scenarios are based upon. Here we discuss how disorder introduces a change in morphology from a three-dimensional system to a collection of magnetic clusters, and we present neutron scattering data on a classical system, Li[Mn1.96Li0.04]O4, that show how magnetic clusters by themselves can lead to scaling laws that mimic those observed in quantum critical systems.

  8. Fine-grained Database Field Search Using Attribute-Based Encryption for E-Healthcare Clouds.

    PubMed

    Guo, Cheng; Zhuang, Ruhan; Jie, Yingmo; Ren, Yizhi; Wu, Ting; Choo, Kim-Kwang Raymond

    2016-11-01

    An effectively designed e-healthcare system can significantly enhance the quality of access and experience of healthcare users, including helping medical and healthcare providers ensure smooth delivery of services. Ensuring the security of patients' electronic health records (EHRs) in the e-healthcare system is an active research area. EHRs may be outsourced to a third party, such as a community healthcare cloud service provider, for storage as a cost-saving measure. Generally, encrypting the EHRs when they are stored in the system (i.e. data-at-rest) or prior to outsourcing the data is used to ensure data confidentiality. A searchable encryption (SE) scheme is a promising technique that can ensure the protection of private information without compromising on performance. In this paper, we propose a novel framework for controlling access to EHRs stored in semi-trusted cloud servers (e.g. a private cloud or a community cloud). To achieve fine-grained access control for EHRs, we leverage the ciphertext-policy attribute-based encryption (CP-ABE) technique to encrypt tables published by hospitals, including patients' EHRs, and each table is stored in the database with the primary key being the patient's unique identity. Our framework can enable different users with different privileges to search on different database fields. Unlike previous attempts to secure outsourcing of data, we emphasize the control of the searches of the fields within the database. We demonstrate the utility of the scheme by evaluating it using datasets from the University of California, Irvine.

  9. Real-Time Ligand Binding Pocket Database Search Using Local Surface Descriptors

    PubMed Central

    Chikhi, Rayan; Sael, Lee; Kihara, Daisuke

    2010-01-01

    Due to the increasing number of structures of unknown function accumulated by ongoing structural genomics projects, there is an urgent need for computational methods for characterizing protein tertiary structures. As functions of many of these proteins are not easily predicted by conventional sequence database searches, a legitimate strategy is to utilize structure information in function characterization. Of particular interest is the prediction of ligand binding to a protein, as ligand molecule recognition is a major part of the molecular function of proteins. Predicting whether a ligand molecule binds a protein is a complex problem due to the physical nature of protein-ligand interactions and the flexibility of both binding sites and ligand molecules. However, geometric and physicochemical complementarity is observed between the ligand and its binding site in many cases. Therefore, ligand molecules which bind to a local surface site in a protein can be predicted by finding similar local pockets of known binding ligands in the structure database. Here, we present two representations of ligand binding pockets and utilize them for ligand binding prediction by pocket shape comparison. These representations are based on mapping of surface properties of binding pockets, which are compactly described either by the two-dimensional pseudo-Zernike moments or the 3D Zernike descriptors. These compact representations allow fast real-time pocket searching against a database. A thorough benchmark study employing two different datasets shows that our representations are competitive with the other existing methods. Limitations and potentials of the shape-based methods as well as possible improvements are discussed. PMID:20455259

  10. Designing novel nicotinic agonists by searching a database of molecular shapes

    NASA Astrophysics Data System (ADS)

    Sheridan, Robert P.; Venkataraghavan, R.

    1987-10-01

    We introduce an approach by which novel ligands can be designed for a receptor if a pharmacophore geometry has been established and the receptor-bound conformations of other ligands are known. We use the shape-matching method of Kuntz et al. [J. Mol. Biol., 161 (1982) 269-288] to search a database of molecular shapes for those molecules which can fit inside the combined volume of the known ligands and which have interatomic distances compatible with the pharmacophore geometry. Some of these molecules are then modified by interactive modeling techniques to better match the chemical properties of the known ligands. Our shape database (about 5000 candidate molecules) is derived from a subset of the Cambridge Crystallographic Database [Allen et al., Acta Crystallogr., Sect. B, 35 (1979) 2331-2339]. We show, as an example, how several novel designs for nicotinic agonists can be derived by this approach, given a pharmacophore model derived from known agonists [Sheridan et al., J. Med. Chem., 29 (1986) 889-906]. This report complements our previous report [DesJarlais et al., J. Med. Chem., in press], which introduced a similar method for designing ligands when the structure of the receptor is known.

  11. Anatomy and evolution of database search engines-a central component of mass spectrometry based proteomic workflows.

    PubMed

    Verheggen, Kenneth; Raeder, Helge; Berven, Frode S; Martens, Lennart; Barsnes, Harald; Vaudel, Marc

    2017-09-13

    Sequence database search engines are bioinformatics algorithms that identify peptides from tandem mass spectra using a reference protein sequence database. Two decades of development, notably driven by advances in mass spectrometry, have provided scientists with more than 30 published search engines, each with its own properties. In this review, we present the common paradigm behind the different implementations, and its limitations for modern mass spectrometry datasets. We also detail how the search engines attempt to alleviate these limitations, and provide an overview of the different software frameworks available to the researcher. Finally, we highlight alternative approaches for the identification of proteomic mass spectrometry datasets, either as a replacement for, or as a complement to, sequence database search engines. © 2017 Wiley Periodicals, Inc.

  12. Renormalization group for a continuous-time quantum search in finite dimensions

    NASA Astrophysics Data System (ADS)

    Li, Shanshan; Boettcher, Stefan

    2017-03-01

    We consider the quantum search problem with a continuous-time quantum walk for networks characterized by a finite spectral dimension d_s of the network Laplacian. For general networks of fractal (integer or noninteger) dimension d_f, for which in general d_f ≠ d_s, our analysis suggests that it is d_s that determines the computational complexity of the quantum search. Our results continue those of A. M. Childs and J. Goldstone [Phys. Rev. A 70, 022314 (2004), 10.1103/PhysRevA.70.022314] for lattices of integer dimension, where d = d_f = d_s. Thus, we find for general fractals that the Grover limit of quantum search can be obtained whenever d_s > 4. This complements the recent discussion of mean-field (i.e., d_s → ∞) networks by S. Chakraborty et al. [Phys. Rev. Lett. 116, 100501 (2016), 10.1103/PhysRevLett.116.100501] showing that for all those networks, spatial search by quantum walk is optimal.

  13. The Relationship between Searches Performed in Online Databases and the Number of Full-Text Articles Accessed: Measuring the Interaction between Database and E-Journal Collections

    ERIC Educational Resources Information Center

    Lamothe, Alain R.

    2011-01-01

    The purpose of this paper is to report the results of a quantitative analysis exploring the interaction and relationship between the online database and electronic journal collections at the J. N. Desmarais Library of Laurentian University. A very strong relationship exists between the number of searches and the size of the online database…

  15. MHC-I ligand discovery using targeted database searches of mass spectrometry data: Implications for T cell immunotherapies.

    PubMed

    Murphy, John Patrick; Konda, Prathyusha; Kowalewski, Daniel J; Schuster, Heiko; Clements, Derek; Kim, Youra; Cohen, Alejandro Martin; Sharif, Tanveer; Nielsen, Morten; Stevanović, Stefan; Lee, Patrick W; Gujar, Shashi

    2017-02-28

    Class I major histocompatibility complex (MHC-I)-bound peptide ligands dictate the activation and specificity of CD8+ T-cells, and thus are important for devising T cell immunotherapies. In recent times, advances in mass spectrometry (MS) have enabled the precise identification of these MHC-I peptides wherein MS spectra are compared against a reference proteome. Unfortunately, matching these spectra to reference proteome databases is hindered by inflated search spaces attributed to a lack of enzyme restriction in the searches, limiting the efficiency with which MHC ligands are discovered. Here, we offer a solution to this problem whereby we developed a targeted database search approach, and accompanying tool SpectMHC, that is based on a priori-predicted MHC-I peptides. We first validated the approach using mass spectrometry data from two different allotype-specific mouse antibodies for the C57BL/6 mouse background. We then developed allotype-specific HLA databases to search previously published MS datasets of human peripheral blood mononuclear cells (PBMCs). This targeted search strategy improved peptide identifications for both mouse and human ligandomes by more than two-fold and is superior to traditional "no enzyme" searches of reference proteomes. Our novel targeted database search promises to uncover otherwise missed novel T cell epitopes of therapeutic potential.

  16. ParAlign: a parallel sequence alignment algorithm for rapid and sensitive database searches.

    PubMed

    Rognes, T

    2001-04-01

    There is a need for faster and more sensitive algorithms for sequence similarity searching in view of the rapidly increasing amounts of genomic sequence data available. Parallel processing capabilities in the form of the single instruction, multiple data (SIMD) technology are now available in common microprocessors and enable a single microprocessor to perform many operations in parallel. The ParAlign algorithm has been specifically designed to take advantage of this technology. The new algorithm initially exploits parallelism to perform a very rapid computation of the exact optimal ungapped alignment score for all diagonals in the alignment matrix. Then, a novel heuristic is employed to compute an approximate score of a gapped alignment by combining the scores of several diagonals. This approximate score is used to select the most interesting database sequences for a subsequent Smith-Waterman alignment, which is also parallelised. The resulting method represents a substantial improvement compared to existing heuristics. The sensitivity and specificity of ParAlign were found to be as good as Smith-Waterman implementations when the same method for computing the statistical significance of the matches was used. In terms of speed, only the significantly less sensitive NCBI BLAST 2 program was found to outperform the new approach. Online searches are available at http://dna.uio.no/search/
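
    The first stage described above, the optimal ungapped alignment score on every diagonal, can be illustrated with a plain serial sketch. This is not the ParAlign implementation: it uses no SIMD parallelism, and the function name and the simple match/mismatch scores are assumptions chosen for the example (a real search would use a substitution matrix).

```python
def ungapped_diagonal_scores(query, target, match=2, mismatch=-1):
    """Best ungapped local alignment score on every diagonal.

    Diagonal k pairs query[i] with target[i + k]; k runs from
    -(len(query) - 1) to len(target) - 1.
    """
    if not query or not target:
        return {}
    scores = {}
    for k in range(-(len(query) - 1), len(target)):
        best = running = 0
        i_start = max(0, -k)
        i_end = min(len(query), len(target) - k)
        for i in range(i_start, i_end):
            s = match if query[i] == target[i + k] else mismatch
            # Local alignment: restart whenever the running score drops below 0.
            running = max(0, running + s)
            best = max(best, running)
        scores[k] = best
    return scores


scores = ungapped_diagonal_scores("ACGT", "TTACGTAA")
best_diagonal = max(scores, key=scores.get)
print(f"best diagonal: {best_diagonal}, score: {scores[best_diagonal]}")
```

    Each diagonal is independent of the others, which is what makes this stage a natural fit for data-parallel hardware.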

  17. Internet Databases of the Properties, Enzymatic Reactions, and Metabolism of Small Molecules—Search Options and Applications in Food Science

    PubMed Central

    Minkiewicz, Piotr; Darewicz, Małgorzata; Iwaniak, Anna; Bucholska, Justyna; Starowicz, Piotr; Czyrko, Emilia

    2016-01-01

    Internet databases of small molecules, their enzymatic reactions, and metabolism have emerged as useful tools in food science. Database searching is also introduced as part of chemistry or enzymology courses for food technology students. Such resources support the search for information about single compounds and facilitate the introduction of secondary analyses of large datasets. Information can be retrieved from databases by searching for the compound name or for its structure, either annotated with the help of chemical codes or drawn using molecule editing software. Data mining options may be enhanced by navigating through a network of links and cross-links between databases. Exemplary databases reviewed in this article belong to two classes: tools concerning small molecules (including general and specialized databases annotating food components) and tools annotating enzymes and metabolism. Some problems associated with database application are also discussed. Data summarized in computer databases may be used for calculation of daily intake of bioactive compounds, prediction of metabolism of food components, and their biological activity as well as for prediction of interactions between food components and drugs. PMID:27929431

  18. Internet Databases of the Properties, Enzymatic Reactions, and Metabolism of Small Molecules-Search Options and Applications in Food Science.

    PubMed

    Minkiewicz, Piotr; Darewicz, Małgorzata; Iwaniak, Anna; Bucholska, Justyna; Starowicz, Piotr; Czyrko, Emilia

    2016-12-06

    Internet databases of small molecules, their enzymatic reactions, and metabolism have emerged as useful tools in food science. Database searching is also introduced as part of chemistry or enzymology courses for food technology students. Such resources support the search for information about single compounds and facilitate the introduction of secondary analyses of large datasets. Information can be retrieved from databases by searching for the compound name or for its structure, either annotated with the help of chemical codes or drawn using molecule editing software. Data mining options may be enhanced by navigating through a network of links and cross-links between databases. Exemplary databases reviewed in this article belong to two classes: tools concerning small molecules (including general and specialized databases annotating food components) and tools annotating enzymes and metabolism. Some problems associated with database application are also discussed. Data summarized in computer databases may be used for calculation of daily intake of bioactive compounds, prediction of metabolism of food components, and their biological activity as well as for prediction of interactions between food components and drugs.

  19. Q-Learning-Based Adjustable Fixed-Phase Quantum Grover Search Algorithm

    NASA Astrophysics Data System (ADS)

    Guo, Ying; Shi, Wensha; Wang, Yijun; Hu, Jiankun

    2017-02-01

    We demonstrate that the rotation phase can be suitably chosen to increase the efficiency of the phase-based quantum search algorithm, leading to a dynamic balance between iterations and success probabilities of the fixed-phase quantum Grover search algorithm with Q-learning for a given number of solutions. In this search algorithm, the proposed Q-learning algorithm, which is in essence a model-free reinforcement learning strategy, is used for performing a matching algorithm based on the fraction of marked items λ and the rotation phase α. After establishing the policy function α = π(λ), we complete the fixed-phase Grover algorithm, where the phase parameter is selected via the learned policy. Simulation results show that the Q-learning-based Grover search algorithm (QLGA) requires fewer iterations and yields higher success probabilities. Compared with the conventional Grover algorithms, it avoids local optima, thereby enabling success probabilities to approach one.
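
    As a baseline for the trade-off between iterations and success probability, the sketch below evaluates the conventional Grover algorithm, for which the success probability after k iterations is sin^2((2k + 1)θ) with sin θ = sqrt(λ). It does not implement the fixed-phase variant or the Q-learning policy α = π(λ) described in the abstract; the function names are hypothetical.

```python
import math


def grover_success_probability(marked_fraction, iterations):
    """Success probability of conventional Grover search after k iterations."""
    if not 0 < marked_fraction <= 1:
        raise ValueError("marked_fraction must be in (0, 1]")
    theta = math.asin(math.sqrt(marked_fraction))
    return math.sin((2 * iterations + 1) * theta) ** 2


def standard_iteration_count(marked_fraction):
    """Usual choice of iteration count, floor(pi / (4 * theta))."""
    if not 0 < marked_fraction <= 1:
        raise ValueError("marked_fraction must be in (0, 1]")
    theta = math.asin(math.sqrt(marked_fraction))
    return math.floor(math.pi / (4 * theta))


for lam in (0.25, 0.01, 0.001):
    k = standard_iteration_count(lam)
    p = grover_success_probability(lam, k)
    print(f"lambda = {lam}: {k} iterations, success probability {p:.4f}")
```

    The number of iterations grows roughly as 1/sqrt(λ), and the final success probability generally falls slightly short of one, which is the gap that phase-tuned variants aim to close.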

  20. Seismic Search Engine: A distributed database for mining large scale seismic data

    NASA Astrophysics Data System (ADS)

    Liu, Y.; Vaidya, S.; Kuzma, H. A.

    2009-12-01

    The International Monitoring System (IMS) of the CTBTO collects terabytes worth of seismic measurements from many receiver stations situated around the earth with the goal of detecting underground nuclear testing events and distinguishing them from other benign, but more common events such as earthquakes and mine blasts. The International Data Center (IDC) processes and analyzes these measurements, as they are collected by the IMS, to summarize event detections in daily bulletins. Thereafter, the data measurements are archived into a large format database. Our proposed Seismic Search Engine (SSE) will facilitate a framework for data exploration of the seismic database as well as the development of seismic data mining algorithms. Analogous to GenBank, the annotated genetic sequence database maintained by NIH, through SSE, we intend to provide public access to seismic data and a set of processing and analysis tools, along with community-generated annotations and statistical models to help interpret the data. SSE will implement queries as user-defined functions composed from standard tools and models. Each query is compiled and executed over the database internally before reporting results back to the user. Since queries are expressed with standard tools and models, users can easily reproduce published results within this framework for peer-review and making metric comparisons. As an illustration, an example query is “what are the best receiver stations in East Asia for detecting events in the Middle East?” Evaluating this query involves listing all receiver stations in East Asia, characterizing known seismic events in that region, and constructing a profile for each receiver station to determine how effective its measurements are at predicting each event. The results of this query can be used to help prioritize how data is collected, identify defective instruments, and guide future sensor placements.

  1. The Interactive Online SKY/M-FISH & CGH Database and the Entrez Cancer Chromosomes Search Database: Linkage of Chromosomal Aberrations with the Genome Sequence

    PubMed Central

    Knutsen, Turid; Gobu, Vasuki; Knaus, Rodger; Padilla-Nash, Hesed; Augustus, Meena; Strausberg, Robert L.; Kirsch, Ilan R.; Sirotkin, Karl; Ried, Thomas

    2005-01-01

    To catalogue data on chromosomal aberrations in cancer derived from emerging molecular cytogenetic techniques and to integrate these data with genome maps, we have established two resources, the NCI and NCBI SKY/M-FISH & CGH Database, and the Cancer Chromosomes database. The goal of the former is to allow investigators to submit and analyze clinical and research cytogenetic data. It contains a karyotype parser tool, which automatically converts the ISCN short-form karyotype into an internal representation displayed in detailed form and as a colored ideogram with band overlay, and also contains a tool to compare CGH profiles from multiple cases. The Cancer Chromosomes database integrates the SKY/M-FISH & CGH Database with the Mitelman Database of Chromosome Aberrations in Cancer, and the Recurrent Chromosome Aberrations in Cancer database. These three datasets can now be searched seamlessly by use of the Entrez search and retrieval system for chromosome aberrations, clinical data, and reference citations. Common diagnoses, anatomic sites, chromosome breakpoints, junctions, numerical and structural abnormalities, and bands gained and lost among selected cases can be compared by use of the “similarity” report. Because the model used for CGH data is a subset of the karyotype data, it is now possible to examine the similarities between CGH results and karyotypes directly. All chromosomal bands are directly linked to the Entrez Map Viewer database, providing integration of cytogenetic data with the sequence assembly. These resources, developed as a part of the Cancer Chromosome Aberration Project (CCAP) initiative, aid the search for new cancer-associated genes and foster insights into the causes and consequences of genetic alterations in cancer. PMID:15934046

  2. SCANPS: a web server for iterative protein sequence database searching by dynamic programing, with display in a hierarchical SCOP browser.

    PubMed

    Walsh, Thomas P; Webber, Caleb; Searle, Stephen; Sturrock, Shane S; Barton, Geoffrey J

    2008-07-01

    SCANPS performs iterative profile searching similar to PSI-BLAST but with full dynamic programing on each cycle and on-the-fly estimation of significance. This combination gives good sensitivity and selectivity that outperforms PSI-BLAST in domain-searching benchmarks. Although computationally expensive, SCANPS exploits on-chip parallelism (MMX and SSE2 instructions on Intel chips) as well as MPI parallelism to give acceptable turnround times even for large databases. A web server developed to run SCANPS searches is now available at http://www.compbio.dundee.ac.uk/www-scanps. The server interface allows a range of different protein sequence databases to be searched including the SCOP database of protein domains. The server provides the user with regularly updated versions of the main protein sequence databases and is backed up by significant computing resources which ensure that searches are performed rapidly. For SCOP searches, the results may be viewed in a new tree-based representation that reflects the structure of the SCOP hierarchy; this aids the user in placing each hit in the context of its SCOP classification and understanding its relationship to other domains in SCOP.
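
    To illustrate what "full dynamic programing" means for a sequence comparison, the following is a minimal Smith-Waterman-style local alignment score. It is only a sketch: the match, mismatch, and gap values are arbitrary assumptions, and the profiles, substitution matrices, significance estimation, and SIMD/MPI parallelism that the record attributes to SCANPS are not reproduced.

```python
def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty).

    Fills the dynamic programming table one row at a time, so memory use
    is proportional to len(b) rather than len(a) * len(b).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ch_a == ch_b else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


if __name__ == "__main__":
    print(local_alignment_score("XXACGTXX", "YYACGTYY"))
```

    Every cell of the table is evaluated, which is why this style of search is more expensive than heuristic seeding and why parallel hardware helps.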

  3. Decision making in family medicine: randomized trial of the effects of the InfoClinique and Trip database search engines.

    PubMed

    Labrecque, Michel; Ratté, Stéphane; Frémont, Pierre; Cauchon, Michel; Ouellet, Jérôme; Hogg, William; McGowan, Jessie; Gagnon, Marie-Pierre; Njoya, Merlin; Légaré, France

    2013-10-01

    To compare the ability of users of 2 medical search engines, InfoClinique and the Trip database, to provide correct answers to clinical questions and to explore the perceived effects of the tools on the clinical decision-making process. Randomized trial. Three family medicine units of the family medicine program of the Faculty of Medicine at Laval University in Quebec city, Que. Fifteen second-year family medicine residents. Residents generated 30 structured questions about therapy or preventive treatment (2 questions per resident) based on clinical encounters. Using an Internet platform designed for the trial, each resident answered 20 of these questions (their own 2, plus 18 of the questions formulated by other residents, selected randomly) before and after searching for information with 1 of the 2 search engines. For each question, 5 residents were randomly assigned to begin their search with InfoClinique and 5 with the Trip database. The ability of residents to provide correct answers to clinical questions using the search engines, as determined by third-party evaluation. After answering each question, participants completed a questionnaire to assess their perception of the engine's effect on the decision-making process in clinical practice. Of 300 possible pairs of answers (1 answer before and 1 after the initial search), 254 (85%) were produced by 14 residents. Of these, 132 (52%) and 122 (48%) pairs of answers concerned questions that had been assigned an initial search with InfoClinique and the Trip database, respectively. Both engines produced an important and similar absolute increase in the proportion of correct answers after searching (26% to 62% for InfoClinique, for an increase of 36%; 24% to 63% for the Trip database, for an increase of 39%; P = .68). For all 30 clinical questions, at least 1 resident produced the correct answer after searching with either search engine. The mean (SD) time of the initial search for each question was 23.5 (7

  4. Integration of first-principles methods and crystallographic database searches for new ferroelectrics: Strategies and explorations

    NASA Astrophysics Data System (ADS)

    Bennett, Joseph W.; Rabe, Karin M.

    2012-11-01

    In this concept paper, the development of strategies for the integration of first-principles methods with crystallographic database mining for the discovery and design of novel ferroelectric materials is discussed, drawing on the results and experience derived from exploratory investigations on three different systems: (1) the double perovskite Sr(Sb1/2Mn1/2)O3 as a candidate semiconducting ferroelectric; (2) polar derivatives of schafarzikite MSb2O4; and (3) ferroelectric semiconductors with formula M2P2(S,Se)6. A variety of avenues for further research and investigation are suggested, including automated structure type classification, low-symmetry improper ferroelectrics, and high-throughput first-principles searches for additional representatives of structural families with desirable functional properties.

  5. Performing private database queries in a real-world environment using a quantum protocol

    PubMed Central

    Chan, Philip; Lucio-Martinez, Itzel; Mo, Xiaofan; Simon, Christoph; Tittel, Wolfgang

    2014-01-01

    In the well-studied cryptographic primitive 1-out-of-N oblivious transfer, a user retrieves a single element from a database of size N without the database learning which element was retrieved. While it has previously been shown that a secure implementation of 1-out-of-N oblivious transfer is impossible against arbitrarily powerful adversaries, recent research has revealed an interesting class of private query protocols based on quantum mechanics in a cheat sensitive model. Specifically, a practical protocol does not need to guarantee that the database provider cannot learn what element was retrieved if doing so carries the risk of detection. The latter is sufficient motivation to keep a database provider honest. However, none of the previously proposed protocols could cope with noisy channels. Here we present a fault-tolerant private query protocol, in which the novel error correction procedure is integral to the security of the protocol. Furthermore, we present a proof-of-concept demonstration of the protocol over a deployed fibre. PMID:24913129

  6. Performing private database queries in a real-world environment using a quantum protocol

    NASA Astrophysics Data System (ADS)

    Chan, Philip; Lucio-Martinez, Itzel; Mo, Xiaofan; Simon, Christoph; Tittel, Wolfgang

    2014-06-01

    In the well-studied cryptographic primitive 1-out-of-N oblivious transfer, a user retrieves a single element from a database of size N without the database learning which element was retrieved. While it has previously been shown that a secure implementation of 1-out-of-N oblivious transfer is impossible against arbitrarily powerful adversaries, recent research has revealed an interesting class of private query protocols based on quantum mechanics in a cheat sensitive model. Specifically, a practical protocol does not need to guarantee that the database provider cannot learn what element was retrieved if doing so carries the risk of detection. The latter is sufficient motivation to keep a database provider honest. However, none of the previously proposed protocols could cope with noisy channels. Here we present a fault-tolerant private query protocol, in which the novel error correction procedure is integral to the security of the protocol. Furthermore, we present a proof-of-concept demonstration of the protocol over a deployed fibre.

  7. Preoperative predictors of blood loss at the time of radical prostatectomy: results from the SEARCH database.

    PubMed

    Lloyd, J C; Bañez, L L; Aronson, W J; Terris, M K; Presti, J C; Amling, C L; Kane, C J; Freedland, S J

    2009-01-01

    The literature contains conflicting data on preoperative predictors of estimated blood loss (EBL) at radical retropubic prostatectomy (RRP). We sought to examine preoperative predictors of EBL at the time of RRP among patients from the SEARCH database to lend clarity to this issue. A total of 1154 patients were identified in the SEARCH database who underwent RRP between 1988 and 2008 and had EBL data available. We examined multiple preoperative factors for their ability to predict EBL using multivariate linear regression analysis. Median EBL was 900 ml (s.d. 1032). The 25th and 75th percentile for EBL were 600 and 1500 ml, respectively. EBL increased significantly with increasing body mass index (BMI) and increasing prostate size and decreased with more recent year of RRP (all P<0.001). The mean-adjusted EBL in normal-weight men (BMI <25 kg/m²) was 807 ml compared to 1067 ml among severely obese men (BMI ≥35 kg/m²). Predicted EBL for men with the smallest prostates (<20 g) was 721 ml, compared to 1326 ml for men with prostates ≥100 g. Finally, statistically significant differences between centers were observed, with mean-adjusted EBL ranging from 844 to 1094 ml. Both BMI and prostate size are predictors of increased EBL. Prostate size is of particular note, as a nearly twofold increased EBL was seen from the smallest (<20 g) to the largest prostates (≥100 g). Over time, average EBL significantly decreased. Finally, significant differences in EBL were observed between centers. Patients with multiple risk factors should be forewarned they are at increased risk for higher EBL, which may translate into a greater need for blood transfusion.

  8. The Open Spectral Database: an open platform for sharing and searching spectral data.

    PubMed

    Chalk, Stuart J

    2016-01-01

    A number of websites make spectral data available for download (typically as JCAMP-DX text files), and one (ChemSpider) also allows users to contribute spectral files. As a result, searching and retrieving such spectral data can be time consuming, and the data can be difficult to reuse if it is compressed in the JCAMP-DX file. What is needed is a single resource that allows submission of JCAMP-DX files, export of the raw data in multiple formats, searching based on multiple chemical identifiers, and is open in terms of license and access. To address these issues a new online resource called the Open Spectral Database (OSDB) http://osdb.info/ has been developed and is now available. Built using open source tools, using open code (hosted on GitHub), providing open data, and open to community input about design and functionality, the OSDB is available for anyone to submit spectral data, making it searchable and available to the scientific community. This paper details the concept and coding, internal architecture, export formats, Representational State Transfer (REST) Application Programming Interface and options for submission of data. The OSDB website went live in November 2015. Concurrently, the GitHub repository was made available at https://github.com/stuchalk/OSDB/, and is open for collaborators to join the project, submit issues, and contribute code. The combination of a scripting environment (PHPStorm), a PHP Framework (CakePHP), a relational database (MySQL) and a code repository (GitHub) provides all the capabilities to easily develop REST based websites for ingestion, curation and exposure of open chemical data to the community at all levels. It is hoped this software stack (or equivalent ones in other scripting languages) will be leveraged to make more chemical data available for both humans and computers.

  9. Internet programs for drawing moth pheromone analogs and searching literature database.

    PubMed

    Byers, John A

    2002-04-01

    An Internet web page is described for organizing and analyzing information about lepidopteran sex pheromone components. Hypertext markup language (HTML) with JavaScript program code is used to draw moth pheromone analogs by combining GIF bitmap images for viewing by web browsers such as Netscape or Microsoft Internet Explorer. Straight-chain hydrocarbons of 5-22 carbons with epoxides or unsaturated positions of E or Z geometrical configuration with several alternative functional groups can be drawn by simply checking menu bars or checkboxes representing chain length, E/Z unsaturation points, epoxide position and chirality, and optional functional groups. The functional group can be an aldehyde, alcohol, or ester of formate, acetate, propionate, or butyrate. The program is capable of drawing several million structures and naming them [e.g., (E,E)-8,10-dodecadien-1-ol and abbreviated as E8E10-12:OH]. A Java applet program run from the same page searches for the presently drawn structure in an internal database compiled from the Pherolist, and if the component is found, provides a textarea display of the families and species using the component. Links are automatically specified for drawn components if found in the Pherolist web site (maintained by H. Arn). Windowed links can also be made to two other JavaScript programs that allow searches of a web site database with over 5900 research citations on lepidopteran semiochemicals and a calculator of vapor pressures of some moth sex pheromone analogs at a specified temperature. Various evolutionary and biosynthetic aspects are discussed in regard to the diversity of moth sex pheromone components.

  10. webPRC: the Profile Comparer for alignment-based searching of public domain databases.

    PubMed

    Brandt, Bernd W; Heringa, Jaap

    2009-07-01

    Profile-profile methods are well suited to detect remote evolutionary relationships between protein families. Profile Comparer (PRC) is an existing stand-alone program for scoring and aligning hidden Markov models (HMMs), which are based on multiple sequence alignments. Since PRC compares profile HMMs instead of sequences, it can be used to find distant homologues. For this purpose, PRC is used by, for example, the CATH and Pfam-domain databases. As PRC is a profile comparer, it only reports profile HMM alignments and does not produce multiple sequence alignments. We have developed webPRC server, which makes it straightforward to search for distant homologues or similar alignments in a number of domain databases. In addition, it provides the results both as multiple sequence alignments and aligned HMMs. Furthermore, the user can view the domain annotation, evaluate the PRC hits with the Jalview multiple alignment editor and generate logos from the aligned HMMs or the aligned multiple alignments. Thus, this server assists in detecting distant homologues with PRC as well as in evaluating and using the results. The webPRC interface is available at http://www.ibi.vu.nl/programs/prcwww/.

  11. Neuron-Miner: An Advanced Tool for Morphological Search and Retrieval in Neuroscientific Image Databases.

    PubMed

    Conjeti, Sailesh; Mesbah, Sepideh; Negahdar, Mohammadreza; Rautenberg, Philipp L; Zhang, Shaoting; Navab, Nassir; Katouzian, Amin

    2016-10-01

    The steadily growing amount of digital neuroscientific data demands a reliable, systematic, and computationally effective retrieval algorithm. In this paper, we present Neuron-Miner, which is a tool for fast and accurate reference-based retrieval within neuron image databases. The proposed algorithm is built on a hashing (search and retrieval) technique that employs multiple unsupervised random trees, collectively called Hashing Forests (HF). The HF are trained to parse the neuromorphological space hierarchically and preserve the inherent neuron neighborhoods while encoding with compact binary codewords. We further introduce the inverse-coding formulation within HF to effectively mitigate pairwise neuron similarity comparisons, thus allowing scalability to massive databases with little additional time overhead. The proposed hashing tool gives a superior approximation of the true neuromorphological neighborhood, with better retrieval and ranking performance than existing generalized hashing methods. This is exhaustively validated by quantifying the results over 31266 neuron reconstructions from the Neuromorpho.org dataset, curated from 147 different archives. We envisage that finding and ranking similar neurons through reference-based querying via Neuron-Miner would assist neuroscientists in objectively understanding the relationship between neuronal structure and function for applications in comparative anatomy or diagnosis.
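
    As a generic illustration of retrieval over compact binary codewords, the sketch below ranks database entries by Hamming distance to a query code. It assumes the codes have already been produced by some hashing scheme; the Hashing Forests training and the inverse-coding formulation described in the record are not reproduced, and the entry names and codes are made up for the example.

```python
from typing import Dict, List


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two binary codewords differ."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query: int, database: Dict[str, int],
                    top_k: int = 3) -> List[str]:
    """Return the names of the top_k entries closest to the query code.

    Ties are broken alphabetically so the ranking is deterministic.
    """
    ranked = sorted(
        database,
        key=lambda name: (hamming_distance(query, database[name]), name),
    )
    return ranked[:top_k]


if __name__ == "__main__":
    # Hypothetical 4-bit codewords for three neuron reconstructions.
    codes = {"neuron_a": 0b1010, "neuron_b": 0b1011, "neuron_c": 0b0101}
    print(rank_by_hamming(0b1010, codes, top_k=2))
```

    This linear scan compares the query with every entry; avoiding exactly that pairwise cost is what the record's inverse-coding step is said to address.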

  12. A method for fast database search for all k-nucleotide repeats.

    PubMed Central

    Benson, G; Waterman, M S

    1994-01-01

    A significant portion of DNA consists of repeating patterns of various sizes, from very small (one, two and three nucleotides) to very large (over 300 nucleotides). Although the functions of these repeating regions are not well understood, they appear important for understanding the expression, regulation and evolution of DNA. For example, increases in the number of trinucleotide repeats have been associated with human genetic disease, including Fragile-X mental retardation and Huntington's disease. Repeats are also useful as a tool in mapping and identifying DNA; the number of copies of a particular pattern at a site is often variable among individuals (polymorphic) and is therefore helpful in locating genes via linkage studies and also in providing DNA fingerprints of individuals. The number of repeating regions is unknown as is the distribution of pattern sizes. It would be useful to search for such regions in the DNA database in order that they may be studied more fully. The DNA database currently consists of approximately 150 million basepairs and is growing exponentially. Therefore, any program to look for repeats must be efficient and fast. In this paper, we present some new techniques that are useful in recognizing repeating patterns and describe a new program for rapidly detecting repeat regions in the DNA database where the basic unit of the repeat has size up to 32 nucleotides. It is our hope that the examples in this paper will illustrate the unrealized diversity of repeats in DNA and that the program we have developed will be a useful tool for locating new and interesting repeats. PMID:7984436
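
    The kind of pattern being sought can be made concrete with a naive scan for exact tandem repeats whose unit is at most 32 nucleotides long. This is a brute-force sketch for illustration, not the authors' method; the function name, the `min_copies` threshold, and the greedy choice of the longest repeat at each position are assumptions.

```python
def find_tandem_repeats(seq, max_unit=32, min_copies=3):
    """Return (start, unit, copies) for exact tandem repeats in seq.

    At each position the unit size covering the longest stretch is kept
    (smaller units win ties), and the scan resumes after that stretch.
    """
    results = []
    n = len(seq)
    i = 0
    while i < n:
        best = None
        # A unit of size k needs room for at least min_copies copies.
        for k in range(1, min(max_unit, (n - i) // min_copies) + 1):
            unit = seq[i:i + k]
            copies = 1
            while seq[i + copies * k:i + (copies + 1) * k] == unit:
                copies += 1
            span = copies * k
            if copies >= min_copies and (best is None or span > best[2]):
                best = (unit, copies, span)
        if best:
            results.append((i, best[0], best[1]))
            i += best[2]
        else:
            i += 1
    return results


if __name__ == "__main__":
    print(find_tandem_repeats("TTACACACACGG"))
```

    The scan is quadratic or worse in the sequence length, which is why a database of the size mentioned in the record calls for the faster techniques the paper describes.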

  13. Matrix-product-state simulation of an extended Brueschweiler bulk-ensemble database search

    SciTech Connect

    SaiToh, Akira; Kitagawa, Masahiro

    2006-06-15

    Brueschweiler's database search in a spin Liouville space can be efficiently simulated on a conventional computer without error as long as the simulation cost of the internal circuit of an oracle function is polynomial, unlike the fact that in true NMR experiments, it suffers from an exponential decrease in the variation of a signal intensity. With the simulation method using the matrix-product-state proposed by Vidal [G. Vidal, Phys. Rev. Lett. 91, 147902 (2003)], we perform such a simulation. We also show the extensions of the algorithm without utilizing the J-coupling or DD-coupling splitting of frequency peaks in observation: searching can be completed with a single query in polynomial postoracle circuit complexities in an extension; multiple solutions of an oracle can be found in another extension whose query complexity is linear in the key length and in the number of solutions (this extension is to find all of marked keys). These extended algorithms are also simulated with the same simulation method.

  14. GPU-Acceleration of Sequence Homology Searches with Database Subsequence Clustering

    PubMed Central

    Suzuki, Shuji; Kakuta, Masanori; Ishida, Takashi; Akiyama, Yutaka

    2016-01-01

    Sequence homology searches are used in various fields and require large amounts of computation time, especially for metagenomic analysis, owing to the large number of queries and the database size. To accelerate computing analyses, graphics processing units (GPUs) are widely used as a low-cost, high-performance computing platform. Therefore, we mapped the time-consuming steps involved in GHOSTZ, which is a state-of-the-art homology search algorithm for protein sequences, onto a GPU and implemented it as GHOSTZ-GPU. In addition, we optimized memory access for GPU calculations and for communication between the CPU and GPU. As per results of the evaluation test involving metagenomic data, GHOSTZ-GPU with 12 CPU threads and 1 GPU was approximately 3.0- to 4.1-fold faster than GHOSTZ with 12 CPU threads. Moreover, GHOSTZ-GPU with 12 CPU threads and 3 GPUs was approximately 5.8- to 7.7-fold faster than GHOSTZ with 12 CPU threads. PMID:27482905

  15. GPU-Acceleration of Sequence Homology Searches with Database Subsequence Clustering.

    PubMed

    Suzuki, Shuji; Kakuta, Masanori; Ishida, Takashi; Akiyama, Yutaka

    2016-01-01

    Sequence homology searches are used in various fields and require large amounts of computation time, especially for metagenomic analysis, owing to the large number of queries and the database size. To accelerate computing analyses, graphics processing units (GPUs) are widely used as a low-cost, high-performance computing platform. Therefore, we mapped the time-consuming steps involved in GHOSTZ, which is a state-of-the-art homology search algorithm for protein sequences, onto a GPU and implemented it as GHOSTZ-GPU. In addition, we optimized memory access for GPU calculations and for communication between the CPU and GPU. As per results of the evaluation test involving metagenomic data, GHOSTZ-GPU with 12 CPU threads and 1 GPU was approximately 3.0- to 4.1-fold faster than GHOSTZ with 12 CPU threads. Moreover, GHOSTZ-GPU with 12 CPU threads and 3 GPUs was approximately 5.8- to 7.7-fold faster than GHOSTZ with 12 CPU threads.

  16. deconSTRUCT: general purpose protein database search on the substructure level.

    PubMed

    Zhang, Zong Hong; Bharatham, Kavitha; Sherman, Westley A; Mihalek, Ivana

    2010-07-01

    deconSTRUCT webserver offers an interface to a protein database search engine, usable for a general purpose detection of similar protein (sub)structures. Initially, it deconstructs the query structure into its secondary structure elements (SSEs) and reassembles the match to the target by requiring a (tunable) degree of similarity in the direction and sequential order of SSEs. Hierarchical organization and judicious use of the information about protein structure enables deconSTRUCT to achieve the sensitivity and specificity of the established search engines at orders of magnitude increased speed, without tying up irretrievably the substructure information in the form of a hash. In a post-processing step, a match on the level of the backbone atoms is constructed. The results presented to the user consist of the list of the matched SSEs, the transformation matrix for rigid superposition of the structures and several ways of visualization, both downloadable and implemented as a web-browser plug-in. The server is available at http://epsf.bmad.bii.a-star.edu.sg/struct_server.html.

  17. Searching for robust quantum memories in many coupled oscillators

    NASA Astrophysics Data System (ADS)

    Bosco de Magalhães, A. R.

    2011-11-01

    The relation between microscopic symmetries in the system-environment interaction and the emergence of robust states is studied for many linearly coupled harmonic oscillators. Different types of symmetry, which are introduced into the model as terms in the coupling constants between each system's oscillator and a common reservoir, lead to distinct robust modes. Since these modes are partially or completely immune to the symmetric part of the environmental noise, they are good candidates for building quantum memories. A comparison of the model investigated here, with bilinear system-reservoir coupling, and a model where such coupling presents an exponential dependence on the variables of interest is performed.

  18. Search for a border between classical and quantum worlds

    NASA Astrophysics Data System (ADS)

    Alicki, Robert

    2002-03-01

    The effects of environmental decoherence on a mass-center position of a body consisting of many atoms are studied using a kind of linear quantum Boltzmann equation. It is shown that under realistic laboratory conditions these effects can be essentially eliminated for dust particles containing 10^15 atoms. However, the initial velocity distribution and certain geometrical conditions make standard interference-type measurements extremely difficult beyond the nanometer scale. The results are illustrated by the analysis of the recent experiments involving fullerenes. Applications of decoherence effects to precise monitoring of environment or to separation of molecules are suggested.

  19. Proteomics of Soil and Sediment: Protein Identification by De Novo Sequencing of Mass Spectra Complements Traditional Database Searching

    NASA Astrophysics Data System (ADS)

    Miller, S.; Rizzo, A. I.; Waldbauer, J.

    2015-12-01

    Proteomics has the potential to elucidate the metabolic pathways and taxa responsible for in situ biogeochemical transformations. However, low rates of protein identification from high resolution mass spectra have been a barrier to the development of proteomics in complex environmental samples. Much of the difficulty lies in the computational challenge of linking mass spectra to their corresponding proteins. Traditional database search methods for matching peptide sequences to mass spectra are often inadequate due to the complexity of environmental proteomes and the large database search space, as we demonstrate with soil and sediment proteomes generated via a range of extraction methods. One alternative to traditional database searching is de novo sequencing, which identifies peptide sequences without the need for a database. BLAST can then be used to match de novo sequences to similar genetic sequences. Assigning confidence to putative identifications has been one hurdle for the implementation of de novo sequencing. We found that accurate de novo sequences can be screened by quality score and length. Screening criteria are verified by comparing the results of de novo sequencing and traditional database searching for well-characterized proteomes from simple biological systems. The BLAST hits of screened sequences are interrogated for taxonomic and functional information. We applied de novo sequencing to organic topsoil and marine sediment proteomes. Peak-rich proteomes, which can result from various extraction techniques, yield thousands of high-confidence protein identifications, an improvement over previous proteomic studies of soil and sediment. User-friendly software tools for de novo metaproteomics analysis have been developed. This "De Novo Analysis" Pipeline is also a faster method of data analysis than constructing a tailored sequence database for traditional database searching.
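
    The screening step, in which de novo sequences are kept or discarded by quality score and length, can be sketched as a simple filter. The record type, the 0-100 score scale, and both thresholds below are hypothetical; the abstract does not state the actual cut-offs or the software used.

```python
from typing import List, NamedTuple


class DeNovoPeptide(NamedTuple):
    sequence: str
    quality: float  # hypothetical confidence score on a 0-100 scale


def screen_peptides(peptides: List[DeNovoPeptide],
                    min_quality: float = 80.0,
                    min_length: int = 8) -> List[DeNovoPeptide]:
    """Keep peptides that meet both the quality and the length threshold."""
    return [
        p for p in peptides
        if p.quality >= min_quality and len(p.sequence) >= min_length
    ]


if __name__ == "__main__":
    candidates = [
        DeNovoPeptide("LVNELTEFAK", 92.0),  # long and high scoring: kept
        DeNovoPeptide("AGK", 95.0),         # too short
        DeNovoPeptide("YLYEIARRHPY", 55.0), # score too low
    ]
    for peptide in screen_peptides(candidates):
        print(peptide.sequence, peptide.quality)
```

    Sequences that pass such a filter would then be the ones submitted to BLAST, as the abstract describes.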

  20. Proteomics of Soil and Sediment: Protein Identification by De Novo Sequencing of Mass Spectra Complements Traditional Database Searching

    NASA Astrophysics Data System (ADS)

    Miller, S.; Rizzo, A. I.; Waldbauer, J.

    2014-12-01

    Proteomics has the potential to elucidate the metabolic pathways and taxa responsible for in situ biogeochemical transformations. However, low rates of protein identification from high resolution mass spectra have been a barrier to the development of proteomics in complex environmental samples. Much of the difficulty lies in the computational challenge of linking mass spectra to their corresponding proteins. Traditional database search methods for matching peptide sequences to mass spectra are often inadequate due to the complexity of environmental proteomes and the large database search space, as we demonstrate with soil and sediment proteomes generated via a range of extraction methods. One alternative to traditional database searching is de novo sequencing, which identifies peptide sequences without the need for a database. BLAST can then be used to match de novo sequences to similar genetic sequences. Assigning confidence to putative identifications has been one hurdle for the implementation of de novo sequencing. We found that accurate de novo sequences can be screened by quality score and length. Screening criteria are verified by comparing the results of de novo sequencing and traditional database searching for well-characterized proteomes from simple biological systems. The BLAST hits of screened sequences are interrogated for taxonomic and functional information. We applied de novo sequencing to organic topsoil and marine sediment proteomes. Peak-rich proteomes, which can result from various extraction techniques, yield thousands of high-confidence protein identifications, an improvement over previous proteomic studies of soil and sediment. User-friendly software tools for de novo metaproteomics analysis have been developed. This "De Novo Analysis" Pipeline is also a faster method of data analysis than constructing a tailored sequence database for traditional database searching.

  1. Low-Mode Conformational Search Method with Semiempirical Quantum Mechanical Calculations: Application to Enantioselective Organocatalysis.

    PubMed

    Kamachi, Takashi; Yoshizawa, Kazunari

    2016-02-22

    A conformational search program for finding low-energy conformations of large noncovalent complexes has been developed. A quantitatively reliable semiempirical quantum mechanical PM6-DH+ method, which is able to accurately describe noncovalent interactions at a low computational cost, was employed in contrast to conventional conformational search programs in which molecular mechanical methods are usually adopted. Our approach is based on the low-mode method whereby an initial structure is perturbed along one of its low-mode eigenvectors to generate new conformations. This method was applied to determine the most stable conformation of transition state for enantioselective alkylation by the Maruoka and cinchona alkaloid catalysts and Hantzsch ester hydrogenation of imines by chiral phosphoric acid. Besides successfully reproducing the previously reported most stable DFT conformations, the conformational search with the semiempirical quantum mechanical calculations newly discovered a more stable conformation at a low computational cost.

  2. MSDsite: a database search and retrieval system for the analysis and viewing of bound ligands and active sites.

    PubMed

    Golovin, Adel; Dimitropoulos, Dimitris; Oldfield, Tom; Rachedi, Abdelkrim; Henrick, Kim

    2005-01-01

    The three-dimensional environments of ligand binding sites have been derived from the parsing and loading of the PDB entries into a relational database. For each bound molecule the biological assembly of the quaternary structure has been used to determine all contact residues and a fast interactive search and retrieval system has been developed. Prosite pattern and short sequence search options are available together with a novel graphical query generator for inter-residue contacts. The database and its query interface are accessible from the Internet through a web server located at: http://www.ebi.ac.uk/msd-srv/msdsite.

  3. The Open Quantum Materials Database (OQMD): assessing the accuracy of DFT formation energies

    NASA Astrophysics Data System (ADS)

    Kirklin, Scott; Saal, James E.; Meredig, Bryce; Thompson, Alex; Doak, Jeff W.; Aykol, Muratahan; Rühl, Stephan; Wolverton, Chris

    2015-12-01

    The Open Quantum Materials Database (OQMD) is a high-throughput database currently consisting of nearly 300,000 density functional theory (DFT) total energy calculations of compounds from the Inorganic Crystal Structure Database (ICSD) and decorations of commonly occurring crystal structures. To maximise the impact of these data, the entire database is being made available, without restrictions, at www.oqmd.org/download. In this paper, we outline the structure and contents of the database, and then use it to evaluate the accuracy of the calculations therein by comparing DFT predictions with experimental measurements for the stability of all elemental ground-state structures and 1,670 experimental formation energies of compounds. This represents the largest comparison between DFT and experimental formation energies to date. The apparent mean absolute error between experimental measurements and our calculations is 0.096 eV/atom. In order to estimate how much error to attribute to the DFT calculations, we also examine deviation between different experimental measurements themselves where multiple sources are available, and find a surprisingly large mean absolute error of 0.082 eV/atom. Hence, we suggest that a significant fraction of the error between DFT and experimental formation energies may be attributed to experimental uncertainties. Finally, we evaluate the stability of compounds in the OQMD (including compounds obtained from the ICSD as well as hypothetical structures), which allows us to predict the existence of ~3,200 new compounds that have not been experimentally characterised and uncover trends in material discovery, based on historical data available within the ICSD.

  4. Molecular Quantum Similarity, Chemical Reactivity and Database Screening of 3D Pharmacophores of the Protein Kinases A, B and G from Mycobacterium tuberculosis.

    PubMed

    Morales-Bayuelo, Alejandro

    2017-06-21

    Mycobacterium tuberculosis remains one of the world's most devastating pathogens. For this reason, we developed a study involving 3D pharmacophore searching, selectivity analysis and database screening for a series of anti-tuberculosis compounds, associated with the protein kinases A, B, and G. This theoretical study is expected to shed some light onto some molecular aspects that could contribute to the knowledge of the molecular mechanics behind interactions of these compounds, with anti-tuberculosis activity. Using the Molecular Quantum Similarity field and reactivity descriptors supported in the Density Functional Theory, it was possible to measure the quantification of the steric and electrostatic effects through the Overlap and Coulomb quantitative convergence (alpha and beta) scales. In addition, an analysis of reactivity indices using global and local descriptors was developed, identifying the binding sites and selectivity on these anti-tuberculosis compounds in the active sites. Finally, the reported pharmacophores to PKn A, B and G, were used to carry out database screening, using a database with anti-tuberculosis drugs from the Kelly Chibale research group (http://www.kellychibaleresearch.uct.ac.za/), to find the compounds with affinity for the specific protein targets associated with PKn A, B and G. In this regard, this hybrid methodology (Molecular Mechanic/Quantum Chemistry) shows new insights into drug design that may be useful in the tuberculosis treatment today.

  5. A Nonhomogeneous Cuckoo Search Algorithm Based on Quantum Mechanism for Real Parameter Optimization.

    PubMed

    Cheung, Ngaam J; Ding, Xue-Ming; Shen, Hong-Bin

    2017-02-01

    Cuckoo search (CS) algorithm is a nature-inspired search algorithm, in which all the individuals have identical search behaviors. However, this simple homogeneous search behavior is not always optimal to find the potential solution to a special problem, and it may trap the individuals into local regions leading to premature convergence. To overcome the drawback, this paper presents a new variant of CS algorithm with nonhomogeneous search strategies based on quantum mechanism to enhance search ability of the classical CS algorithm. Featured contributions in this paper include: 1) quantum-based strategy is developed for nonhomogeneous update laws and 2) we, for the first time, present a set of theoretical analyses on CS algorithm as well as the proposed algorithm, respectively, and conclude a set of parameter boundaries guaranteeing the convergence of the CS algorithm and the proposed algorithm. On 24 benchmark functions, we compare our method with five existing CS-based methods and other ten state-of-the-art algorithms. The numerical results demonstrate that the proposed algorithm is significantly better than the original CS algorithm and the rest of compared methods according to two nonparametric tests.

  6. Kappa-alpha plot derived structural alphabet and BLOSUM-like substitution matrix for rapid search of protein structure database

    PubMed Central

    2007-01-01

    We present a novel protein structure database search tool, 3D-BLAST, that is useful for analyzing novel structures and can return a ranked list of alignments. This tool has the features of BLAST (for example, robust statistical basis, and effective and reliable search capabilities) and employs a kappa-alpha (κ, α) plot derived structural alphabet and a new substitution matrix. 3D-BLAST searches more than 12,000 protein structures in 1.2 s and yields good results in zones with low sequence similarity. PMID:17335583

  7. Searching for NEO precoveries in the PS1 and MPC databases

    NASA Astrophysics Data System (ADS)

    Weryk, Robert J.; Wainscoat, Richard J.

    2016-10-01

    The Pan-STARRS (PS1) survey telescope, operated by the University of Hawai`i, covers the sky north of -49 degrees declination with its seven square degree field-of-view. Described in detail by Wainscoat et al. (2015), it has become the leading telescope for new Near Earth Object (NEO) discoveries. In 2015, it found almost half of the new Near Earth Asteroids, as well as half of the new comets. Observations of potential NEOs must be followed up before they can be confirmed and announced as new discoveries, and we are dependent on the follow-up capabilities of other telescopes for this. However, not every NEO candidate is immediately followed up and linked into a well established orbit, possibly due to the fact that smaller bodies may not be visible at current instrument sensitivity limits for very long, or that their predicted orbits are too uncertain so follow-up telescopes look in the wrong location. But in certain cases, these objects may have been observed during previous lunations. We present a method to search for precovery detections in both the PS1 database, and the Isolated Tracklet File (ITF) provided by the Minor Planet Center (MPC). This file contains over 12 million detections mostly from the large surveys, which are not associated with any known objects. We demonstrate that multi-tracklet linkages for both known and unknown objects may be found in these databases, including detections for both NEOs and non-NEOs which often appear on the MPC's NEO Confirmation Page. [1] Wainscoat, R. et al., IAU Symposium 318, editors S. Chesley and R. Jedicke (2015)

  8. Hadamard NMR spectroscopy for two-dimensional quantum information processing and parallel search algorithms.

    PubMed

    Gopinath, T; Kumar, Anil

    2006-12-01

    Hadamard spectroscopy has earlier been used to speed-up multi-dimensional NMR experiments. In this work, we speed-up the two-dimensional quantum computing scheme, by using Hadamard spectroscopy in the indirect dimension, resulting in a scheme which is faster and requires the Fourier transformation only in the direct dimension. Two and three qubit quantum gates are implemented with an extra observer qubit. We also use one-dimensional Hadamard spectroscopy for binary information storage by spatial encoding and implementation of a parallel search algorithm.

  9. GLYCEMIC CONTROL AND PROSTATE CANCER PROGRESSION: RESULTS FROM THE SEARCH DATABASE

    PubMed Central

    Kim, Howard; Presti, Joseph C.; Aronson, William J.; Terris, Martha K.; Kane, Christopher J.; Amling, Christopher L.; Freedland, Stephen J.

    2010-01-01

    Purpose: Several studies have examined the association between diabetes mellitus (DM) and prostate cancer (PCa) risk and progression; however, nearly all of these studies have compared diabetic vs. non-diabetic men. We sought to investigate the role of glycemic control, as measured by HbA1c, on PCa aggressiveness and prognosis in men with DM and PCa from the Shared Equal Access Regional Cancer Hospital (SEARCH) database. Methods: We identified 247 men in SEARCH with DM and a recorded HbA1c value twelve months prior to radical prostatectomy between 1988 and 2009. We divided men into tertiles by HbA1c level. The associations between HbA1c tertiles and risk of adverse pathology and biochemical recurrence were tested using multivariate logistic regression and Cox proportional hazards models, respectively. Results: Median HbA1c level was 6.9. On multivariate analysis, HbA1c tertiles were predictive of pathological Gleason score (p-trend=0.001). Relative to the first tertile, men in the second (OR 5.90, p=0.002) and third tertile (OR 7.15, p=0.001) were more likely to have Gleason score ≥ 4+3. HbA1c tertiles were not associated with margin status, node status, extracapsular extension or seminal vesicle invasion (all p-trend>0.2). In the multivariate Cox proportional hazards model, increasing HbA1c tertiles were not significantly related to risk of biochemical recurrence (p-trend=0.56). Conclusion: Men with higher HbA1c levels presented with more biologically aggressive prostate tumors at radical prostatectomy. Although risk of recurrence was unrelated to HbA1c levels, further studies are needed to better explore the importance of glycemic control on long-term outcomes in diabetic men with PCa. PMID:20687228

  10. Colloquium: Herbertsmithite and the search for the quantum spin liquid

    DOE PAGES

    Norman, M. R.

    2016-12-02

    Quantum spin liquids form a novel class of matter where, despite the existence of strong exchange interactions, spins do not order down to the lowest measured temperature. Typically, these occur in lattices that act to frustrate the appearance of magnetism. In two dimensions, the classic example is the kagome lattice composed of corner sharing triangles. There are a variety of minerals whose transition metal ions form such a lattice. Hence, a number of them have been studied and were then subsequently synthesized in order to obtain more pristine samples. Of particular note was the report in 2005 by Dan Nocera's group of the synthesis of herbertsmithite, composed of a lattice of copper ions sitting on a kagome lattice, which indeed does not order down to the lowest measured temperature despite the existence of a large exchange interaction of 17 meV. Over the past decade, this material has been extensively studied, yielding a number of intriguing surprises that have in turn motivated a resurgence of interest in the theoretical study of the spin 1/2 Heisenberg model on a kagome lattice. This Colloquium reviews these developments and then discusses potential future directions, both experimental and theoretical, as well as the challenge of doping these materials with the hope that this could lead to the discovery of novel topological and superconducting phases.

  11. Colloquium: Herbertsmithite and the search for the quantum spin liquid

    SciTech Connect

    Norman, M. R.

    2016-12-02

    Quantum spin liquids form a novel class of matter where, despite the existence of strong exchange interactions, spins do not order down to the lowest measured temperature. Typically, these occur in lattices that act to frustrate the appearance of magnetism. In two dimensions, the classic example is the kagome lattice composed of corner sharing triangles. There are a variety of minerals whose transition metal ions form such a lattice. Hence, a number of them have been studied and were then subsequently synthesized in order to obtain more pristine samples. Of particular note was the report in 2005 by Dan Nocera's group of the synthesis of herbertsmithite, composed of a lattice of copper ions sitting on a kagome lattice, which indeed does not order down to the lowest measured temperature despite the existence of a large exchange interaction of 17 meV. Over the past decade, this material has been extensively studied, yielding a number of intriguing surprises that have in turn motivated a resurgence of interest in the theoretical study of the spin 1/2 Heisenberg model on a kagome lattice. This Colloquium reviews these developments and then discusses potential future directions, both experimental and theoretical, as well as the challenge of doping these materials with the hope that this could lead to the discovery of novel topological and superconducting phases.

  12. Colloquium: Herbertsmithite and the search for the quantum spin liquid

    NASA Astrophysics Data System (ADS)

    Norman, M. R.

    2016-10-01

    Quantum spin liquids form a novel class of matter where, despite the existence of strong exchange interactions, spins do not order down to the lowest measured temperature. Typically, these occur in lattices that act to frustrate the appearance of magnetism. In two dimensions, the classic example is the kagome lattice composed of corner sharing triangles. There are a variety of minerals whose transition metal ions form such a lattice. Hence, a number of them have been studied and were then subsequently synthesized in order to obtain more pristine samples. Of particular note was the report in 2005 by Dan Nocera's group of the synthesis of herbertsmithite, composed of a lattice of copper ions sitting on a kagome lattice, which indeed does not order down to the lowest measured temperature despite the existence of a large exchange interaction of 17 meV. Over the past decade, this material has been extensively studied, yielding a number of intriguing surprises that have in turn motivated a resurgence of interest in the theoretical study of the spin 1/2 Heisenberg model on a kagome lattice. This Colloquium reviews these developments and then discusses potential future directions, both experimental and theoretical, as well as the challenge of doping these materials with the hope that this could lead to the discovery of novel topological and superconducting phases.

  13. Complementary Value of Databases for Discovery of Scholarly Literature: A User Survey of Online Searching for Publications in Art History

    ERIC Educational Resources Information Center

    Nemeth, Erik

    2010-01-01

    Discovery of academic literature through Web search engines challenges the traditional role of specialized research databases. Creation of literature outside academic presses and peer-reviewed publications expands the content for scholarly research within a particular field. The resulting body of literature raises the question of whether scholars…

  15. Microwave experiments simulating quantum search and directed transport in artificial graphene.

    PubMed

    Böhm, Julian; Bellec, Matthieu; Mortessagne, Fabrice; Kuhl, Ulrich; Barkhofen, Sonja; Gehler, Stefan; Stöckmann, Hans-Jürgen; Foulger, Iain; Gnutzmann, Sven; Tanner, Gregor

    2015-03-20

    A series of quantum search algorithms have been proposed recently providing an algebraic speedup compared to classical search algorithms from N to √N, where N is the number of items in the search space. In particular, devising searches on regular lattices has become popular in extending Grover's original algorithm to spatial searching. Working in a tight-binding setup, it could be demonstrated, theoretically, that a search is possible in the physically relevant dimensions 2 and 3 if the lattice spectrum possesses Dirac points. We present here a proof of principle experiment implementing wave search algorithms and directed wave transport in a graphene lattice arrangement. The idea is based on bringing localized search states into resonance with an extended lattice state in an energy region of low spectral density, namely at or near the Dirac point. The experiment is implemented using classical waves in a microwave setup containing weakly coupled dielectric resonators placed in a honeycomb arrangement, i.e., artificial graphene. Furthermore, we investigate the scaling behavior experimentally using linear chains.
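
    The N to √N speedup mentioned above can be illustrated with a small classical simulation of Grover's original (non-spatial) algorithm. This is only a sketch under stated assumptions: the function names are hypothetical, the state vector is stored explicitly (so the simulation itself costs O(N) per step), and it does not model the lattice or microwave setup of the paper.

```python
import math

def optimal_iterations(n_items):
    # Standard estimate for one marked item: about (pi/4) * sqrt(N) iterations.
    return math.floor(math.pi / 4 * math.sqrt(n_items))

def grover_success_probability(n_items, marked, iterations):
    # Start in the uniform superposition over all N items.
    amp = [1 / math.sqrt(n_items)] * n_items
    for _ in range(iterations):
        # Oracle: flip the sign of the marked item's amplitude.
        amp[marked] = -amp[marked]
        # Diffusion: reflect every amplitude about the mean amplitude.
        mean = sum(amp) / n_items
        amp = [2 * mean - a for a in amp]
    # Probability of measuring the marked item.
    return amp[marked] ** 2

if __name__ == "__main__":
    n = 1024
    k = optimal_iterations(n)
    print(k, grover_success_probability(n, marked=3, iterations=k))
```

    For N = 1024 the estimate gives 25 oracle calls, compared with roughly N/2 expected lookups for a classical scan.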

  16. Introducing a New Interface for the Online MagIC Database by Integrating Data Uploading, Searching, and Visualization

    NASA Astrophysics Data System (ADS)

    Jarboe, N.; Minnett, R.; Constable, C.; Koppers, A. A.; Tauxe, L.

    2013-12-01

    The Magnetics Information Consortium (MagIC) is dedicated to supporting the paleomagnetic, geomagnetic, and rock magnetic communities through the development and maintenance of an online database (http://earthref.org/MAGIC/), data upload and quality control, searches, data downloads, and visualization tools. While MagIC has completed importing some of the IAGA paleomagnetic databases (TRANS, PINT, PSVRL, GPMDB) and continues to import others (ARCHEO, MAGST and SECVR), further individual data uploading from the community contributes a wealth of easily-accessible rich datasets. Previously, uploading data to the MagIC database required the use of an Excel spreadsheet on either a Mac or PC. The new method of uploading data utilizes an HTML 5 web interface where the only computer requirement is a modern browser. This web interface will highlight all errors discovered in the dataset at once instead of the iterative error checking process found in the previous Excel spreadsheet data checker. As a web service, the community will always have easy access to the most up-to-date and bug-free version of the data upload software. The filtering search mechanism of the MagIC database has been changed to a more intuitive system where the data from each contribution is displayed in tables similar to how the data is uploaded (http://earthref.org/MAGIC/search/). Searches themselves can be saved as a permanent URL, if desired. The saved search URL could then be used as a citation in a publication. When appropriate, plots (equal area, Zijderveld, ARAI, demagnetization, etc.) are associated with the data to give the user a quicker understanding of the underlying dataset. The MagIC database will continue to evolve to meet the needs of the paleomagnetic, geomagnetic, and rock magnetic communities.

  17. In search of yoga: Research trends in a western medical database.

    PubMed

    McCall, Marcy C

    2014-01-01

    The promotion of yoga practice as a preventative and treatment therapy for health outcomes in the western hemisphere is increasing rapidly. As the commercial success of yoga burgeons in popular culture, it is important to investigate the trends of yoga as a therapeutic intervention in academic literature. The free-access search engine PubMed is a preeminent resource to identify health-related research articles published for academics, health practitioners and others. To report the recent yoga-related publications in the western healthcare context with particular interest in the subject and type of yoga titles. A bibliometric analysis to describe the annual trends in publication on PubMed from January 1950 to December 2012. The number of yoga-related titles included in the PubMed database is limited until a marked increase in 2000 and a steady surge since 2007. Bibliometric analysis indicates that more than 200 new titles are added per annum since 2011. Systematic reviews and yoga trials are increasing exponentially, indicating a potential increase in the quality of evidence. Titles including pain management, stress or anxiety, depression and cancer conditions are highly correlated with yoga and healthcare research. The prevalence of yoga research in western healthcare is increasing. The marked increase in volume indicates the need for more systematic analysis of the literature in terms of quality and results.

  18. Subspace projection method for unstructured searches with noisy quantum oracles using a signal-based quantum emulation device

    NASA Astrophysics Data System (ADS)

    La Cour, Brian R.; Ostrove, Corey I.

    2017-01-01

    This paper describes a novel approach to solving unstructured search problems using a classical, signal-based emulation of a quantum computer. The classical nature of the representation allows one to perform subspace projections in addition to the usual unitary gate operations. Although bandwidth requirements will limit the scale of problems that can be solved by this method, it can nevertheless provide a significant computational advantage for problems of limited size. In particular, we find that, for the same number of noisy oracle calls, the proposed subspace projection method provides a higher probability of success for finding a solution than does a single application of Grover's algorithm on the same device.

  19. Quantum spin ice: a search for gapless quantum spin liquids in pyrochlore magnets.

    PubMed

    Gingras, M J P; McClarty, P A

    2014-05-01

    The spin ice materials, including Ho2Ti2O7 and Dy2Ti2O7, are rare-earth pyrochlore magnets which, at low temperatures, enter a constrained paramagnetic state with an emergent gauge freedom. Spin ices provide one of very few experimentally realized examples of fractionalization because their elementary excitations can be regarded as magnetic monopoles and, over some temperature range, spin ice materials are best described as liquids of these emergent charges. In the presence of quantum fluctuations, one can obtain, in principle, a quantum spin liquid descended from the classical spin ice state characterized by emergent photon-like excitations. Whereas in classical spin ices the excitations are akin to electrostatic charges with a mutual Coulomb interaction, in the quantum spin liquid these charges interact through a dynamic and emergent electromagnetic field. In this review, we describe the latest developments in the study of such a quantum spin ice, focusing on the spin liquid phenomenology and the kinds of materials where such a phase might be found.

  20. Deterministic quantum-public-key encryption: Forward search attack and randomization

    NASA Astrophysics Data System (ADS)

    Nikolopoulos, Georgios M.; Ioannou, Lawrence M.

    2009-04-01

    In the classical setting, public-key encryption requires randomness in order to be secure against a forward search attack, whereby an adversary compares the encryption of a guess of the secret message with the encryption of the actual secret message. We show that this is also true in the information-theoretic setting—where the public keys are quantum systems—by defining and giving an example of a forward search attack for any deterministic quantum-public-key bit-encryption scheme. However, unlike in the classical setting, we show that any such deterministic scheme can be used as a black box to build a randomized bit-encryption scheme that is no longer susceptible to this attack.
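
    The classical forward search attack described in the first sentence can be sketched as follows. This is an illustrative toy only, assuming textbook RSA with tiny hard-coded parameters; the function names and the padding scheme are hypothetical, and nothing here models the quantum-public-key construction of the paper.

```python
import secrets

# Toy textbook-RSA key (p = 61, q = 53); an assumption for illustration,
# far too small for any real use.
N, E, D = 3233, 17, 2753

def encrypt_deterministic(bit):
    # Same plaintext bit always yields the same ciphertext.
    return pow(bit, E, N)

def encrypt_randomized(bit):
    # Hide the bit in the lowest position of a randomly padded plaintext.
    r = secrets.randbelow(1000) + 1      # r >= 1, so the plaintext is never 0 or 1
    return pow(bit + 2 * r, E, N)

def decrypt(ciphertext):
    # Works for both schemes: the message bit is the lowest plaintext bit.
    return pow(ciphertext, D, N) & 1

def forward_search(ciphertext):
    # Adversary encrypts each guess with the public key and compares.
    for guess in (0, 1):
        if encrypt_deterministic(guess) == ciphertext:
            return guess
    return None                          # no guess matched

if __name__ == "__main__":
    print(forward_search(encrypt_deterministic(1)))   # attack recovers the bit
    print(forward_search(encrypt_randomized(1)))      # naive attack fails
```

    With parameters this small an adversary could also enumerate every padding value, so the randomized variant only shows the idea; a realistic scheme needs far more randomness than the 1000 values assumed here.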

  1. A quantum-behaved evolutionary algorithm based on the Bloch spherical search

    NASA Astrophysics Data System (ADS)

    Li, Panchi

    2014-04-01

    In order to enhance the optimization ability of the quantum evolutionary algorithms, a new quantum-behaved evolutionary algorithm is proposed. In this algorithm, the search mechanism is established based on the Bloch sphere. First, the individuals are expressed by qubits described on the Bloch sphere, then the rotation axis is established by Pauli matrices, and the evolution search is realized by rotating qubits on the Bloch sphere about the rotation axis. In order to avoid premature convergence, the mutation of individuals is achieved by the Hadamard gates. Such rotation can make the current qubit approach the target qubit along the great circle on the Bloch sphere, which can accelerate the optimization process. Taking the function extreme value optimization as an example, the experimental results show that the proposed algorithm is obviously superior to other similar algorithms.
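
    The great-circle rotation described above can be sketched with ordinary 3D Bloch vectors. This is a minimal illustration under assumptions, not the authors' implementation: it represents each qubit by its unit Bloch vector rather than by Pauli matrices, takes the rotation axis as the normalized cross product of the current and target vectors, and uses hypothetical function names.

```python
import math

def cross(a, b):
    # Cross product of two 3-vectors.
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def rotate_toward(current, target, angle):
    """Rotate the unit Bloch vector `current` by `angle` radians toward
    `target` along the great circle through both points."""
    axis = cross(current, target)
    norm = math.sqrt(sum(c * c for c in axis))
    if norm < 1e-12:
        raise ValueError("current and target are parallel; axis is undefined")
    k = tuple(c / norm for c in axis)
    # Rodrigues' formula; the k*(k.v) term vanishes because k is
    # perpendicular to `current` by construction.
    k_cross_v = cross(k, current)
    return tuple(v * math.cos(angle) + kv * math.sin(angle)
                 for v, kv in zip(current, k_cross_v))

if __name__ == "__main__":
    # Move the |0> state (north pole) a quarter turn toward the +x axis.
    print(rotate_toward((0.0, 0.0, 1.0), (1.0, 0.0, 0.0), math.pi / 2))
```

    Choosing the axis perpendicular to both vectors is what keeps the motion on the great circle, which is the shortest path between the two points on the sphere.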

  2. Search for quantum size effects in ultrathin epitaxial metallic films

    NASA Astrophysics Data System (ADS)

    Badoz, P. A.; D'Avitaya, F. Arnaud; Rosencher, E.

    In order to investigate quantum size effects in ultrathin metal films, tunneling spectroscopy measurements have been performed in the epitaxial CoSi2/Si system, with a metal thickness ranging from 1000 Å down to 35 Å, i.e. a few de Broglie wavelengths of electrons in CoSi2. The resulting spectra show extremely rich sets of features, the origin of which is investigated. The peaks observed at low energy (-100 meV, +100 meV) are thickness independent and attributed to phonon emission by hot electrons. The peaks observed at higher energy (up to 600 meV) are thickness dependent but their physical origin is not yet fully ascertained. The absence of unambiguous electron quantization effects in these epitaxial films is discussed and tentatively attributed to small thickness fluctuations (of the order of a few monolayers), which tend to blur the quantization of the electronic energies.

  3. MassMatrix: A Database Search Program for Rapid Characterization of Proteins and Peptides from Tandem Mass Spectrometry Data

    PubMed Central

    Xu, Hua; Freitas, Michael A.

    2009-01-01

    MassMatrix is a program that matches tandem mass spectra with theoretical peptide sequences derived from a protein database. The program uses a mass accuracy sensitive probabilistic score model to rank peptide matches. The tandem mass spectrometry search software was evaluated by use of a high mass accuracy data set and its results compared with those from Mascot, SEQUEST, X!Tandem, and OMSSA. For the high mass accuracy data, MassMatrix provided better sensitivity than Mascot, SEQUEST, X!Tandem, and OMSSA for a given specificity and the percentage of false positives was 2%. More importantly, all manually validated true positives corresponded to a unique peptide/spectrum match. The presence of decoy sequences and additional variable post-translational modifications did not significantly affect the results from the high mass accuracy search. MassMatrix performs well when compared with Mascot, SEQUEST, X!Tandem, and OMSSA with regard to search time. MassMatrix was also run on a distributed-memory cluster and achieved search speeds of ~100,000 spectra per hour when searching against a complete human database with 8 variable modifications. The algorithm is available for public searches at http://www.massmatrix.net. PMID:19235167

  4. Highly charged ions for atomic clocks, quantum information, and search for α variation.

    PubMed

    Safronova, M S; Dzuba, V A; Flambaum, V V; Safronova, U I; Porsev, S G; Kozlov, M G

    2014-07-18

    We propose 10 highly charged ions as candidates for the development of next generation atomic clocks, quantum information, and search for α variation. They have long-lived metastable states with transition wavelengths to the ground state between 170 and 3000 nm, relatively simple electronic structure, stable isotopes, and high sensitivity to α variation (e.g., Sm(14+), Pr(10+), Sm(13+), Nd(10+)). We predict their properties crucial for the experimental exploration and highlight particularly attractive systems for these applications.

  5. Systematic Dimensionality Reduction for Quantum Walks: Optimal Spatial Search and Transport on Non-Regular Graphs

    PubMed Central

    Novo, Leonardo; Chakraborty, Shantanav; Mohseni, Masoud; Neven, Hartmut; Omar, Yasser

    2015-01-01

    Continuous time quantum walks provide an important framework for designing new algorithms and modelling quantum transport and state transfer problems. Often, the graph representing the structure of a problem contains certain symmetries that confine the dynamics to a smaller subspace of the full Hilbert space. In this work, we use invariant subspace methods, that can be computed systematically using the Lanczos algorithm, to obtain the reduced set of states that encompass the dynamics of the problem at hand without the specific knowledge of underlying symmetries. First, we apply this method to obtain new instances of graphs where the spatial quantum search algorithm is optimal: complete graphs with broken links and complete bipartite graphs, in particular, the star graph. These examples show that regularity and high-connectivity are not needed to achieve optimal spatial search. We also show that this method considerably simplifies the calculation of quantum transport efficiencies. Furthermore, we observe improved efficiencies by removing a few links from highly symmetric graphs. Finally, we show that this reduction method also allows us to obtain an upper bound for the fidelity of a single qubit transfer on an XY spin network. PMID:26330082
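The subspace reduction described above can be illustrated on the simplest case, search on the complete graph. The following is a minimal sketch, not the authors' code: it assumes the standard search Hamiltonian H = -γA - |w⟩⟨w| with γ = 1/N, uses a textbook Lanczos iteration without reorthogonalization, and the function names are hypothetical. Two Lanczos steps starting from the marked vertex already span the invariant subspace, so the N-dimensional walk reduces to a 2×2 problem.

```python
import numpy as np

def lanczos(H, start, m):
    """Run m Lanczos steps; return the m x m tridiagonal matrix T and basis Q."""
    n = len(start)
    Q = np.zeros((n, m))
    alpha = np.zeros(m)
    beta = np.zeros(m - 1)
    q = start / np.linalg.norm(start)
    for j in range(m):
        Q[:, j] = q
        w = H @ q
        alpha[j] = q @ w
        w = w - alpha[j] * q
        if j > 0:
            w = w - beta[j - 1] * Q[:, j - 1]
        if j == m - 1:
            break
        beta[j] = np.linalg.norm(w)
        q = w / beta[j]
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    return T, Q

def search_probability(N, t, marked=0):
    """Probability of finding the marked vertex at time t on the complete graph."""
    A = np.ones((N, N)) - np.eye(N)          # adjacency matrix of K_N
    H = -A / N                               # hopping term with gamma = 1/N (assumed)
    H[marked, marked] -= 1.0                 # oracle term -|w><w|
    e_w = np.zeros(N)
    e_w[marked] = 1.0
    T, Q = lanczos(H, e_w, 2)                # reduced 2 x 2 Hamiltonian
    s = np.full(N, 1.0 / np.sqrt(N))         # uniform initial state
    c0 = Q.T @ s                             # initial state in the reduced basis
    evals, evecs = np.linalg.eigh(T)
    ct = evecs @ (np.exp(-1j * evals * t) * (evecs.T @ c0))
    return abs(ct[0]) ** 2                   # first basis vector is the marked vertex

N = 64
print(search_probability(N, np.pi * np.sqrt(N) / 2))
```

For this Hamiltonian the success probability peaks at t = (π/2)√N, which is the quadratic speed-up over classical search that "optimal" refers to.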

  6. Systematic Dimensionality Reduction for Quantum Walks: Optimal Spatial Search and Transport on Non-Regular Graphs

    NASA Astrophysics Data System (ADS)

    Novo, Leonardo; Chakraborty, Shantanav; Mohseni, Masoud; Neven, Hartmut; Omar, Yasser

    2015-09-01

    Continuous time quantum walks provide an important framework for designing new algorithms and modelling quantum transport and state transfer problems. Often, the graph representing the structure of a problem contains certain symmetries that confine the dynamics to a smaller subspace of the full Hilbert space. In this work, we use invariant subspace methods, that can be computed systematically using the Lanczos algorithm, to obtain the reduced set of states that encompass the dynamics of the problem at hand without the specific knowledge of underlying symmetries. First, we apply this method to obtain new instances of graphs where the spatial quantum search algorithm is optimal: complete graphs with broken links and complete bipartite graphs, in particular, the star graph. These examples show that regularity and high-connectivity are not needed to achieve optimal spatial search. We also show that this method considerably simplifies the calculation of quantum transport efficiencies. Furthermore, we observe improved efficiencies by removing a few links from highly symmetric graphs. Finally, we show that this reduction method also allows us to obtain an upper bound for the fidelity of a single qubit transfer on an XY spin network.

  7. Oracle Database 10g: a platform for BLAST search and Regular Expression pattern matching in life sciences

    PubMed Central

    Stephens, Susie M.; Chen, Jake Y.; Davidson, Marcel G.; Thomas, Shiby; Trute, Barry M.

    2005-01-01

    As database management systems expand their array of analytical functionality, they become powerful research engines for biomedical data analysis and drug discovery. Databases can hold most of the data types commonly required in life sciences and consequently can be used as flexible platforms for the implementation of knowledgebases. Performing data analysis in the database simplifies data management by minimizing the movement of data from disks to memory, allowing pre-filtering and post-processing of datasets, and enabling data to remain in a secure, highly available environment. This article describes the Oracle Database 10g implementation of BLAST and Regular Expression Searches and provides case studies of their usage in bioinformatics. http://www.oracle.com/technology/software/index.html PMID:15608287

  8. Oracle Database 10g: a platform for BLAST search and Regular Expression pattern matching in life sciences.

    PubMed

    Stephens, Susie M; Chen, Jake Y; Davidson, Marcel G; Thomas, Shiby; Trute, Barry M

    2005-01-01

    As database management systems expand their array of analytical functionality, they become powerful research engines for biomedical data analysis and drug discovery. Databases can hold most of the data types commonly required in life sciences and consequently can be used as flexible platforms for the implementation of knowledgebases. Performing data analysis in the database simplifies data management by minimizing the movement of data from disks to memory, allowing pre-filtering and post-processing of datasets, and enabling data to remain in a secure, highly available environment. This article describes the Oracle Database 10g implementation of BLAST and Regular Expression Searches and provides case studies of their usage in bioinformatics. http://www.oracle.com/technology/software/index.html.

  9. SPLICE: A program to assemble partial query solutions from three-dimensional database searches into novel ligands

    NASA Astrophysics Data System (ADS)

    Ho, Chris M. W.; Marshall, Garland R.

    1993-12-01

    SPLICE is a program that processes partial query solutions retrieved from 3D structural databases to generate novel, aggregate ligands. It is designed to interface with the database searching program FOUNDATION, which retrieves fragments containing any combination of a user-specified minimum number of matching query elements. SPLICE eliminates aspects of structures that are physically incapable of binding within the active site. Then, a systematic rule-based procedure is performed upon the remaining fragments to ensure receptor complementarity. All modifications are automated and remain transparent to the user. Ligands are then assembled by linking components into composite structures through overlapping bonds. As a control experiment, FOUNDATION and SPLICE were used to reconstruct a known HIV-1 protease inhibitor after it had been fragmented, reoriented, and added to a sham database of fifty different small molecules. To illustrate the capabilities of this program, a 3D search query containing the pharmacophoric elements of an aspartic proteinase-inhibitor crystal complex was searched using FOUNDATION against a subset of the Cambridge Structural Database. One hundred thirty-one compounds were retrieved, each containing any combination of at least four query elements. Compounds were automatically screened and edited for receptor complementarity. Numerous combinations of fragments were discovered that could be linked to form novel structures, containing a greater number of pharmacophoric elements than any single retrieved fragment.

  10. Search and Study of UV-Excess Objects in the DFBS Database

    NASA Astrophysics Data System (ADS)

    Sinamyan, Parandzem K.; Sargsyan, Lusine A.; Mickaelian, Areg M.; Massaro, Enrico; Nesci, Roberto; Rossi, Corinne; Gaudenzi, Silvia; Cirimele, Giuseppe

    2007-08-01

    DFBS is a digitized version of the famous Markarian survey (or the First Byurakan Survey, FBS). The project has been carried out by teams from Byurakan, Rome and Cornell, using an EPSON Expression 1680 Pro scanner. The DFBS will serve as a unique spectroscopic database for studies in large areas (total surface is 17,000 sq. degrees) at high galactic latitudes, approximate classification for objects (20,000,000 objects are present), selection of samples of objects for definite studies (UV-excess objects, extremely red objects, variables, etc.). Joint use of the direct images and spectra gives greater possibilities for various studies and more efficient use of the survey. Using the dedicated BSpec software written by one of the authors (GC), we have obtained a list of DFBS stars, their positions, B and R magnitudes, and preliminary classification for DFBS zones with central DEC=+39° and DEC=+43°. The spectral length l>90 pix (compared to the total length of 107 pix) was used as a criterion to search for UV-excess objects, as this corresponds to the criteria used during the 2nd part of the FBS. However, the spectra of objects with B<13 always occupy the full length, and they were excluded from the lists. On the other hand, for the fainter objects (near the plate limit), we weaken the criteria of selection (l>80 pix), as their spectra are shorter. An additional point for the UV-excess object classification is the following: the spectra of the UV-excess objects are divided into two parts by a sensitivity gap in green; the red-yellow part of the spectra must be weaker and the blue-ultraviolet part must be brighter and more extended. We started the project in the DFBS zones +39° and +43° to compare the results with those obtained before during the 2nd part of the FBS. Later on, cross-correlations with available catalogs and a multi-wavelength analysis were made for the found objects. The preliminary results of the search and studies will be reported.

  11. Validation of SmartRank: A likelihood ratio software for searching national DNA databases with complex DNA profiles.

    PubMed

    Benschop, Corina C G; van de Merwe, Linda; de Jong, Jeroen; Vanvooren, Vanessa; Kempenaers, Morgane; Kees van der Beek, C P; Barni, Filippo; Reyes, Eusebio López; Moulin, Léa; Pene, Laurent; Haned, Hinda; Sijen, Titia

    2017-07-01

    Searching a national DNA database with complex and incomplete profiles usually yields very large numbers of possible matches that can present many candidate suspects to be further investigated by the forensic scientist and/or police. Current practice in most forensic laboratories consists of ordering these 'hits' based on the number of matching alleles with the searched profile. Thus, candidate profiles that share the same number of matching alleles are not differentiated and due to the lack of other ranking criteria for the candidate list it may be difficult to discern a true match from the false positives or notice that all candidates are in fact false positives. SmartRank was developed to put forward only relevant candidates and rank them accordingly. The SmartRank software computes a likelihood ratio (LR) for the searched profile and each profile in the DNA database and ranks database entries above a defined LR threshold according to the calculated LR. In this study, we examined for mixed DNA profiles of variable complexity whether the true donors are retrieved, what the number of false positives above an LR threshold is and the ranking position of the true donors. Using 343 mixed DNA profiles, over 750 SmartRank searches were performed. In addition, the performance of SmartRank and CODIS was compared regarding DNA database searches and SmartRank was found complementary to CODIS. We also describe the applicable domain of SmartRank and provide guidelines. The SmartRank software is open-source and freely available. Using the best practice guidelines, SmartRank enables obtaining investigative leads in criminal cases lacking a suspect.
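The threshold-and-rank step described in this abstract can be sketched as follows. This is an illustration only, not SmartRank code: the LR values, profile identifiers and function name are hypothetical, and the likelihood ratio computation itself (the hard part) is assumed to have been done already.

```python
def rank_candidates(lr_by_profile, lr_threshold):
    """Keep database profiles whose LR exceeds the threshold, highest LR first."""
    hits = [(pid, lr) for pid, lr in lr_by_profile.items() if lr > lr_threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)

# Hypothetical LRs for four database profiles against one searched mixture
lrs = {"profile_A": 3.2e6, "profile_B": 12.0, "profile_C": 4.5e3, "profile_D": 0.4}
for pid, lr in rank_candidates(lrs, lr_threshold=1000):
    print(pid, lr)
```

Unlike ordering by the number of matching alleles, two candidates with the same allele count will generally receive different LRs, so ties are resolved and weak candidates fall below the threshold.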

  12. Preparing College Students To Search Full-Text Databases: Is Instruction Necessary?

    ERIC Educational Resources Information Center

    Riley, Cheryl; Wales, Barbara

    Full-text databases allow Central Missouri State University's clients to access some of the serials that libraries have had to cancel due to escalating subscription costs; EbscoHost, the subject of this study, is one such database. The database is available free to all Missouri residents. A survey was designed consisting of 21 questions intended…

  13. Novel DOCK clique driven 3D similarity database search tools for molecule shape matching and beyond: adding flexibility to the search for ligand kin.

    PubMed

    Good, Andrew C

    2007-10-01

    With readily available CPU power and copious disk storage, it is now possible to undertake rapid comparison of 3D properties derived from explicit ligand overlay experiments. With this in mind, shape software tools originally devised in the 1990s are revisited, modified and applied to the problem of ligand database shape comparison. The utility of Connolly surface data is highlighted using the program MAKESITE, which leverages surface normal data to create a ligand shape cast. This cast is applied directly within DOCK, allowing the program to be used unmodified as a shape searching tool. In addition, DOCK has undergone multiple modifications to create a dedicated ligand shape comparison tool KIN. Scoring has been altered to incorporate the original incarnation of Gaussian function derived shape description based on STO-3G atomic electron density. In addition, a tabu-like search refinement has been added to increase search speed by removing redundant starting orientations produced during clique matching. The ability to use exclusion regions, again based on Gaussian shape overlap, has also been integrated into the scoring function. The use of both DOCK with MAKESITE and KIN in database screening mode is illustrated using a published ligand shape virtual screening template. The advantages of using a clique-driven search paradigm are highlighted, including shape optimization within a pharmacophore constrained framework, and easy incorporation of additional scoring function modifications. The potential for further development of such methods is also discussed.

  14. Testing search strategies for systematic reviews in the Medline literature database through PubMed.

    PubMed

    Volpato, Enilze S N; Betini, Marluci; El Dib, Regina

    2014-04-01

    A high-quality electronic search is essential in ensuring accuracy and completeness in retrieved records for the conducting of a systematic review. We analysed the available sample of search strategies to identify the best method for searching in Medline through PubMed, considering the use or not of parentheses, double quotation marks, truncation and use of a simple search or search history. In our cross-sectional study of search strategies, we selected and analysed the available searches performed during evidence-based medicine classes and in systematic reviews conducted in the Botucatu Medical School, UNESP, Brazil. We analysed 120 search strategies. With regard to the use of phrase searches with parentheses, there was no difference between the results with and without parentheses and simple searches or search history tools in 100% of the sample analysed (P = 1.0). The number of results retrieved by the searches analysed was smaller using double quotation marks and using truncation compared with the standard strategy (P = 0.04 and P = 0.08, respectively). There is no need to use phrase-searching parentheses to retrieve studies; however, we recommend the use of double quotation marks when an investigator attempts to retrieve articles in which a term appears to be exactly the same as what was proposed in the search form. Furthermore, we do not recommend the use of truncation in search strategies in Medline via PubMed. Although the results of simple searches or search history tools were the same, we recommend using the latter.

  15. Serum lipid profile and risk of prostate cancer recurrence: results from the SEARCH database

    PubMed Central

    Allott, Emma H.; Howard, Lauren E.; Cooperberg, Matthew R.; Kane, Christopher J.; Aronson, William J.; Terris, Martha K.; Amling, Christopher L.; Freedland, Stephen J.

    2014-01-01

    Background: Evidence for an association between total cholesterol, low and high density lipoproteins (LDL and HDL, respectively), triglycerides and prostate cancer (PC) is conflicting. Given that PC and dyslipidemia affect large proportions of Western society, understanding these associations has public health importance. Methods: We conducted a retrospective cohort analysis of 843 radical prostatectomy (RP) patients who never used statins before surgery within the Shared Equal Access Regional Cancer Hospital (SEARCH) database. Multivariable Cox proportional hazards analysis was used to investigate the association between cholesterol, LDL, HDL and triglycerides and biochemical recurrence risk. In secondary analysis, we explored these associations in patients with dyslipidemia, defined using National Cholesterol Education Program guidelines. Results: Elevated serum triglycerides were associated with increased risk of PC recurrence (HR per 10 mg/dl 1.03; 95%CI 1.01–1.05) but associations between total cholesterol, LDL and HDL and recurrence risk were null. However, among men with dyslipidemia, each 10 mg/dl increase in cholesterol and HDL was associated with 9% increased recurrence risk (HR 1.09; 95%CI 1.01–1.17) and 39% reduced recurrence risk (HR 0.61; 95%CI 0.41–0.91), respectively. Conclusions: Elevated serum triglycerides were associated with increased risk of PC recurrence. Cholesterol, LDL or HDL were not associated with recurrence risk among all men. However, among men with dyslipidemia, elevated cholesterol and HDL levels were associated with increased and decreased risk of recurrence, respectively. Impact: These findings, coupled with evidence that statin use is associated with reduced recurrence risk, suggest that lipid levels should be explored as a modifiable risk factor for PC recurrence. PMID:25304929
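The percentages quoted alongside the hazard ratios follow from simple arithmetic: a hazard ratio HR corresponds to a relative change in hazard of (HR - 1) × 100%. A short sketch (the function name is hypothetical) reproduces the figures reported in the abstract:

```python
def hazard_ratio_to_percent_change(hr):
    """Relative change in hazard implied by a hazard ratio, in percent."""
    return (hr - 1.0) * 100.0

for label, hr in [("cholesterol", 1.09), ("HDL", 0.61), ("triglycerides", 1.03)]:
    change = hazard_ratio_to_percent_change(hr)
    direction = "increase" if change > 0 else "reduction"
    print(f"{label}: HR {hr} -> {abs(change):.0f}% {direction} per 10 mg/dl")
```

A confidence interval that excludes 1.0 (for example 1.01–1.17) is what makes the corresponding association statistically significant at the 5% level.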

  16. Postoperative statin use and risk of biochemical recurrence following radical prostatectomy: Results from the SEARCH database

    PubMed Central

    Allott, Emma H.; Howard, Lauren E.; Cooperberg, Matthew R.; Kane, Christopher J.; Aronson, William J.; Terris, Martha K.; Amling, Christopher L.; Freedland, Stephen J.

    2014-01-01

    Objective: To investigate the effect of postoperative statin use on biochemical recurrence (BCR) in PC patients treated with radical prostatectomy (RP) who never used statins before surgery. Patients and Methods: We conducted a retrospective analysis of 1,146 RP patients within the Shared Equal Access Regional Cancer Hospital (SEARCH) database. Multivariable Cox proportional hazards analyses were used to examine differences in risk of BCR between postoperative statin users versus nonusers. To account for varying start dates and duration of statin use during follow-up, postoperative statin use was treated as a time-dependent variable. In secondary analysis, models were stratified by race to examine the association of postoperative statin use with BCR among black and non-black men. Results: After adjusting for clinical and pathological characteristics, postoperative statin use was significantly associated with 36% reduced risk of BCR (HR 0.64; 95%CI 0.47-0.87; p=0.004). Postoperative statin use remained associated with reduced risk of BCR after adjusting for preoperative serum cholesterol levels. In secondary analysis, following stratification by race, this protective association was significant in non-black (HR 0.49; 95%CI 0.32-0.75; p=0.001) but not black men (HR 0.82; 95%CI 0.53-1.28; p=0.384). Conclusion: In this retrospective cohort of men undergoing RP, postoperative statin use was significantly associated with reduced risk of BCR. Whether the association between postoperative statin use and BCR differs by race requires further study. Given these findings, coupled with other studies suggesting that statins may reduce risk of advanced PC, randomized controlled trials are warranted to formally test the hypothesis that statins slow PC progression. PMID:24588774

  17. Searching for first-degree familial relationships in California's offender DNA database: validation of a likelihood ratio-based approach.

    PubMed

    Myers, Steven P; Timken, Mark D; Piucci, Matthew L; Sims, Gary A; Greenwald, Michael A; Weigand, James J; Konzak, Kenneth C; Buoncristiani, Martin R

    2011-11-01

    A validation study was performed to measure the effectiveness of using a likelihood ratio-based approach to search for possible first-degree familial relationships (full-sibling and parent-child) by comparing an evidence autosomal short tandem repeat (STR) profile to California's ∼1,000,000-profile State DNA Index System (SDIS) database. Test searches used autosomal STR and Y-STR profiles generated for 100 artificial test families. When the test sample and the first-degree relative in the database were characterized at the 15 Identifiler® (Applied Biosystems®, Foster City, CA) STR loci, the search procedure included 96% of the fathers and 72% of the full-siblings. When the relative profile was limited to the 13 Combined DNA Index System (CODIS) core loci, the search procedure included 93% of the fathers and 61% of the full-siblings. These results, combined with those of functional tests using three real families, support the effectiveness of this tool. Based upon these results, the validated approach was implemented as a key, pragmatic and demonstrably practical component of the California Department of Justice's Familial Search Program. An investigative lead created through this process recently led to an arrest in the Los Angeles Grim Sleeper serial murders.

  18. Comparative Recall and Precision of Simple and Expert Searches in Google Scholar and Eight Other Databases

    ERIC Educational Resources Information Center

    Walters, William H.

    2011-01-01

    This study evaluates the effectiveness of simple and expert searches in Google Scholar (GS), EconLit, GEOBASE, PAIS, POPLINE, PubMed, Social Sciences Citation Index, Social Sciences Full Text, and Sociological Abstracts. It assesses the recall and precision of 32 searches in the field of later-life migration: nine simple keyword searches and 23…
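Recall and precision, the two measures this study compares across databases, have standard definitions that a short sketch makes concrete. The record identifiers below are hypothetical and are not data from the study.

```python
def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved; recall = hits / relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical search: 4 records retrieved, 5 relevant records exist, 2 overlap
p, r = precision_recall(retrieved={1, 2, 3, 4}, relevant={2, 4, 5, 6, 7})
print(f"precision={p:.2f} recall={r:.2f}")
```

A broad keyword search tends to raise recall at the cost of precision, which is why the two are reported together.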

  19. Millennial Students' Mental Models of Search: Implications for Academic Librarians and Database Developers

    ERIC Educational Resources Information Center

    Holman, Lucy

    2011-01-01

    Today's students exhibit generational differences in the way they search for information. Observations of first-year students revealed a proclivity for simple keyword or phrase searches with frequent misspellings and incorrect logic. Although no students had strong mental models of search mechanisms, those with stronger models did construct more…

  2. HRGRN: A Graph Search-Empowered Integrative Database of Arabidopsis Signaling Transduction, Metabolism and Gene Regulation Networks

    PubMed Central

    Dai, Xinbin; Li, Jun; Liu, Tingsong; Zhao, Patrick Xuechun

    2016-01-01

    The biological networks controlling plant signal transduction, metabolism and gene regulation are composed of not only tens of thousands of genes, compounds, proteins and RNAs but also the complicated interactions and co-ordination among them. These networks play critical roles in many fundamental mechanisms, such as plant growth, development and environmental response. Although much is known about these complex interactions, the knowledge and data are currently scattered throughout the published literature, publicly available high-throughput data sets and third-party databases. Many ‘unknown’ yet important interactions among genes need to be mined and established through extensive computational analysis. However, exploring these complex biological interactions at the network level from existing heterogeneous resources remains challenging and time-consuming for biologists. Here, we introduce HRGRN, a graph search-empowered integrative database of Arabidopsis signal transduction, metabolism and gene regulatory networks. HRGRN utilizes Neo4j, which is a highly scalable graph database management system, to host large-scale biological interactions among genes, proteins, compounds and small RNAs that were either validated experimentally or predicted computationally. The associated biological pathway information was also specially marked for the interactions that are involved in the pathway to facilitate the investigation of cross-talk between pathways. Furthermore, HRGRN integrates a series of graph path search algorithms to discover novel relationships among genes, compounds, RNAs and even pathways from heterogeneous biological interaction data that could be missed by traditional SQL database search methods. Users can also build subnetworks based on known interactions. The outcomes are visualized with rich text, figures and interactive network graphs on web pages. The HRGRN database is freely available at http://plantgrn.noble.org/hrgrn/. PMID:26657893
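The idea of a graph path search across heterogeneous interaction data can be sketched with a plain breadth-first search. This is only an illustration in Python: it does not use Neo4j or reproduce HRGRN's own algorithms, and the node names and edges are hypothetical.

```python
from collections import deque

def shortest_path(edges, start, goal):
    """Breadth-first search for a shortest path in an undirected interaction graph."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in sorted(graph.get(node, ())):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None  # no connection between start and goal

# Hypothetical interactions among genes, a protein, a compound and a small RNA
edges = [
    ("geneA", "proteinA"),
    ("proteinA", "compoundX"),
    ("compoundX", "geneB"),
    ("geneA", "rnaM"),
]
print(shortest_path(edges, "geneA", "geneB"))
```

The path links two genes through a protein and a compound, a multi-hop relationship that would need several self-joins to express in a relational query.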

  3. A continued search for transient events in the COBE DMR database simultaneous with cosmic gamma-ray bursts

    NASA Astrophysics Data System (ADS)

    Stacy, J. Gregory; Jackson, Peter D.; Bontekoe, Tj. Romke; Winkler, Christoph

    1996-08-01

    We report on the status of our ongoing project to search the database of the COBE Differential Microwave Radiometer (DMR) experiment for transient signals at microwave wavelengths simultaneous with cosmic gamma-ray bursts (GRBs). To date we have carried out a complete search of the DMR database using burst positions taken from the original BATSE 1B catalog for the eight-month period of overlap (May-December 1991) corresponding to the first public release of COBE data. We are currently repeating our original search of the COBE DMR database using the revised burst positions of the newly-released BATSE 3B catalog. Using BATSE 1B positions, at least two apparent simultaneous observations of GRBs by the COBE DMR occurred in 1991, along with a number of "near misses" within 30 seconds in time. At present, only upper limits to burst microwave emission are indicated. Even in the event of a non-detection of a GRB by the COBE DMR, unprecedented observational limits will still be obtained, constraining the predictions of the many theoretical models proposed to explain the origin of GRBs.

  4. Feasibility of LC/TOFMS and elemental database searching as a spectral library for pesticides in food.

    PubMed

    Thurman, E Michael; Ferrer, Imma; Malato, Octavio; Fernández-Alba, Amadeo Rodriguez

    2006-11-01

    Traditionally, the screening of unknown pesticides in food has been accomplished by GC/MS methods using conventional library-searching routines. However, many of the new polar and thermally labile pesticides are more readily and easily analysed by LC/MS methods and no searchable libraries currently exist (with the exception of some user libraries, which are limited). Therefore, there is a need for LC/MS libraries that can detect pesticides and their degradation products. This paper reports an identification scheme using a combination of LC/MS time-of-flight (accurate mass) and an Access database of 350 pesticides that are amenable to positive ion electrospray. The approach differs from conventional library searching of fragment ions. The concept consists of three parts: (1) initial screening of possible pesticides in actual market-place fruit extracts (apple and orange) using accurate mass and generating an accurate mass via an automatic ion-extraction routine, (2) searching the Access database manually for screening identification of a pesticide, and (3) identification of the suspected compound by accurate mass of at least one fragment ion and comparison of retention time with an actual standard. Imazalil and iprodione were identified in apples and thiabendazole in oranges using this database approach.
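The database lookup in step (2) amounts to comparing a measured accurate mass against theoretical masses computed from elemental formulas. The sketch below is not the authors' Access database; it assumes protonated ions [M+H]+ in positive ion electrospray, a hypothetical 5 ppm tolerance, and includes only the three pesticides named in the abstract.

```python
# Monoisotopic masses (u) of the most abundant isotopes, and the proton mass
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740,
                "O": 15.9949146, "S": 31.9720707, "Cl": 34.9688527}
PROTON = 1.00727646

# Elemental formulas of the pesticides identified in the study
PESTICIDES = {
    "imazalil": {"C": 14, "H": 14, "Cl": 2, "N": 2, "O": 1},
    "iprodione": {"C": 13, "H": 13, "Cl": 2, "N": 3, "O": 3},
    "thiabendazole": {"C": 10, "H": 7, "N": 3, "S": 1},
}

def protonated_mz(formula):
    """Theoretical m/z of the singly protonated molecule [M+H]+."""
    neutral = sum(MONOISOTOPIC[element] * count for element, count in formula.items())
    return neutral + PROTON

def search_by_accurate_mass(measured_mz, tolerance_ppm=5.0):
    """Return database entries whose [M+H]+ lies within the ppm tolerance."""
    matches = []
    for name, formula in PESTICIDES.items():
        theoretical = protonated_mz(formula)
        error_ppm = (measured_mz - theoretical) / theoretical * 1e6
        if abs(error_ppm) <= tolerance_ppm:
            matches.append((abs(error_ppm), name))
    return [name for _, name in sorted(matches)]

print(search_by_accurate_mass(297.0556))  # hypothetical measured m/z
```

A mass match alone is only a screening result; as step (3) notes, confirmation still needs a fragment ion and a retention time comparison with a standard.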

  5. Digital cloning: identification of human cDNAs homologous to novel kinases through expressed sequence tag database searching.

    PubMed

    Chen, H C; Kung, H J; Robinson, D

    1998-01-01

    Identification of novel kinases based on their sequence conservation within the kinase catalytic domain has relied so far on two major approaches: low-stringency hybridization of cDNA libraries, and the PCR method using degenerate primers. Both of these approaches at times are technically difficult and time-consuming. We have developed a procedure that can significantly reduce the time and effort involved in searching for novel kinases and increase the sensitivity of the analysis. This procedure exploits the computer analysis of a vast resource of human cDNA sequences represented in the expressed sequence tag (EST) database. Seventeen novel human cDNA clones showing significant homology to serine/threonine kinases, including STE-20, CDK- and YAK-related family kinases, were identified by searching the EST database. Further sequence analysis of these novel kinases obtained either directly from EST clones or from PCR-RACE products confirmed their identity as protein kinases. Given the rapid accumulation of the EST database and the advent of powerful computer analysis software, this approach provides a fast, sensitive, and economical way to identify novel kinases as well as other genes from the EST database.

  6. Metformin does not affect risk of biochemical recurrence following radical prostatectomy: results from the SEARCH database

    PubMed Central

    Allott, Emma H.; Abern, Michael R.; Gerber, Leah; Keto, Christopher J.; Aronson, William J.; Terris, Martha K.; Kane, Christopher J.; Amling, Christopher L.; Cooperberg, Matthew R.; Moorman, Patricia G.; Freedland, Stephen J.

    2013-01-01

    Background: While epidemiologic studies suggest that metformin use among diabetics may decrease prostate cancer (PC) incidence, the effect of metformin use on PC outcome is unclear. We investigated the association between pre-operative metformin use, dose and duration of use and biochemical recurrence (BCR) in PC patients with diabetes who underwent radical prostatectomy (RP). Methods: We conducted a retrospective cohort analysis within the Shared Equal Access Regional Cancer Hospital (SEARCH) database of 371 PC patients with diabetes who underwent RP. Time to BCR between metformin users and non-users, and by metformin dose and duration of use was assessed using multivariable Cox proportional hazards analysis adjusted for demographic, clinical and/or pathologic features. Time to castrate-resistant PC (CRPC), metastases and PC-specific mortality were explored as secondary outcomes using unadjusted analyses. Results: Of 371 diabetic men, 156 (42%) were using metformin prior to RP. Metformin use was associated with more recent year of surgery (p<0.0001) but no clinical or pathologic characteristics. After adjustment for year of surgery, clinical and pathologic features, there were no associations between metformin use (HR 0.93; 95%CI 0.61–1.41), high metformin dose (HR 0.96; 95%CI 0.57–1.61) or duration of use (HR 1.00; 95%CI 0.99–1.02) and time to BCR. A total of 14 patients (3.8%) developed CRPC, 10 (2.7%) distant metastases and 8 (2.2%) died from PC. Unadjusted analysis suggested high metformin dose versus non-use was associated with increased risk of CRPC (HR 5.1; 95%CI 1.6–16.5), metastases (HR 4.8; 95%CI 1.2–18.5) and PC-specific mortality (HR 5.0; 95%CI 1.1–22.5). Conclusions: Metformin use, dose or duration of use was not associated with BCR in this cohort of diabetic PC patients treated with RP. The suggestion that higher metformin dose was associated with increased risk of CRPC, metastases and PC-specific mortality merits testing in large prospective studies.

  7. Effect of cleavage enzyme, search algorithm and decoy database on mass spectrometric identification of wheat gluten proteins.

    PubMed

    Vensel, William H; Dupont, Frances M; Sloane, Stacia; Altenbach, Susan B

    2011-07-01

    While tandem mass spectrometry (MS/MS) is routinely used to identify proteins from complex mixtures, certain types of proteins present unique challenges for MS/MS analyses. The major wheat gluten proteins, gliadins and glutenins, are particularly difficult to distinguish by MS/MS. Each of these groups contains many individual proteins with similar sequences that include repetitive motifs rich in proline and glutamine. These proteins have few cleavable tryptic sites, often resulting in only one or two tryptic peptides that may not provide sufficient information for identification. Additionally, there are fewer than 14,000 complete protein sequences from wheat in the current NCBInr release. In this paper, MS/MS methods were optimized for the identification of the wheat gluten proteins. Chymotrypsin and thermolysin as well as trypsin were used to digest the proteins and the collision energy was adjusted to improve fragmentation of chymotryptic and thermolytic peptides. Specialized databases were constructed that included protein sequences derived from contigs from several assemblies of wheat expressed sequence tags (ESTs), including contigs assembled from ESTs of the cultivar under study. Two different search algorithms were used to interrogate the database and the results were analyzed and displayed using a commercially available software package (Scaffold). We examined the effect of protein database content and size on the false discovery rate. We found that as database size increased above 30,000 sequences there was a decrease in the number of proteins identified. Also, the type of decoy database influenced the number of proteins identified. Using three enzymes, two search algorithms and a specialized database allowed us to greatly increase the number of detected peptides and distinguish proteins within each gluten protein group.

  8. Param-Medic: A Tool for Improving MS/MS Database Search Yield by Optimizing Parameter Settings.

    PubMed

    May, Damon H; Tamura, Kaipo; Noble, William S

    2017-03-13

    In shotgun proteomics analysis, user-specified parameters are critical to database search performance and therefore to the yield of confident peptide-spectrum matches (PSMs). Two of the most important parameters are related to the accuracy of the mass spectrometer. Precursor mass tolerance defines the peptide candidates considered for each spectrum. Fragment mass tolerance or bin size determines how close observed and theoretical fragments must be to be considered a match. For either of these two parameters, too wide a setting yields randomly high-scoring false PSMs, whereas too narrow a setting erroneously excludes true PSMs, in both cases, lowering the yield of peptides detected at a given false discovery rate. We describe a strategy for inferring optimal search parameters by assembling and analyzing pairs of spectra that are likely to have been generated by the same peptide ion to infer precursor and fragment mass error. This strategy does not rely on a database search, making it usable in a wide variety of settings. In our experiments on data from a variety of instruments including Orbitrap and Q-TOF acquisitions, this strategy yields more high-confidence PSMs than using settings based on instrument defaults or determined by experts. Param-Medic is open-source and cross-platform. It is available as a standalone tool ( http://noble.gs.washington.edu/proj/param-medic/ ) and has been integrated into the Crux proteomics toolkit ( http://crux.ms ), providing automatic parameter selection for the Comet and Tide search engines.

  9. Near-optimal quantum circuit for Grover's unstructured search using a transverse field

    NASA Astrophysics Data System (ADS)

    Jiang, Zhang; Rieffel, Eleanor G.; Wang, Zhihui

    2017-06-01

    Inspired by a class of algorithms proposed by Farhi et al. (arXiv:1411.4028), namely, the quantum approximate optimization algorithm (QAOA), we present a circuit-based quantum algorithm to search for a needle in a haystack, obtaining the same quadratic speedup achieved by Grover's original algorithm. In our algorithm, the problem Hamiltonian (oracle) and a transverse field are applied alternately to the system in a periodic manner. We introduce a technique, based on spin-coherent states, to analyze the composite unitary in a single period. This composite unitary drives a closed transition between two states that have high degrees of overlap with the initial state and the target state, respectively. The transition rate in our algorithm is of order $\Theta(1/\sqrt{N})$, and the overlaps are of order $\Theta(1)$, yielding a nearly optimal query complexity of $T \simeq \sqrt{N}\,(\pi/2\sqrt{2})$. Our algorithm is a QAOA circuit that demonstrates a quantum advantage with a large number of iterations that is not derived from Trotterization of an adiabatic quantum optimization (AQO) algorithm. It also suggests that the analysis required to understand QAOA circuits involves a very different process from estimating the energy gap of a Hamiltonian in AQO.
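
    To put the quoted query complexity in context, the sketch below compares it with the textbook iteration count of standard Grover search, roughly (π/4)√N for a single marked item. The function names are illustrative and do not come from the paper; the success-probability formula sin²((2k+1)θ) with θ = arcsin(1/√N) is the standard result for Grover's algorithm, not the analysis used by the authors.

```python
import math


def grover_success_probability(n_items: int, iterations: int) -> float:
    """Success probability of textbook Grover search with one marked item."""
    theta = math.asin(1.0 / math.sqrt(n_items))
    return math.sin((2 * iterations + 1) * theta) ** 2


def grover_optimal_iterations(n_items: int) -> int:
    """Iteration count that brings (2k + 1) * theta closest to pi / 2."""
    theta = math.asin(1.0 / math.sqrt(n_items))
    return max(0, round(math.pi / (4 * theta) - 0.5))


def transverse_field_queries(n_items: int) -> float:
    """Leading-order query count T ~ sqrt(N) * pi / (2 * sqrt(2)) from the abstract."""
    return math.sqrt(n_items) * math.pi / (2 * math.sqrt(2))


if __name__ == "__main__":
    n = 1024
    k = grover_optimal_iterations(n)
    print("Grover iterations:", k)
    print("Grover success probability:", grover_success_probability(n, k))
    print("Transverse-field query estimate:", transverse_field_queries(n))
```

    Under these formulas the transverse-field count exceeds the leading-order Grover count by a constant factor of √2, which is consistent with the abstract's description of the complexity as nearly optimal.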

  10. Dynamics of the quantum search and quench-induced first-order phase transitions.

    PubMed

    Coulamy, Ivan B; Saguia, Andreia; Sarandy, Marcelo S

    2017-02-01

    We investigate the excitation dynamics at a first-order quantum phase transition (QPT). More specifically, we consider the quench-induced QPT in the quantum search algorithm, which aims at finding out a marked element in an unstructured list. We begin by deriving the exact dynamics of the model, which is shown to obey a Riccati differential equation. Then, we discuss the probabilities of success by adopting either global or local adiabaticity strategies. Moreover, we determine the disturbance of the quantum criticality as a function of the system size. In particular, we show that the critical point exponentially converges to its thermodynamic limit even in a fast evolution regime, which is characterized by both entanglement QPT estimators and the Schmidt gap. The excitation pattern is manifested in terms of quantum domain walls separated by kinks. The kink density is then shown to follow an exponential scaling as a function of the evolution speed, which can be interpreted as a Kibble-Zurek mechanism for first-order QPTs.

  11. Dynamics of the quantum search and quench-induced first-order phase transitions

    NASA Astrophysics Data System (ADS)

    Coulamy, Ivan B.; Saguia, Andreia; Sarandy, Marcelo S.

    2017-02-01

    We investigate the excitation dynamics at a first-order quantum phase transition (QPT). More specifically, we consider the quench-induced QPT in the quantum search algorithm, which aims at finding out a marked element in an unstructured list. We begin by deriving the exact dynamics of the model, which is shown to obey a Riccati differential equation. Then, we discuss the probabilities of success by adopting either global or local adiabaticity strategies. Moreover, we determine the disturbance of the quantum criticality as a function of the system size. In particular, we show that the critical point exponentially converges to its thermodynamic limit even in a fast evolution regime, which is characterized by both entanglement QPT estimators and the Schmidt gap. The excitation pattern is manifested in terms of quantum domain walls separated by kinks. The kink density is then shown to follow an exponential scaling as a function of the evolution speed, which can be interpreted as a Kibble-Zurek mechanism for first-order QPTs.

  12. GHOSTX: an improved sequence homology search algorithm using a query suffix array and a database suffix array.

    PubMed

    Suzuki, Shuji; Kakuta, Masanori; Ishida, Takashi; Akiyama, Yutaka

    2014-01-01

    DNA sequences are translated into protein coding sequences and then further assigned to protein families in metagenomic analyses, because of the need for sensitivity. However, huge amounts of sequence data create the problem that even general homology search analyses using BLASTX become difficult in terms of computational cost. We designed a new homology search algorithm that finds seed sequences based on the suffix arrays of a query and a database, and have implemented it as GHOSTX. GHOSTX achieved approximately 131-165 times acceleration over a BLASTX search at similar levels of sensitivity. GHOSTX is distributed under the BSD 2-clause license and is available for download at http://www.bi.cs.titech.ac.jp/ghostx/. Currently, sequencing technology continues to improve, and sequencers are increasingly producing larger and larger quantities of data. This explosion of sequence data makes computational analysis with contemporary tools more difficult. We offer this tool as a potential solution to this problem.
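
    The seed-finding idea can be illustrated with a much simpler, single suffix array. The sketch below indexes only the database sequence and looks up exact seeds by binary search; GHOSTX itself uses suffix arrays of both the query and the database and is far more elaborate. The function names and the toy sequence are assumptions made for illustration.

```python
def build_suffix_array(text: str) -> list[int]:
    """Return suffix start positions sorted by suffix (naive construction)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_seed(text: str, suffix_array: list[int], seed: str) -> list[int]:
    """Return all positions in text where seed occurs, using binary search."""
    lo, hi = 0, len(suffix_array)
    # Lower bound: first suffix whose prefix of len(seed) is >= seed.
    while lo < hi:
        mid = (lo + hi) // 2
        start = suffix_array[mid]
        if text[start:start + len(seed)] < seed:
            lo = mid + 1
        else:
            hi = mid
    # Suffixes sharing the seed as a prefix are contiguous in the array.
    hits = []
    while lo < len(suffix_array) and text.startswith(seed, suffix_array[lo]):
        hits.append(suffix_array[lo])
        lo += 1
    return sorted(hits)


if __name__ == "__main__":
    database = "MKVLAAGKVL"  # toy protein sequence
    sa = build_suffix_array(database)
    print(find_seed(database, sa, "KVL"))
```

    The naive construction sorts full suffixes and is only suitable for short toy inputs; production tools use dedicated suffix-array construction algorithms.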

  13. Materials Design and Discovery with High-Throughput Density Functional Theory: The Open Quantum Materials Database (OQMD)

    NASA Astrophysics Data System (ADS)

    Saal, James E.; Kirklin, Scott; Aykol, Muratahan; Meredig, Bryce; Wolverton, C.

    2013-11-01

    High-throughput density functional theory (HT DFT) is fast becoming a powerful tool for accelerating materials design and discovery by amassing tens and even hundreds of thousands of DFT calculations in large databases. Complex materials problems can be approached much more efficiently and broadly through the sheer quantity of structures and chemistries available in such databases. Our HT DFT database, the Open Quantum Materials Database (OQMD), contains over 200,000 DFT calculated crystal structures and will be freely available for public use at http://oqmd.org. In this review, we describe the OQMD and its use in five materials problems, spanning a wide range of applications and materials types: (I) Li-air battery combination catalyst/electrodes, (II) Li-ion battery anodes, (III) Li-ion battery cathode coatings reactive with HF, (IV) Mg-alloy long-period stacking ordered (LPSO) strengthening precipitates, and (V) training a machine learning model to predict new stable ternary compounds.

  14. A Multivariate Mixture Model to Estimate the Accuracy of Glycosaminoglycan Identifications Made by Tandem Mass Spectrometry (MS/MS) and Database Search.

    PubMed

    Chiu, Yulun; Schliekelman, Paul; Orlando, Ron; Sharp, Joshua S

    2017-02-01

    We present a statistical model to estimate the accuracy of derivatized heparin and heparan sulfate (HS) glycosaminoglycan (GAG) assignments to tandem mass (MS/MS) spectra made by the first published database search application, GAG-ID. Employing a multivariate expectation-maximization algorithm, this statistical model distinguishes correct from ambiguous and incorrect database search results when computing the probability that heparin/HS GAG assignments to spectra are correct based upon database search scores. Using GAG-ID search results for spectra generated from a defined mixture of 21 synthesized tetrasaccharide sequences as well as seven spectra of longer defined oligosaccharides, we demonstrate that the computed probabilities are accurate and have high power to discriminate between correctly, ambiguously, and incorrectly assigned heparin/HS GAGs. This analysis makes it possible to filter large MS/MS database search results with predictable false identification error rates.

  15. Review and Comparison of the Search Effectiveness and User Interface of Three Major Online Chemical Databases

    ERIC Educational Resources Information Center

    Bharti, Neelam; Leonard, Michelle; Singh, Shailendra

    2016-01-01

    Online chemical databases are the largest source of chemical information and, therefore, the main resource for retrieving results from published journals, books, patents, conference abstracts, and other relevant sources. Various commercial, as well as free, chemical databases are available. SciFinder, Reaxys, and Web of Science are three major…

  17. Reach for Reference. Don't Judge a Database by Its Search Screen

    ERIC Educational Resources Information Center

    Safford, Barbara Ripp

    2005-01-01

    In this column, the author provides a description and brief review of the "Children's Literature Comprehensive Database" (CLCD). This subscription database is a 1999 spinoff from Marilyn Courtot's "Children's Literature" website, which began in 1993 and is a free resource of reviews and features about books, authors, and illustrators. The separate…

  18. Searching for quantum gravity with high-energy atmospheric neutrinos and AMANDA-II

    NASA Astrophysics Data System (ADS)

    Kelley, John Lawrence

    2008-06-01

    The AMANDA-II detector, operating since 2000 in the deep ice at the geographic South Pole, has accumulated a large sample of atmospheric muon neutrinos in the 100 GeV to 10 TeV energy range. The zenith angle and energy distribution of these events can be used to search for various phenomenological signatures of quantum gravity in the neutrino sector, such as violation of Lorentz invariance (VLI) or quantum decoherence (QD). Analyzing a set of 5511 candidate neutrino events collected during 1387 days of livetime from 2000 to 2006, we find no evidence for such effects and set upper limits on VLI and QD parameters using a maximum likelihood method. Given the absence of new flavor-changing physics, we use the same methodology to determine the conventional atmospheric muon neutrino flux above 100 GeV.

  19. Low template STR typing: effect of replicate number and consensus method on genotyping reliability and DNA database search results.

    PubMed

    Benschop, Corina C G; van der Beek, Cornelis P; Meiland, Hugo C; van Gorp, Ankie G M; Westen, Antoinette A; Sijen, Titia

    2011-08-01

    To analyze DNA samples with very low DNA concentrations, various methods have been developed that sensitize short tandem repeat (STR) typing. Sensitized DNA typing is accompanied by stochastic amplification effects, such as allele drop-outs and drop-ins. Therefore low template (LT) DNA profiles are interpreted with care. One can either try to infer the genotype by a consensus method that uses alleles confirmed in replicate analyses, or one can use a statistical model to evaluate the strength of the evidence in a direct comparison with a known DNA profile. In this study we focused on the first strategy and we show that the procedure by which the consensus profile is assembled will affect genotyping reliability. In order to gain insight in the roles of replicate number and requested level of reproducibility, we generated six independent amplifications of samples of known donors. The LT methods included both increased cycling and enhanced capillary electrophoresis (CE) injection [1]. Consensus profiles were assembled from two to six of the replications using four methods: composite (include all alleles), n-1 (include alleles detected in all but one replicate), n/2 (include alleles detected in at least half of the replicates) and 2× (include alleles detected twice). We compared the consensus DNA profiles with the DNA profile of the known donor, studied the stochastic amplification effects and examined the effect of the consensus procedure on DNA database search results. From all these analyses we conclude that the accuracy of LT DNA typing and the efficiency of database searching improve when the number of replicates is increased and the consensus method is n/2. The most functional number of replicates within this n/2 method is four (although a replicate number of three suffices for samples showing >25% of the alleles in standard STR typing). This approach was also the optimal strategy for the analysis of 2-person mixtures, although modified search strategies may be
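
    The four consensus rules named in this abstract can be sketched as simple count thresholds over replicate allele calls at one locus. This is an illustrative reading, not the authors' implementation: it assumes "n-1" means detected in at least n-1 replicates, "n/2" means detected in at least half (rounded up), and "2×" means detected at least twice. The function name and the allele labels are hypothetical.

```python
import math
from collections import Counter


def consensus_alleles(replicates: list[set[str]], method: str) -> set[str]:
    """Build a consensus allele set for one locus from replicate calls."""
    n = len(replicates)
    # Count in how many replicates each allele was detected.
    counts = Counter(allele for rep in replicates for allele in set(rep))
    thresholds = {
        "composite": 1,              # include every allele seen at all
        "n-1": max(1, n - 1),        # assumed: seen in all but (at most) one replicate
        "n/2": math.ceil(n / 2),     # assumed: seen in at least half of the replicates
        "2x": 2,                     # assumed: seen at least twice
    }
    needed = thresholds[method]
    return {allele for allele, seen in counts.items() if seen >= needed}


if __name__ == "__main__":
    calls = [{"12", "14"}, {"12"}, {"12", "15"}, {"12", "14", "15"}]
    for rule in ("composite", "n-1", "n/2", "2x"):
        print(rule, sorted(consensus_alleles(calls, rule)))
```

    With four replicates the n/2 and 2× thresholds coincide; they diverge only for five or six replicates, where n/2 requires three detections.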

  20. Boolean Logic: An Aid for Searching Computer Databases in Special Education and Rehabilitation.

    ERIC Educational Resources Information Center

    Summers, Edward G.

    1989-01-01

    The article discusses using Boolean logic as a tool for searching computerized information retrieval systems in special education and rehabilitation technology. It includes discussion of the Boolean search operators AND, OR, and NOT; Venn diagrams; and disambiguating parentheses. Six suggestions are offered for development of good Boolean logic…
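
    The three operators mentioned above map directly onto set operations over an inverted index. The following is a minimal sketch with a made-up index; the terms, record identifiers, and function names are hypothetical and are not taken from the article.

```python
# Hypothetical inverted index: search term -> set of record identifiers.
INDEX = {
    "autism": {1, 2, 5},
    "rehabilitation": {2, 3, 5},
    "adults": {3, 5},
}
ALL_RECORDS = {1, 2, 3, 4, 5}


def term(word: str) -> set[int]:
    """Records indexed under a term (empty set if the term is unknown)."""
    return INDEX.get(word, set())


def op_and(a: set[int], b: set[int]) -> set[int]:
    return a & b  # records matching both operands


def op_or(a: set[int], b: set[int]) -> set[int]:
    return a | b  # records matching either operand


def op_not(a: set[int]) -> set[int]:
    return ALL_RECORDS - a  # records not matching the operand


if __name__ == "__main__":
    # (autism AND rehabilitation) AND NOT adults
    result = op_and(op_and(term("autism"), term("rehabilitation")), op_not(term("adults")))
    print(sorted(result))
```

    Nesting the function calls plays the role of the disambiguating parentheses: changing the grouping changes which records are returned.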

  1. Closing the loop in cortically-coupled computer vision: a brain-computer interface for searching image databases.

    PubMed

    Pohlmeyer, Eric A; Wang, Jun; Jangraw, David C; Lou, Bin; Chang, Shih-Fu; Sajda, Paul

    2011-06-01

    We describe a closed-loop brain-computer interface that re-ranks an image database by iterating between user generated 'interest' scores and computer vision generated visual similarity measures. The interest scores are based on decoding the electroencephalographic (EEG) correlates of target detection, attentional shifts and self-monitoring processes, which result from the user paying attention to target images interspersed in rapid serial visual presentation (RSVP) sequences. The highest scored images are passed to a semi-supervised computer vision system that reorganizes the image database accordingly, using a graph-based representation that captures visual similarity between images. The system can either query the user for more information, by adaptively resampling the database to create additional RSVP sequences, or it can converge to a 'done' state. The done state includes a final ranking of the image database and also a 'guess' of the user's chosen category of interest. We find that the closed-loop system's re-rankings can substantially expedite database searches for target image categories chosen by the subjects. Furthermore, better reorganizations are achieved than by relying on EEG interest rankings alone, or if the system were simply run in an open loop format without adaptive resampling.

  2. Lead generation using pharmacophore mapping and three-dimensional database searching: application to muscarinic M(3) receptor antagonists.

    PubMed

    Marriott, D P; Dougall, I G; Meghani, P; Liu, Y J; Flower, D R

    1999-08-26

    By using a pharmacophore model, a geometrical representation of the features necessary for molecules to show a particular biological activity, it is possible to search databases containing the 3D structures of molecules and identify novel compounds which may possess this activity. We describe our experiences of establishing a working 3D database system and its use in rational drug design. By using muscarinic M(3) receptor antagonists as an example, we show that it is possible to identify potent novel lead compounds using this approach. Pharmacophore generation based on the structures of known M(3) receptor antagonists, 3D database searching, and medium-throughput screening were used to identify candidate compounds. Three compounds were chosen to define the pharmacophore: a lung-selective M(3) antagonist patented by Pfizer and two Astra compounds which show affinity at the M(3) receptor. From these, a pharmacophore model was generated, using the program DISCO, and this was used subsequently to search a UNITY 3D database of proprietary compounds; 172 compounds were found to fit the pharmacophore. These compounds were then screened, and 1-[2-(2-(diethylamino)ethoxy)phenyl]-2-phenylethanone (pA(2) 6.67) was identified as the best hit, with N-[2-(piperidin-1-ylmethyl)cyclohexyl]-2-propoxybenzamide (pA(2) 4.83) and phenylcarbamic acid 2-(morpholin-4-ylmethyl)cyclohexyl ester (pA(2) 5.54) demonstrating lower activity. As well as its potency, 1-[2-(2-(diethylamino)ethoxy)phenyl]-2-phenylethanone is a simple structure with limited similarity to existing M(3) receptor antagonists.

  3. High-performance hardware implementation of a parallel database search engine for real-time peptide mass fingerprinting

    PubMed Central

    Bogdán, István A.; Rivers, Jenny; Beynon, Robert J.; Coca, Daniel

    2008-01-01

    Motivation: Peptide mass fingerprinting (PMF) is a method for protein identification in which a protein is fragmented by a defined cleavage protocol (usually proteolysis with trypsin), and the masses of these products constitute a ‘fingerprint’ that can be searched against theoretical fingerprints of all known proteins. In the first stage of PMF, the raw mass spectrometric data are processed to generate a peptide mass list. In the second stage this protein fingerprint is used to search a database of known proteins for the best protein match. Although current software solutions can typically deliver a match in a relatively short time, a system that can find a match in real time could change the way in which PMF is deployed and presented. In a paper published earlier we presented a hardware design of a raw mass spectra processor that, when implemented in Field Programmable Gate Array (FPGA) hardware, achieves almost 170-fold speed gain relative to a conventional software implementation running on a dual processor server. In this article we present a complementary hardware realization of a parallel database search engine that, when running on a Xilinx Virtex 2 FPGA at 100 MHz, delivers 1800-fold speed-up compared with an equivalent C software routine, running on a 3.06 GHz Xeon workstation. The inherent scalability of the design means that processing speed can be multiplied by deploying the design on multiple FPGAs. The database search processor and the mass spectra processor, running on a reconfigurable computing platform, provide a complete real-time PMF protein identification solution. Contact: d.coca@sheffield.ac.uk PMID:18453553
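
    The second PMF stage described above, matching a peptide mass list against theoretical fingerprints, can be sketched in software as follows. This is a simplified illustration and not the authors' FPGA design: it assumes the common trypsin rule (cleave after K or R unless the next residue is P), no missed cleavages, monoisotopic neutral masses, and a plain match count as the score. The protein names, sequences, and tolerance are hypothetical.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def tryptic_digest(sequence: str) -> list[str]:
    """Cleave after K or R unless followed by P (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass: residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def match_count(observed: list[float], sequence: str, tol_da: float = 0.01) -> int:
    """Number of observed masses within tol_da of any theoretical peptide mass."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(sequence)]
    return sum(any(abs(m - t) <= tol_da for t in theoretical) for m in observed)


def best_match(observed: list[float], database: dict[str, str]) -> str:
    """Name of the database protein with the most matching peptide masses."""
    return max(database, key=lambda name: match_count(observed, database[name]))


if __name__ == "__main__":
    db = {"protA": "AKRPGK", "protB": "GGGKAAAR"}  # hypothetical entries
    peaks = [peptide_mass(p) for p in tryptic_digest(db["protB"])]
    print(best_match(peaks, db))
```

    Each database entry is scored independently of the others, which is the property that makes this kind of search straightforward to parallelize.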

  4. Searches for Decaying Sterile Neutrinos with the X-Ray Quantum Calorimeter Sounding Rocket

    NASA Astrophysics Data System (ADS)

    Goldfinger, David; XQC Collaboration

    2016-01-01

    Rocket-borne X-ray spectrometers can produce high-resolution spectra for wide field-of-view observations. This is useful in searches for dark matter candidates that produce X-ray lines in the Milky Way, such as decaying keV-scale sterile neutrinos. In spite of exposure times and effective areas that are significantly smaller than satellite observatories, similar sensitivity to decaying sterile neutrinos can be attained due to the high spectral resolution and large field of view. We present recent results of such a search analyzing the telemetered data from the 2011 flight of the X-Ray Quantum Calorimeter instrument as well as ongoing progress in expanding the data set to include the more complete onboard data over additional flights.

  5. Published and perished? The influence of the searched protein database on the long-term storage of proteomics data.

    PubMed

    Griss, Johannes; Côté, Richard G; Gerner, Christopher; Hermjakob, Henning; Vizcaíno, Juan Antonio

    2011-09-01

    In proteomics, protein identifications are reported and stored using an unstable reference system: protein identifiers. These proprietary identifiers are created individually by every protein database and can change or may even be deleted over time. To estimate the effect of the searched protein sequence database on the long-term storage of proteomics data we analyzed the changes of reported protein identifiers from all public experiments in the Proteomics Identifications (PRIDE) database by November 2010. To map the submitted protein identifier to a currently active entry, two distinct approaches were used. The first approach used the Protein Identifier Cross Referencing (PICR) service at the EBI, which maps protein identifiers based on 100% sequence identity. The second one (called logical mapping algorithm) accessed the source databases and retrieved the current status of the reported identifier. Our analysis showed the differences between the main protein databases (International Protein Index (IPI), UniProt Knowledgebase (UniProtKB), National Center for Biotechnology Information nr database (NCBI nr), and Ensembl) with respect to identifier stability. For example, whereas 20% of submitted IPI entries were deleted after two years, virtually all UniProtKB entries remained either active or replaced. Furthermore, the two mapping algorithms produced markedly different results. For example, the PICR service reported 10% more IPI entries deleted compared with the logical mapping algorithm. We found several cases where experiments contained more than 10% deleted identifiers already at the time of publication. We also assessed the proportion of peptide identifications in these data sets that still fitted the originally identified protein sequences. Finally, we performed the same overall analysis on all records from IPI, Ensembl, and UniProtKB: two releases per year were used, from 2005. This analysis showed for the first time the true effect of changing protein

  6. Supervised learning of tools for content-based search of image databases

    NASA Astrophysics Data System (ADS)

    Delanoy, Richard L.

    1996-03-01

    A computer environment, called the Toolkit for Image Mining (TIM), is being developed with the goal of enabling users with diverse interests and varied computer skills to create search tools for content-based image retrieval and other pattern matching tasks. Search tools are generated using a simple paradigm of supervised learning that is based on the user pointing at mistakes of classification made by the current search tool. As mistakes are identified, a learning algorithm uses the identified mistakes to build up a model of the user's intentions, construct a new search tool, apply the search tool to a test image, display the match results as feedback to the user, and accept new inputs from the user. Search tools are constructed in the form of functional templates, which are generalized matched filters capable of knowledge-based image processing. The ability of this system to learn the user's intentions from experience contrasts with other existing approaches to content-based image retrieval that base searches on the characteristics of a single input example or on a predefined and semantically-constrained textual query. Currently, TIM is capable of learning spectral and textural patterns, but should be adaptable to the learning of shapes as well. Possible applications of TIM include not only content-based image retrieval, but also quantitative image analysis, the generation of metadata for annotating images, data prioritization or data reduction in bandwidth-limited situations, and the construction of components for larger, more complex computer vision algorithms.

  7. External validation of the SEARCH model for predicting aggressive recurrence after radical prostatectomy: results from the Duke Prostate Center Database

    PubMed Central

    Teeter, Anna E.; Sun, Leon; Moul, Judd W.; Freedland, Stephen J.

    2010-01-01

    Objective To validate a model previously developed using the Shared Equal Access Regional Cancer Hospital (SEARCH) database to predict the risk of aggressive recurrence after surgery, defined as a prostate-specific antigen (PSA) doubling time (DT) of < 9 months, incorporating pathological stage, preoperative PSA level and pathological Gleason sum, that had an area under the curve (AUC) of 0.79 using a cohort of men from the Duke Prostate Center (DPC). Patients and methods Data were included from 1989 men from the DPC database who underwent RP for node-negative prostate cancer between 1987 and 2003. Of these men, 100 had disease recurrence, with a PSADT of < 9 months, while 1889 either did not have a recurrence but had ≥36 months of follow-up or had a recurrence with a PSADT of ≥ 9 months. We examined the ability of the SEARCH model to predict aggressive recurrence within the DPC cohort, and examined the correlation between the predicted risk of aggressive recurrence and the actual outcome within DPC. Results The SEARCH model predicted aggressive recurrence within DPC with an AUC of 0.82. There was a strong and significant correlation between the predicted risk of aggressive recurrence based on the SEARCH tables and the actual outcomes within DPC (r = 0.68, P < 0.001), although the model predictions tended to be slightly higher than the actual risk. Conclusions The SEARCH model to predict aggressive recurrence after RP predicted aggressive recurrence in an external dataset with a high degree of accuracy. These tables, now validated, can be used to help select men for adjuvant therapy and clinical trials. PMID:20151967

  8. 3DinSight: an integrated relational database and search tool for the structure, function and properties of biomolecules.

    PubMed

    An, J; Nakama, T; Kubota, Y; Sarai, A

    1998-01-01

    Although a large amount of information on the structure, function and properties of biomolecules is becoming available, it is difficult to understand the relationship between them. Thus, we have attempted to create an integrated relational database, search and visualization tool, 3DinSight, to help researchers to gain insight into their relationship. We have gathered data on the structure, function and properties of biomolecules, and implemented them into a relational database system. The structural data contain several subset data such as protein homologues, protein-DNA complex, in order to enable searching within a specific class of data. The functional data include motif sequence and mutation data of proteins. Also, various amino acid properties are implemented as a relational table. The World Wide Web (WWW) interfaces enable users to carry out various kinds of searches among these data. The locations of motif sequences and mutations are automatically mapped on the structure, and visualized in three-dimensional (3D) space by interactive viewers, VRML (Virtual Reality Modeling Language) and RasMol. In the case of VRML, the mapped 3D objects are hyper-linked to the corresponding document data. Also, amino acid properties, linked with structure, functional and mutation sites, can be displayed as graph plots. 3DinSight is freely accessible through the Internet (http://www.rtc.riken.go.jp/3DinSight.html). Contact: sarai@rtc.riken.go.jp

  9. Federated Search Tools in Fusion Centers: Bridging Databases in the Information Sharing Environment

    DTIC Science & Technology

    2012-09-01

    German and Jay Stanley , “Fusion Center Update,” American Civil Liberties Union, July 2008, http://www.aclu.org/files/pdfs/privacy...Intelligence, ed. Jennifer E. Sims and Burton Gerber (Washington DC: Georgetown University Press, 2005), 107. 16 Ibid. 11 through a federated search tool...SurveyMonkey. Last modified June 23, 2012. https://www.surveymonkey.com/s/FederatedSearchToolsinFCs. German, Mike and Jay Stanley . “Fusion Center

  10. Silicon photonic chips for search on improved-glued-binary-tree based on continuous-time quantum walk

    NASA Astrophysics Data System (ADS)

    Qi, Fan; Ma, Qingyan; Wang, Yufei; Zheng, Wanhua

    2016-11-01

    Search on improved-glued-binary-trees is a representative example of quantum superiority, where exponential acceleration can be achieved using a quantum walk with respect to any classical algorithm. Here we analyze the evolution process of this quantum-walk-based algorithm. Several remarkable features of the process are revealed. Generalization of the model by introducing tunable defect strength and double defects is also discussed, and the effects of these generalizations on the evolution process, arrival probability and residual probability are discussed in detail. A physical implementation with a silicon ridge waveguide array is presented. The design of the array with the FEM method is presented, and light propagation simulation with the FDTD method shows that this kind of structure is feasible for the task. Lastly, a preliminary experimental demonstration with classical coherent light simulation is presented. Our results show that silicon photonic chips are suitable for such search problems and open a route towards large-scale photonic quantum computation.

  11. The Magnetics Information Consortium (MagIC) Online Database: Uploading, Searching and Visualizing Paleomagnetic and Rock Magnetic Data

    NASA Astrophysics Data System (ADS)

    Koppers, A.; Tauxe, L.; Constable, C.; Pisarevsky, S.; Jackson, M.; Solheid, P.; Banerjee, S.; Johnson, C.; Genevey, A.; Delaney, R.; Baker, P.; Sbarbori, E.

    2005-12-01

    The Magnetics Information Consortium (MagIC) operates an online relational database including both rock and paleomagnetic data. The goal of MagIC is to store all measurements and their derived properties for studies of paleomagnetic directions (inclination, declination) and their intensities, and for rock magnetic experiments (hysteresis, remanence, susceptibility, anisotropy). MagIC is hosted under EarthRef.org at http://earthref.org/MAGIC/ and has two search nodes, one for paleomagnetism and one for rock magnetism. These nodes provide basic search capabilities based on location, reference, methods applied, material type and geological age, while allowing the user to drill down from sites all the way to the measurements. At each stage, the data can be saved and, if the available data supports it, the data can be visualized by plotting equal area plots, VGP location maps or typical Zijderveld, hysteresis, FORC, and various magnetization and remanence diagrams. All plots are made in SVG (scalable vector graphics) and thus can be saved and easily read into the user's favorite graphics programs without loss of resolution. User contributions to the MagIC database are critical to achieve a useful research tool. We have developed a standard data and metadata template (version 1.6) that can be used to format and upload all data at the time of publication in Earth Science journals. Software tools are provided to facilitate easy population of these templates within Microsoft Excel. These tools allow for the import/export of text files and they provide advanced functionality to manage/edit the data, and to perform various internal checks to high grade the data and to make them ready for uploading. The uploading is all done online by using the MagIC Contribution Wizard at http://earthref.org/MAGIC/upload.htm that takes only a few minutes to process a contribution of approximately 5,000 data records. 
After uploading these standardized MagIC template files will be stored in the

  12. CUDASW++ 3.0: accelerating Smith-Waterman protein database search by coupling CPU and GPU SIMD instructions.

    PubMed

    Liu, Yongchao; Wirawan, Adrianto; Schmidt, Bertil

    2013-04-04

    The maximal sensitivity for local alignments makes the Smith-Waterman algorithm a popular choice for protein sequence database search based on pairwise alignment. However, the algorithm is compute-intensive due to its quadratic time complexity. Corresponding runtimes are further compounded by the rapid growth of sequence databases. We present CUDASW++ 3.0, a fast Smith-Waterman protein database search algorithm, which couples CPU and GPU SIMD instructions and carries out concurrent CPU and GPU computations. For the CPU computation, this algorithm employs SSE-based vector execution units as accelerators. For the GPU computation, we have investigated for the first time a GPU SIMD parallelization, which employs CUDA PTX SIMD video instructions to gain more data parallelism beyond the SIMT execution model. Moreover, sequence alignment workloads are automatically distributed over CPUs and GPUs based on their respective compute capabilities. Evaluation on the Swiss-Prot database shows that CUDASW++ 3.0 gains a performance improvement over CUDASW++ 2.0 of up to 2.9 and 3.2 times, with a maximum performance of 119.0 and 185.6 GCUPS, on a single-GPU GeForce GTX 680 and a dual-GPU GeForce GTX 690 graphics card, respectively. In addition, our algorithm has demonstrated significant speedups over other top-performing tools: SWIPE and BLAST+. CUDASW++ 3.0 is written in CUDA C++ and PTX assembly languages, targeting GPUs based on the Kepler architecture. This algorithm obtains significant speedups over its predecessor, CUDASW++ 2.0, by benefiting from the use of CPU and GPU SIMD instructions as well as the concurrent execution on CPUs and GPUs. The source code and the simulated data are available at http://cudasw.sourceforge.net.
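
    As a reading aid, the quadratic-time recurrence behind Smith-Waterman and the GCUPS (giga cell updates per second) metric quoted above can be sketched in a few lines of Python. This is a minimal sketch, not the CUDASW++ implementation: the function names and the simple match/mismatch scoring with a linear gap penalty are assumptions made for illustration, whereas real protein searches normally use a substitution matrix and affine gap penalties.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local-alignment score of a and b (illustrative scoring scheme)."""
    prev = [0] * (len(b) + 1)          # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # A cell never drops below 0: a local alignment may restart anywhere.
            curr[j] = max(0,
                          prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                          prev[j] + gap,       # gap in b
                          curr[j - 1] + gap)   # gap in a
            best = max(best, curr[j])
        prev = curr
    return best


def gcups(query_len: int, db_residues: int, seconds: float) -> float:
    """Giga cell updates per second: DP cells filled / elapsed time / 1e9."""
    return query_len * db_residues / seconds / 1e9


if __name__ == "__main__":
    print(smith_waterman_score("GATTACA", "TTAC"))
    print(gcups(100, 10**9, 1.0))
```

    Every query residue is compared against every database residue, so the number of cell updates is the product of the two lengths; this is the workload that SIMD and GPU parallelization distribute.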

  13. Code optimization of the subroutine to remove near identical matches in the sequence database homology search tool PSI-BLAST.

    PubMed

    Aspnäs, Mats; Mattila, Kimmo; Osowski, Kristoffer; Westerholm, Jan

    2010-06-01

    A central task in protein sequence characterization is the use of a sequence database homology search tool to find similar protein sequences in other individuals or species. PSI-BLAST is a widely used module of the BLAST package that calculates a position-specific score matrix from the best matching sequences and performs iterated searches using a method to avoid many similar sequences for the score. For some queries and parameter settings, PSI-BLAST may find many similar high-scoring matches, and therefore up to 80% of the total run time may be spent in this procedure. In this article, we present code optimizations that improve the cache utilization and the overall performance of this procedure. Measurements show that, for queries where the number of similar matches is high, the optimized PSI-BLAST program may be as much as 2.9 times faster than the original program.

  14. Combining history of medicine and library instruction: an innovative approach to teaching database searching to medical students.

    PubMed

    Timm, Donna F; Jones, Dee; Woodson, Deidra; Cyrus, John W

    2012-01-01

    Library faculty members at the Health Sciences Library at the LSU Health Shreveport campus offer a database searching class for third-year medical students during their surgery rotation. For a number of years, students completed "ten-minute clinical challenges," but the instructors decided to replace the clinical challenges with innovative exercises using The Edwin Smith Surgical Papyrus to emphasize concepts learned. The Surgical Papyrus is an online resource that is part of the National Library of Medicine's "Turning the Pages" digital initiative. In addition, vintage surgical instruments and historic books are displayed in the classroom to enhance the learning experience.

  15. Online Searching of Bibliographic Databases: Microcomputer Access to National Information Systems.

    ERIC Educational Resources Information Center

    Coons, Bill

    This paper describes the range and scope of various information databases available for technicians, researchers, and managers employed in forestry and the forest products industry. Availability of information on reports of field and laboratory research, business trends, product prices, and company profiles through national distributors of…

  16. Online/CD-ROM Bibliographic Database Searching in a Small Academic Library.

    ERIC Educational Resources Information Center

    Pitet, Lynn T.

    The purpose of the project described in this paper was to gather information about online/CD-ROM database systems that would be useful in improving the services offered at the University of Findlay, a small private liberal arts college in northwestern Ohio. A survey was sent to 67 libraries serving colleges similar in size which included questions…

  18. Parallel computer architecture. (Latest citations from INSPEC - the database for Physics, Electronics, and Computing). Published Search

    SciTech Connect

    Not Available

    1993-10-01

    The bibliography contains citations concerning the development and performance analysis of parallel architecture in image processing and computing. Cost and performance evaluations of multiple processor systems are described. Applications are described, including supercomputer design, database management, computer communication systems, and robot control. (Contains 250 citations and includes a subject term index and title list.)

  19. Fast 3D molecular superposition and similarity search in databases of flexible molecules

    NASA Astrophysics Data System (ADS)

    Krämer, Andreas; Horn, Hans W.; Rice, Julia E.

    2003-01-01

    We present a new method (fFLASH) for the virtual screening of compound databases that is based on explicit three-dimensional molecular superpositions. fFLASH takes the torsional flexibility of the database molecules fully into account, and can deal with an arbitrary number of conformation-dependent molecular features. The method utilizes a fragmentation-reassembly approach which allows for an efficient sampling of the conformational space. A fast clique-based pattern matching algorithm generates alignments of pairs of adjacent molecular fragments on the rigid query molecule that are subsequently reassembled to complete database molecules. Using conventional molecular features (hydrogen bond donors and acceptors, charges, and hydrophobic groups) we show that fFLASH is able to rapidly produce accurate alignments of medium-sized drug-like molecules. Experiments with a test database containing a diverse set of 1780 drug-like molecules (including all conformers) have shown that average query processing times of the order of 0.1 seconds per molecule can be achieved on a PC.

  20. Quantum computing

    PubMed Central

    Li, Shu-Shen; Long, Gui-Lu; Bai, Feng-Shan; Feng, Song-Lin; Zheng, Hou-Zhi

    2001-01-01

    Quantum computing is a quickly growing research field. This article introduces the basic concepts of quantum computing, recent developments in quantum searching, and decoherence in a possible quantum dot realization. PMID:11562459

  1. The "Clipping Thesis": An Exercise in Developing Critical Thinking and Online Database Searching Skills.

    ERIC Educational Resources Information Center

    Minnich, Nancy P.; McCarthy, Carrol B.

    1986-01-01

    Designed to help high school students develop critical thinking and writing skills, the "Clipping Thesis" project requires students to find newspaper and journal articles on a given topic through printed indexes or online searching, read the articles, write brief and final summaries of their readings, and compile a bibliography. (EM)

  2. SCOOP: A Measurement and Database of Student Online Search Behavior and Performance

    ERIC Educational Resources Information Center

    Zhou, Mingming

    2015-01-01

    The ability to access and process massive amounts of online information is required in many learning situations. In order to develop a better understanding of student online search process especially in academic contexts, an online tool (SCOOP) is developed for tracking mouse behavior on the web to build a more extensive account of student web…

  3. Information Retrieval Strategies of Millennial Undergraduate Students in Web and Library Database Searches

    ERIC Educational Resources Information Center

    Porter, Brandi

    2009-01-01

    Millennial students make up a large portion of undergraduate students attending colleges and universities, and they have a variety of online resources available to them to complete academically related information searches, primarily Web based and library-based online information retrieval systems. The content, ease of use, and required search…

  4. A Search for Nontriggered Gamma-Ray Bursts in the BATSE Database

    NASA Technical Reports Server (NTRS)

    Kommers, Jefferson M.; Lewin, Walter H. G.; Kouveliotou, Chryssa; VanParadus, Jan; Pendleton, Geoffrey N.; Meegan, Charles A.; Fishman, Gerald J.

    1997-01-01

    We describe a search of archival data from the Burst and Transient Source Experiment (BATSE). The purpose of the search is to find astronomically interesting transients that did not activate the burst-detection (or "trigger") system on board the spacecraft. Our search is sensitive to events with peak fluxes (on the 1.024 s timescale) that are lower by a factor of approximately 2 than can be detected with the on-board burst trigger. In a search of 345 days of archival data, we detected 91 events in the 50-300 keV range that resemble classical gamma-ray bursts but that did not activate the on-board burst trigger. We also detected 110 low-energy (25-50 keV) events of unknown origin that may include activity from soft gamma repeater (SGR) 1806-20 and bursts and flares from X-ray binaries. This paper gives the occurrence times, estimated source directions, durations, peak fluxes, and fluences for the 91 gamma-ray burst candidates. The direction and intensity distributions of these bursts imply that the biases inherent in the on-board trigger mechanism have not significantly affected the completeness of the published BATSE gamma-ray burst catalogs.

  5. Indexing and Online Searching of Multi-Purpose Textual Databases: Conflict or Confluence?

    ERIC Educational Resources Information Center

    Regazzi, John J.

    The indexing and online searching of data bases have both conflicting and complementary structures, determined in part by the choice of indexing language. In the planning of a machine readable system, the Foundation Center, administrator of two data bases, initially chose a natural language for the lower cost, more rapid implementation, and…

  8. The Magnetics Information Consortium (MagIC) Online Database: Uploading, Searching and Visualizing Paleomagnetic and Rock Magnetic Data

    NASA Astrophysics Data System (ADS)

    Minnett, R.; Koppers, A.; Tauxe, L.; Constable, C.; Pisarevsky, S. A.; Jackson, M.; Solheid, P.; Banerjee, S.; Johnson, C.

    2006-12-01

    The Magnetics Information Consortium (MagIC) is commissioned to implement and maintain an online portal to a relational database populated by both rock and paleomagnetic data. The goal of MagIC is to archive all measurements and the derived properties for studies of paleomagnetic directions (inclination, declination) and intensities, and for rock magnetic experiments (hysteresis, remanence, susceptibility, anisotropy). MagIC is hosted under EarthRef.org at http://earthref.org/MAGIC/ and has two search nodes, one for paleomagnetism and one for rock magnetism. Both nodes provide query building based on location, reference, methods applied, material type and geological age, as well as a visual map interface to browse and select locations. The query result set is displayed in a digestible tabular format allowing the user to descend through hierarchical levels such as from locations to sites, samples, specimens, and measurements. At each stage, the result set can be saved and, if supported by the data, can be visualized by plotting global location maps, equal area plots, or typical Zijderveld, hysteresis, and various magnetization and remanence diagrams. User contributions to the MagIC database are critical to achieving a useful research tool. We have developed a standard data and metadata template (Version 2.1) that can be used to format and upload all data at the time of publication in Earth Science journals. Software tools are provided to facilitate population of these templates within Microsoft Excel. These tools allow for the import/export of text files and provide advanced functionality to manage and edit the data, and to perform various internal checks to maintain data integrity and prepare for uploading. The MagIC Contribution Wizard at http://earthref.org/MAGIC/upload.htm executes the upload and takes only a few minutes to process several thousand data records. 
The standardized MagIC template files are stored in the digital archives of EarthRef.org where they

  9. Searching the UVSP database and a list of experiments showing mass motions

    NASA Technical Reports Server (NTRS)

    Thompson, William

    1986-01-01

    Since the Solar Maximum Mission (SMM) satellite was launched, a large database has been built up of experiments using the Ultraviolet Spectrometer and Polarimeter (UVSP) instrument. Access to this database can be gained through the SMM Vax 750 computer at Goddard Space Flight Center. One useful way to do this is with a program called USEARCH. This program allows one to make a listing of different types of UVSP experiments. It is evident that this program is useful to those who would wish to make use of UVSP data, but who don't know what data is available. Therefore it was decided to include a short description of how to make use of the USEARCH program. Also described, but not included, is a listing of all UVSP experiments showing mass motions in prominences and filaments. This list was made with the aid of the USEARCH program.

  10. RINGS: a new search/match database for polycrystalline electron diffraction

    NASA Astrophysics Data System (ADS)

    Denley, David; Hart, Haskell

    2003-03-01

    RINGS is a relational database built from NIST Crystal Data for the identification of polycrystalline solids by selected area electron diffraction (SAED) and elemental analysis using Microsoft® Access 97 (subsequently converted to Access 2000). Experimental d-spacings are matched against values calculated from reduced unit cells, thereby fully and rigorously incorporating the effects of double diffraction. A total of 79,136 inorganic phases are included with original Crystal Data reference codes, allowing access to all information in NIST Crystal Data. Specific examples illustrate the advantages over previous approaches to the problem. This database will be most useful to researchers in mineralogy, metallurgy, materials science, forensics, and analytical chemistry who seek to identify well-characterized phases with known unit cells.
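
    The idea of matching experimental d-spacings against values calculated from a unit cell can be illustrated with a small Python sketch. It is not the RINGS procedure: it assumes a cell with all angles equal to 90 degrees (so that 1/d² = h²/a² + k²/b² + l²/c²), ignores systematic absences, and uses hypothetical function names and a hypothetical 2% relative tolerance.

```python
from itertools import product
from math import sqrt


def d_spacing(a: float, b: float, c: float, hkl: tuple) -> float:
    """Interplanar spacing for an orthogonal cell (alpha = beta = gamma = 90 deg)."""
    h, k, l = hkl
    return 1.0 / sqrt((h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2)


def calculated_spacings(a: float, b: float, c: float, max_index: int = 3) -> list:
    """All distinct d-spacings for Miller indices up to max_index, largest first."""
    spacings = set()
    for hkl in product(range(-max_index, max_index + 1), repeat=3):
        if hkl != (0, 0, 0):
            spacings.add(round(d_spacing(a, b, c, hkl), 4))
    return sorted(spacings, reverse=True)


def unmatched(observed: list, calculated: list, rel_tol: float = 0.02) -> list:
    """Observed d-spacings that no calculated value explains within rel_tol."""
    return [d for d in observed
            if not any(abs(d - calc) <= rel_tol * calc for calc in calculated)]


if __name__ == "__main__":
    calc = calculated_spacings(4.0, 4.0, 4.0)
    print(unmatched([4.0, 2.83, 3.5], calc))
```

    A candidate phase would be kept in this sketch only if `unmatched` returns an empty list for the measured ring spacings.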

  11. Relemed: sentence-level search engine with relevance score for the MEDLINE database of biomedical articles

    PubMed Central

    Siadaty, Mir S; Shu, Jianfen; Knaus, William A

    2007-01-01

    Background Receiving extraneous articles in response to a query submitted to MEDLINE/PubMed is common. When submitting a multi-word query (which is the majority of queries submitted), the presence of all query words within each article may be a necessary condition for retrieving relevant articles, but not sufficient. Ideally a relationship between the query words in the article is also required. We propose that if two words occur within an article, the probability that a relation between them is explained is higher when the words occur within adjacent sentences versus remote sentences. Therefore, sentence-level concurrence can be used as a surrogate for existence of the relationship between the words. In order to avoid the irrelevant articles, one solution would be to increase the search specificity. Another solution is to estimate a relevance score to sort the retrieved articles. However among the >30 retrieval services available for MEDLINE, only a few estimate a relevance score, and none detects and incorporates the relation between the query words as part of the relevance score. Results We have developed "Relemed", a search engine for MEDLINE. Relemed increases specificity and precision of retrieval by searching for query words within sentences rather than the whole article. It uses sentence-level concurrence as a statistical surrogate for the existence of relationship between the words. It also estimates a relevance score and sorts the results on this basis, thus shifting irrelevant articles lower down the list. In two case studies, we demonstrate that the most relevant articles appear at the top of the Relemed results, while this is not necessarily the case with a PubMed search. We have also shown that a Relemed search includes not only all the articles retrieved by PubMed, but potentially additional relevant articles, due to the extended 'automatic term mapping' and text-word searching features implemented in Relemed. Conclusion By using sentence
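
    The notion of sentence-level concurrence described above can be sketched in Python. This is only an illustration of the idea, not Relemed's scoring: the function names, the naive sentence splitter, and the four-level ranking (same sentence, adjacent sentences, anywhere in the article, absent) are assumptions.

```python
import re


def split_sentences(text: str) -> list:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokens(sentence: str) -> set:
    """Lower-cased alphanumeric word set of one sentence."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def cooccurrence_level(text: str, query_words: list) -> str:
    """Closest scope in which all query words co-occur."""
    query = {w.lower() for w in query_words}
    sents = [tokens(s) for s in split_sentences(text)]
    if any(query <= s for s in sents):
        return "sentence"                      # all words in one sentence
    if any(query <= (sents[i] | sents[i + 1]) for i in range(len(sents) - 1)):
        return "adjacent"                      # spread over two neighbouring sentences
    if query <= set().union(*sents):
        return "article"                       # present, but in remote sentences
    return "none"


if __name__ == "__main__":
    doc = "Aspirin reduces fever. Headache is common. Ibuprofen also helps."
    print(cooccurrence_level(doc, ["aspirin", "fever"]))
    print(cooccurrence_level(doc, ["aspirin", "ibuprofen"]))
```

    Sorting retrieved articles so that "sentence" hits come before "adjacent" and "article" hits would move articles that merely mention the words in unrelated places lower down the list.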

  12. Crescendo: A Protein Sequence Database Search Engine for Tandem Mass Spectra

    NASA Astrophysics Data System (ADS)

    Wang, Jianqi; Zhang, Yajie; Yu, Yonghao

    2015-07-01

    A search engine that discovers more peptides reliably is essential to the progress of the computational proteomics. We propose two new scoring functions (L- and P-scores), which aim to capture similar characteristics of a peptide-spectrum match (PSM) as Sequest and Comet do. Crescendo, introduced here, is a software program that implements these two scores for peptide identification. We applied Crescendo to test datasets and compared its performance with widely used search engines, including Mascot, Sequest, and Comet. The results indicate that Crescendo identifies a similar or larger number of peptides at various predefined false discovery rates (FDR). Importantly, it also provides a better separation between the true and decoy PSMs, warranting the future development of a companion post-processing filtering algorithm.
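
    Comparing search engines "at various predefined false discovery rates" is commonly done with a target-decoy estimate, in which the FDR above a score threshold is approximated by the ratio of decoy to target matches. The Python sketch below shows that common estimate under stated assumptions; the abstract does not say how Crescendo computes FDR, and the function name and input layout are hypothetical.

```python
def targets_at_fdr(psms: list, max_fdr: float) -> int:
    """Number of target PSMs accepted at the given estimated FDR.

    psms is a list of (score, is_decoy) pairs; higher scores are better.
    The estimated FDR of a top-scoring prefix is decoys / targets.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = accepted = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Keep the largest prefix whose estimated FDR is still acceptable.
        if targets and decoys / targets <= max_fdr:
            accepted = targets
    return accepted


if __name__ == "__main__":
    example = [(10, False), (9, False), (8, True),
               (7, False), (6, False), (5, True)]
    print(targets_at_fdr(example, 0.25))
```

    A scoring function that separates true and decoy PSMs better pushes decoys towards the bottom of the ranking, so more targets are accepted at the same FDR.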

  13. Image Content Engine (ICE): A System for Fast Image Database Searches

    SciTech Connect

    Brase, J M; Paglieroni, D W; Weinert, G F; Grant, C W; Lopez, A S; Nikolaev, S

    2005-03-22

    The Image Content Engine (ICE) is being developed to provide cueing assistance to human image analysts faced with increasingly large and intractable amounts of image data. The ICE architecture includes user configurable feature extraction pipelines which produce intermediate feature vector and match surface files which can then be accessed by interactive relational queries. Application of the feature extraction algorithms to large collections of images may be extremely time consuming and is launched as a batch job on a Linux cluster. The query interface accesses only the intermediate files and returns candidate hits nearly instantaneously. Queries may be posed for individual objects or collections. The query interface prompts the user for feedback, and applies relevance feedback algorithms to revise the feature vector weighting and focus on relevant search results. Examples of feature extraction and both model-based and search-by-example queries are presented.

  14. Crescendo: A Protein Sequence Database Search Engine for Tandem Mass Spectra.

    PubMed

    Wang, Jianqi; Zhang, Yajie; Yu, Yonghao

    2015-07-01

    A search engine that discovers more peptides reliably is essential to the progress of the computational proteomics. We propose two new scoring functions (L- and P-scores), which aim to capture similar characteristics of a peptide-spectrum match (PSM) as Sequest and Comet do. Crescendo, introduced here, is a software program that implements these two scores for peptide identification. We applied Crescendo to test datasets and compared its performance with widely used search engines, including Mascot, Sequest, and Comet. The results indicate that Crescendo identifies a similar or larger number of peptides at various predefined false discovery rates (FDR). Importantly, it also provides a better separation between the true and decoy PSMs, warranting the future development of a companion post-processing filtering algorithm.

  15. Exchange, interpretation, and database-search of ion mobility spectra supported by data format JCAMP-DX

    NASA Technical Reports Server (NTRS)

    Baumback, J. I.; Davies, A. N.; Vonirmer, A.; Lampen, P. H.

    1995-01-01

    To assist peak assignment in ion mobility spectrometry it is important to have quality reference data. The reference collection should be stored in a database system which is capable of being searched using spectral or substance information. We propose to build such a database customized for ion mobility spectra. To start off with, it is important to quickly reach a critical mass of data in the collection. We wish to obtain as many spectra combined with their IMS parameters as possible. Spectra suppliers will be rewarded for their participation with access to the database. To make the data exchange between users and system administration possible, it is important to define a file format specially made for the requirements of ion mobility spectra. The format should be computer readable and flexible enough for extensive comments to be included. In this document we propose a data exchange format, and we would like you to give comments on it. For international data exchange it is important to have a standard data exchange format. We propose to base the definition of this format on the JCAMP-DX protocol, which was developed for the exchange of infrared spectra. This standard made by the Joint Committee on Atomic and Molecular Physical Data is of a flexible design. The aim of this paper is to adapt JCAMP-DX to the special requirements of ion mobility spectra.
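
    JCAMP-DX files are plain text made of labeled data records of the form ##LABEL=value, with $$ introducing a comment and ##END= closing a block, which is what makes the format both computer readable and open to extensive comments. The Python sketch below reads such records into a dictionary. It is a minimal sketch, not a complete JCAMP-DX reader: the function name is hypothetical, and the sample record (including its DATA TYPE value) is invented for illustration rather than taken from the proposed ion mobility format.

```python
def parse_jcamp_records(text: str) -> dict:
    """Collect ##LABEL=value records up to ##END= (labels upper-cased)."""
    records = {}
    label = None
    for raw in text.splitlines():
        line = raw.split("$$", 1)[0].rstrip()   # drop trailing $$ comments
        if not line.strip():
            continue
        if line.startswith("##"):
            name, _, value = line[2:].partition("=")
            label = name.strip().upper()
            if label == "END":
                break
            records[label] = value.strip()
        elif label is not None:
            # Continuation line (e.g. tabulated data) belongs to the last label.
            records[label] = (records[label] + "\n" + line.strip()).strip()
    return records


# Hypothetical sample record, for illustration only.
SAMPLE = """##TITLE=Example ion mobility spectrum  $$ invented sample
##JCAMP-DX=4.24
##DATA TYPE=ION MOBILITY SPECTRUM
##XUNITS=MILLISECONDS
##YUNITS=ARBITRARY UNITS
##NPOINTS=3
##XYDATA=(X++(Y..Y))
10.0 0.1 0.5 0.2
##END=
"""

if __name__ == "__main__":
    for key, value in parse_jcamp_records(SAMPLE).items():
        print(key, "->", repr(value))
```

    Instrument-specific IMS parameters would fit the same record structure as additional labels, so a reader of this kind would not need to change when new labels are defined.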

  16. A Pseudo MS3 Approach for Identification of Disulfide-Bonded Proteins: Uncommon Product Ions and Database Search

    NASA Astrophysics Data System (ADS)

    Chen, Jianzhong; Shiyanov, Pavel; Schlager, John J.; Green, Kari B.

    2012-02-01

    It has previously been reported that disulfide and backbone bonds of native intact proteins can be concurrently cleaved using electrospray ionization (ESI) and collision-induced dissociation (CID) tandem mass spectrometry (MS/MS). However, the cleavages of disulfide bonds result in different cysteine modifications in product ions, making it difficult to identify the disulfide-bonded proteins via database search. To solve this identification problem, we have developed a pseudo MS3 approach by combining nozzle-skimmer dissociation (NSD) and CID on a quadrupole time-of-flight (Q-TOF) mass spectrometer using chicken lysozyme as a model. Although many of the product ions were similar to those typically seen in MS/MS spectra of enzymatically derived peptides, additional uncommon product ions were detected including ci-1 ions (the ith residue being aspartic acid, arginine, lysine and dehydroalanine) as well as those from a scrambled sequence. The formation of these uncommon types of product ions, likely caused by the lack of mobile protons, was proposed to involve bond rearrangements via a six-membered ring transition state and/or salt bridge(s). A search of 20 pseudo MS3 spectra against the Gallus gallus (chicken) database using Batch-Tag, a program originally designed for bottom-up MS/MS analysis, identified chicken lysozyme as the only hit, with expectation values less than 0.02 for 12 of the spectra. The pseudo MS3 approach may help to identify disulfide-bonded proteins and determine the associated post-translational modifications (PTMs); the confidence in the identification may be improved by incorporating the fragmentation characteristics into currently available search programs.

  17. On-line biomedical databases-the best source for quick search of the scientific information in the biomedicine.

    PubMed

    Masic, Izet; Milinovic, Katarina

    2012-06-01

    Most medical journals now have an electronic version available over public networks. Printed and electronic versions often exist in parallel, but the two forms need not be published simultaneously. The electronic version of a journal can be published a few weeks before the printed form and need not have identical content. The electronic form of a journal may include extensions that the printed form does not contain, such as animation or 3D display, and may offer full text, mostly in PDF or XML format, or just the table of contents or a summary. Access to full text is usually not free and is possible only if the institution (library or host) enters into an access agreement. Many medical journals, however, provide free access to some articles, or to their complete content after a certain time (6 months or a year). Such journals can be searched through network archives such as High Wire Press and Free Medical Journals.com. PubMed and PubMed Central deserve particular mention: they are the first public digital archives to collect journals of available medical literature without limit, and they operate within the system of the National Library of Medicine in Bethesda (USA). There are also so-called on-line medical journals, published only in electronic form, which can be searched through on-line databases. In this paper the authors briefly describe about 30 databases and give short instructions on how to access them and search the published papers in indexed medical journals.

  18. Searching for stereoisomerism in crystallographic databases: algorithm, analysis and chiral curiosities.

    PubMed

    Grothe, E; Meekes, H; de Gelder, R

    2017-06-01

    The automated identification of chiral centres in molecular residues is a non-trivial task. Current tools that allow the user to analyze crystallographic data entries do not identify chiral centres in some of the more complex ring structures, or lack the possibility to determine and compare the chirality of multiple structures. This article presents an approach to identify asymmetric C atoms, which is based on the atomic walk count algorithm presented by Rücker & Rücker [(1993), J. Chem. Inf. Comput. Sci. 33, 683-695]. The algorithm, which we implemented in a computer program named ChiChi, is able to compare isomeric residues based on the chiral centres that were identified. This allows for discrimination between enantiomers, diastereomers and constitutional isomers that are present in crystallographic databases. ChiChi was used to process 254 354 organic entries from the Cambridge Structural Database (CSD). A thorough analysis of stereoisomerism in the CSD is presented accompanied by a collection of chiral curiosities that illustrate the strength and versatility of this approach.

  19. Chemical and biological warfare: General studies. (Latest citations from the NTIS bibliographic database). Published Search

    SciTech Connect

    1996-10-01

    The bibliography contains citations concerning federally sponsored and conducted studies into chemical and biological warfare operations and planning. These studies cover areas not addressed in other parts of this series. The topics include production and storage of agents, delivery techniques, training, military and civil defense, general planning studies, psychological reactions to chemical warfare, evaluations of materials exposed to chemical agents, and studies on banning or limiting chemical warfare. Other published searches in this series on chemical warfare cover detection and warning, defoliants, protection, and biological studies, including chemistry and toxicology. (Contains 50-250 citations and includes a subject term index and title list.) (Copyright NERAC, Inc. 1995)

  20. Chemical and biological warfare: General studies. (Latest citations from the NTIS bibliographic database). Published Search

    SciTech Connect

    1995-09-01

    The bibliography contains citations concerning federally sponsored and conducted studies into chemical and biological warfare operations and planning. These studies cover areas not addressed in other parts of this series. The topics include production and storage of agents, delivery techniques, training, military and civil defense, general planning studies, psychological reactions to chemical warfare, evaluations of materials exposed to chemical agents, and studies on banning or limiting chemical warfare. Other published searches in this series on chemical warfare cover detection and warning, defoliants, protection, and biological studies, including chemistry and toxicology. (Contains 50-250 citations and includes a subject term index and title list.) (Copyright NERAC, Inc. 1995)

  1. Chemical and biological warfare: General studies. (Latest citations from the NTIS Bibliographic database). Published Search

    SciTech Connect

    Not Available

    1993-11-01

    The bibliography contains citations concerning federally sponsored and conducted studies into chemical and biological warfare operations and planning. These studies cover areas not addressed in other parts of this series. The topics include production and storage of agents, delivery techniques, training, military and civil defense, general planning studies, psychological reactions to chemical warfare, evaluations of materials exposed to chemical agents, and studies on banning or limiting chemical warfare. Other published searches in this series on chemical warfare cover detection and warning, defoliants, protection, and biological studies, including chemistry and toxicology. (Contains 250 citations and includes a subject term index and title list.)

  2. Chemical and biological warfare: General studies. (Latest citations from the NTIS bibliographic database). Published Search

    SciTech Connect

    1997-11-01

    The bibliography contains citations concerning federally sponsored and conducted studies into chemical and biological warfare operations and planning. These studies cover areas not addressed in other parts of this series. The topics include production and storage of agents, delivery techniques, training, military and civil defense, general planning studies, psychological reactions to chemical warfare, evaluations of materials exposed to chemical agents, and studies on banning or limiting chemical warfare. Other published searches in this series on chemical warfare cover detection and warning, defoliants, protection, and biological studies, including chemistry and toxicology. (Contains 50-250 citations and includes a subject term index and title list.) (Copyright NERAC, Inc. 1995)

  3. Chemical and biological warfare: General studies. (Latest citations from the NTIS bibliographic database). NewSearch

    SciTech Connect

    Not Available

    1994-10-01

    The bibliography contains citations concerning federally sponsored and conducted studies into chemical and biological warfare operations and planning. These studies cover areas not addressed in other parts of this series. The topics include production and storage of agents, delivery techniques, training, military and civil defense, general planning studies, psychological reactions to chemical warfare, evaluations of materials exposed to chemical agents, and studies on banning or limiting chemical warfare. Other published searches in this series on chemical warfare cover detection and warning, defoliants, protection, and biological studies, including chemistry and toxicology. (Contains 250 citations and includes a subject term index and title list.)

  4. Genetic Networks of Complex Disorders: from a Novel Search Engine for PubMed Article Database.

    PubMed

    Jung, Jae-Yoon; Wall, Dennis Paul

    2013-01-01

    Finding genetic risk factors of complex disorders may involve reviewing hundreds of genes or thousands of research articles iteratively, but few tools have been available to facilitate this procedure. In this work, we built a novel publication search engine that can identify target-disorder specific, genetics-oriented research articles and extract the genes with significant results. Preliminary test results showed that the output of this engine has better coverage in terms of genes or publications, than other existing applications. We consider it as an essential tool for understanding genetic networks of complex disorders.

  5. The Quantum Monte Carlo Database: towards high-accuracy and high-throughput calculation of material properties

    NASA Astrophysics Data System (ADS)

    Schiller, Joshua; Plante, Raymond; Wagner, Lucas; Ertekin, Elif

    Quantum Monte Carlo (QMC) techniques comprise a class of promising methods that offer a path towards higher accuracy for materials property prediction. However, their application in bulk materials has historically been limited to one-at-a-time evaluation of a given material. While these results often provide benchmark-level accuracy for quantities of interest, they do not allow for high-throughput analysis of the data since each calculation is done slightly differently. We present a combined data format and automatic generation platform based on the QWalk code for QMC data: QMCDB. This platform collects QMC results and provenance information automatically and stores the information in a database. We will report on the construction of this database and what lessons can be learned about using QMC for high-throughput applications.

  6. Searching for new physics at the frontiers with lattice quantum chromodynamics.

    PubMed

    Van de Water, Ruth S

    2012-07-01

    Numerical lattice-quantum chromodynamics (QCD) simulations, when combined with experimental measurements, allow the determination of fundamental parameters of the particle-physics Standard Model and enable searches for physics beyond-the-Standard Model. We present the current status of lattice-QCD weak matrix element calculations needed to obtain the elements and phase of the Cabibbo-Kobayashi-Maskawa (CKM) matrix and to test the Standard Model in the quark-flavor sector. We then discuss evidence that may hint at the presence of new physics beyond the Standard Model CKM framework. Finally, we discuss two opportunities where we expect lattice QCD to play a pivotal role in searching for, and possibly discovery of, new physics at upcoming high-intensity experiments: rare decays and the muon anomalous magnetic moment. The next several years may witness the discovery of new elementary particles at the Large Hadron Collider (LHC). The interplay between lattice QCD, high-energy experiments at the LHC, and high-intensity experiments will be needed to determine the underlying structure of whatever physics beyond-the-Standard Model is realized in nature.

  7. Heart research advances using database search engines, Human Protein Atlas and the Sydney Heart Bank.

    PubMed

    Li, Amy; Estigoy, Colleen; Raftery, Mark; Cameron, Darryl; Odeberg, Jacob; Pontén, Fredrik; Lal, Sean; Dos Remedios, Cristobal G

    2013-10-01

    This Methodological Review is intended as a guide for research students who may have just discovered a human "novel" cardiac protein, but it may also help hard-pressed reviewers of journal submissions on a "novel" protein reported in an animal model of human heart failure. Whether you are an expert or not, you may know little or nothing about this particular protein of interest. In this review we provide a strategic guide on how to proceed. We ask: How do you discover what has been published (even in an abstract or research report) about this protein? Everyone knows how to undertake literature searches using PubMed and Medline but these are usually encyclopaedic, often producing long lists of papers, most of which are either irrelevant or only vaguely relevant to your query. Relatively few will be aware of more advanced search engines such as Google Scholar and even fewer will know about Quertle. Next, we provide a strategy for discovering if your "novel" protein is expressed in the normal, healthy human heart, and if it is, we show you how to investigate its subcellular location. This can usually be achieved by visiting the website "Human Protein Atlas" without doing a single experiment. Finally, we provide a pathway to discovering if your protein of interest changes its expression level with heart failure/disease or with ageing.

  8. Searching the databases: a quick look at Amazon and two other online catalogues.

    PubMed

    Potts, Hilary

    2003-01-01

    The Amazon Online Catalogue was compared with the Library of Congress Catalogue and the British Library Catalogue, both also available online, by searching on both neutral (Gay, Lesbian, Homosexual) and pejorative (Perversion, Sex Crime) subject terms, and also by searches using Boolean logic in an attempt to identify Lesbian Fiction items and religion-based anti-gay material. Amazon was much more likely to be the first port of call for non-academic enquiries. Although excluding much material necessary for academic research, it carried more information about the individual books and less historical homophobic baggage in its terminology than the great national catalogues. Its back catalogue of second-hand books outnumbered those in print. Current attitudes may partially be gauged by the relative numbers of titles published under each heading--e.g., there may be an inverse relationship between concern about child sex abuse and homophobia, more noticeable in U.S. because of the activities of the religious right.

  9. A more straightforward derivation of the LR for a database search.

    PubMed

    Berger, Charles E H; Vergeer, Peter; Buckleton, John S

    2015-01-01

    Matching DNA profiles of an accused person and a crime scene trace are one of the most common forms of forensic evidence. A number of years ago the so-called 'DNA controversy' was concerned with how to quantify the value of such evidence. Given its importance, the lack of understanding of such a basic issue was quite surprising and concerning. Deriving the equation for the likelihood ratio of a DNA database match in a much more direct and simple way is the topic of this paper. As it is much easier to follow it is hoped that this derivation will contribute to the understanding. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

  10. Improving the Mapping of Smith-Waterman Sequence Database Searches onto CUDA-Enabled GPUs.

    PubMed

    Huang, Liang-Tsung; Wu, Chao-Chin; Lai, Lien-Fu; Li, Yun-Ju

    2015-01-01

    Sequence alignment lies at the heart of bioinformatics. The Smith-Waterman algorithm is one of the key sequence search algorithms and has gained popularity due to improved implementations and rapidly increasing compute power. Recently, the Smith-Waterman algorithm has been successfully mapped onto the emerging general-purpose graphics processing units (GPUs). In this paper, we focused on how to improve the mapping, especially for short query sequences, by better usage of shared memory. We performed and evaluated the proposed method on two different platforms (Tesla C1060 and Tesla K20) and compared it with two classic methods in CUDASW++. Further, the performance on different numbers of threads and blocks has been analyzed. The results showed that the proposed method significantly improves the Smith-Waterman algorithm on CUDA-enabled GPUs given proper allocation of block and thread numbers.
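
    For orientation, the recurrence that such GPU mappings parallelize can be sketched on the CPU. This is a minimal illustration, not the method of the paper: the function name, the scoring values (match, mismatch, gap) and the linear gap penalty are all assumptions chosen for brevity, and nothing here reflects CUDASW++ or the shared-memory scheme described above.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of strings a and b.

    Assumed scoring: +match / mismatch per aligned pair and a linear
    gap penalty. Only two rows of the score matrix are kept in memory.
    """
    prev = [0] * (len(b) + 1)  # row i-1 of the score matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)  # row i; column 0 stays 0
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                   # start a new local alignment here
                prev[j - 1] + pair,  # align a[i-1] with b[j-1]
                prev[j] + gap,       # gap in b
                curr[j - 1] + gap,   # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("ACGT", "AGT"))
```

    Each cell depends only on its upper, left and upper-left neighbours, so cells on the same anti-diagonal are independent of one another; that independence is what parallel implementations typically exploit.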

  11. Improving the Mapping of Smith-Waterman Sequence Database Searches onto CUDA-Enabled GPUs

    PubMed Central

    Huang, Liang-Tsung; Wu, Chao-Chin; Lai, Lien-Fu; Li, Yun-Ju

    2015-01-01

    Sequence alignment lies at the heart of bioinformatics. The Smith-Waterman algorithm is one of the key sequence search algorithms and has gained popularity due to improved implementations and rapidly increasing compute power. Recently, the Smith-Waterman algorithm has been successfully mapped onto the emerging general-purpose graphics processing units (GPUs). In this paper, we focused on how to improve the mapping, especially for short query sequences, by better usage of shared memory. We performed and evaluated the proposed method on two different platforms (Tesla C1060 and Tesla K20) and compared it with two classic methods in CUDASW++. Further, the performance on different numbers of threads and blocks has been analyzed. The results showed that the proposed method significantly improves the Smith-Waterman algorithm on CUDA-enabled GPUs given proper allocation of block and thread numbers. PMID:26339591

  12. ANDY: A general, fault-tolerant tool for database searching on computer clusters

    SciTech Connect

    Smith, Andrew; Chandonia, John-Marc; Brenner, Steven E.

    2005-12-21

    Summary: ANDY (seArch coordination aND analYsis) is a set of Perl programs and modules for distributing large biological database searches, and in general any sequence of commands, across the nodes of a Linux computer cluster. ANDY is compatible with several commonly used Distributed Resource Management (DRM) systems, and it can be easily extended to new DRMs. A distinctive feature of ANDY is the choice of either dedicated or fair-use operation: ANDY is almost as efficient as single-purpose tools that require a dedicated cluster, but it runs on a general-purpose cluster along with any other jobs scheduled by a DRM. Other features include communication through named pipes for performance, flexible customizable routines for error-checking and summarizing results, and multiple fault-tolerance mechanisms. Availability: ANDY is freely available and may be obtained from http://compbio.berkeley.edu/proj/andy; this site also contains supplemental data and figures and a more detailed overview of the software.

  13. An ultra-tolerant database search reveals that a myriad of modified peptides contributes to unassigned spectra in shotgun proteomics

    PubMed Central

    Chick, Joel M.; Kolippakkam, Deepak; Nusinow, David P.; Zhai, Bo; Rad, Ramin; Huttlin, Edward L.; Gygi, Steven P.

    2015-01-01

    Fewer than half of all tandem mass spectrometry (MS/MS) spectra acquired in shotgun proteomics experiments are typically matched to a peptide with high confidence. Here we determine the identity of unassigned peptides using an ultra-tolerant Sequest database search that allows peptide matching even with modifications of unknown masses up to ±500 Da. In a proteome-wide dataset on HEK293 cells (9,513 proteins and 396,736 peptides), this approach matched an additional 184,000 modified peptides, which were linked to biological and chemical modifications representing 523 distinct mass bins, including phosphorylation, glycosylation, and methylation. We localized all unknown modification masses to specific regions within a peptide. Known modifications were assigned to the correct amino acids with frequencies often >90%. We conclude that at least one third of unassigned spectra arise from peptides with substoichiometric modifications. PMID:26076430

  14. Pivotal Role of Computers and Software in Mass Spectrometry - SEQUEST and 20 Years of Tandem MS Database Searching

    NASA Astrophysics Data System (ADS)

    Yates, John R.

    2015-11-01

    Advances in computer technology and software have driven developments in mass spectrometry over the last 50 years. Computers and software have been impactful in three areas: the automation of difficult calculations to aid interpretation, the collection of data and control of instruments, and data interpretation. As the power of computers has grown, so too has the utility and impact on mass spectrometers and their capabilities. This has been particularly evident in the use of tandem mass spectrometry data to search protein and nucleotide sequence databases to identify peptide and protein sequences. This capability has driven the development of many new approaches to study biological systems, including the use of "bottom-up shotgun proteomics" to directly analyze protein mixtures.

  15. Pivotal role of computers and software in mass spectrometry - SEQUEST and 20 years of tandem MS database searching.

    PubMed

    Yates, John R

    2015-11-01

    Advances in computer technology and software have driven developments in mass spectrometry over the last 50 years. Computers and software have been impactful in three areas: the automation of difficult calculations to aid interpretation, the collection of data and control of instruments, and data interpretation. As the power of computers has grown, so too has the utility and impact on mass spectrometers and their capabilities. This has been particularly evident in the use of tandem mass spectrometry data to search protein and nucleotide sequence databases to identify peptide and protein sequences. This capability has driven the development of many new approaches to study biological systems, including the use of "bottom-up shotgun proteomics" to directly analyze protein mixtures.

  16. The wildcat toolbox: a set of perl script utilities for use in peptide mass spectral database searching and proteomics experiments.

    PubMed

    Haynes, Paul A; Miller, Susan; Radabaugh, Tim; Galligan, Michael; Breci, Linda; Rohrbough, James; Hickman, Fatimah; Merchant, Nirav

    2006-04-01

    We describe in this communication a set of functional perl script utilities for use in peptide mass spectral database searching and proteomics experiments, known as the Wildcat Toolbox. These are all freely available for download from our laboratory Web site (http://proteomics.arizona.edu/toolbox.html) as a combined zip file, and can also be accessed via the Proteome Commons Web site (www.proteomecommons.org) in the tools section. We make them available to other potential users in the spirit of open source software development; we do not have the resources to provide any significant technical support for them, but we hope users will share both bugs and improvements with the community at large.

  17. The Wildcat Toolbox: A Set of Perl Script Utilities for Use in Peptide Mass Spectral Database Searching and Proteomics Experiments

    PubMed Central

    Haynes, Paul A.; Miller, Susan; Radabaugh, Tim; Galligan, Michael; Breci, Linda; Rohrbough, James; Hickman, Fatimah; Merchant, Nirav

    2006-01-01

    We describe in this communication a set of functional perl script utilities for use in peptide mass spectral database searching and proteomics experiments, known as the Wildcat Toolbox. These are all freely available for download from our laboratory Web site (http://proteomics.arizona.edu/toolbox.html) as a combined zip file, and can also be accessed via the Proteome Commons Web site (www.proteomecommons.org) in the tools section. We make them available to other potential users in the spirit of open source software development; we do not have the resources to provide any significant technical support for them, but we hope users will share both bugs and improvements with the community at large. PMID:16741236

  18. FTP-Server for exchange, interpretation, and database-search of ion mobility spectra, literature, preprints and software

    NASA Technical Reports Server (NTRS)

    Baumbach, J. I.; Vonirmer, A.

    1995-01-01

    To assist current discussion in the field of ion mobility spectrometry, the Institut fur Spectrochemie und angewandte Spektroskopie, Dortmund, began operating an FTP server on 4 December 1994, available to all research groups at universities and institutes and to research workers in industry. We support the exchange, interpretation, and database search of ion mobility spectra through the data format JCAMP-DS (Joint Committee on Atomic and Molecular Physical Data), as well as literature retrieval, preprints, notices, and a discussion board. We outline the access conditions, local addresses, and main code words. For further details, a monthly news report will be prepared for all users. The Internet email address for subscribing is included in the document.

  19. Does PSADT after Radical Prostatectomy Correlate with Overall Survival? — A Report from the SEARCH Database Group

    PubMed Central

    Teeter, Anna E.; Presti, Joseph C.; Aronson, William J.; Terris, Martha K.; Kane, Christopher J.; Amling, Christopher L.; Freedland, Stephen J.

    2010-01-01

    Objective: Prior studies, largely performed at tertiary care centers with relatively young, racially homogeneous cohorts, found that a short PSA doubling time (PSADT) following recurrence after radical prostatectomy (RP) portends a poor prognosis. We examined the correlation between PSADT and overall survival (OS) among men in the SEARCH database, an older, racially diverse cohort treated with RP at multiple Veterans Affairs medical centers. Methods: We performed a Cox proportional hazards analysis to examine the correlation between post-recurrence PSADT and time from recurrence to OS and prostate cancer-specific mortality (PCSM) among 345 men in the SEARCH database who underwent RP between 1988 and 2008. We examined PSADT as a categorical variable based on the clinically significant cut-points of <3, 3–8.9, 9–14.9, and ≥15 months. Results: PSADT <3 months (HR 5.48, p=0.002) was associated with poorer OS versus PSADT ≥15 months. There was a trend towards worse OS among men with a PSADT of 3–8.9 months (HR=1.70, p=0.07). PSADT <3 months (p<0.001) and 3–8.9 months (p=0.004) were associated with increased risk of PCSM. Conclusions: In an older, racially diverse cohort, recurrence with a PSADT <9 months was associated with worse all-cause mortality. This study validates prior findings that PSADT is a useful tool for identifying men who are at increased risk of all-cause mortality early in the course of their disease. PMID:21145094
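
    The abstract does not say how PSADT was computed. A commonly used definition, assumed here, is ln(2) divided by the slope of a least-squares line fitted to ln(PSA) against time. The sketch below applies that definition and then assigns the cut-point categories quoted above; the function names and the sample values are hypothetical, and the handling of a non-rising PSA (returned as infinity and placed in the slowest category) is an illustrative choice, not something the study states.

```python
import math


def psa_doubling_time(months, psa):
    """PSADT in months: ln(2) / slope of ln(PSA) regressed on time.

    Assumes the log-linear definition; returns math.inf when the
    fitted slope is not positive (PSA is not rising).
    """
    n = len(months)
    if n < 2 or n != len(psa):
        raise ValueError("need at least two paired measurements")
    logs = [math.log(p) for p in psa]
    mean_t = sum(months) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in months)
    if sxx == 0:
        raise ValueError("measurement times must differ")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(months, logs))
    slope = sxy / sxx
    return math.log(2) / slope if slope > 0 else math.inf


def psadt_category(psadt):
    """Map a PSADT in months to the cut-points used in the abstract."""
    if psadt < 3:
        return "<3"
    if psadt < 9:
        return "3-8.9"
    if psadt < 15:
        return "9-14.9"
    return ">=15"


# Hypothetical patient whose PSA doubles every six months.
dt = psa_doubling_time([0, 6, 12], [1.0, 2.0, 4.0])
print(round(dt, 2), psadt_category(dt))
```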

  20. The Zebrafish Model Organism Database: new support for human disease models, mutation details, gene expression phenotypes and searching

    PubMed Central

    Howe, Douglas G.; Bradford, Yvonne M.; Eagle, Anne; Fashena, David; Frazer, Ken; Kalita, Patrick; Mani, Prita; Martin, Ryan; Moxon, Sierra Taylor; Paddock, Holly; Pich, Christian; Ramachandran, Sridhar; Ruzicka, Leyla; Schaper, Kevin; Shao, Xiang; Singer, Amy; Toro, Sabrina; Van Slyke, Ceri; Westerfield, Monte

    2017-01-01

    The Zebrafish Model Organism Database (ZFIN; http://zfin.org) is the central resource for zebrafish (Danio rerio) genetic, genomic, phenotypic and developmental data. ZFIN curators provide expert manual curation and integration of comprehensive data involving zebrafish genes, mutants, transgenic constructs and lines, phenotypes, genotypes, gene expressions, morpholinos, TALENs, CRISPRs, antibodies, anatomical structures, models of human disease and publications. We integrate curated, directly submitted, and collaboratively generated data, making these available to the zebrafish research community. Among the vertebrate model organisms, zebrafish are superbly suited for rapid generation of sequence-targeted mutant lines, characterization of phenotypes including gene expression patterns, and generation of human disease models. The recent rapid adoption of zebrafish as human disease models is making management of these data particularly important to both the research and clinical communities. Here, we describe recent enhancements to ZFIN including use of the zebrafish experimental conditions ontology, ‘Fish’ records in the ZFIN database, support for gene expression phenotypes, models of human disease, mutation details at the DNA, RNA and protein levels, and updates to the ZFIN single box search. PMID:27899582

  1. The Zebrafish Model Organism Database: new support for human disease models, mutation details, gene expression phenotypes and searching.

    PubMed

    Howe, Douglas G; Bradford, Yvonne M; Eagle, Anne; Fashena, David; Frazer, Ken; Kalita, Patrick; Mani, Prita; Martin, Ryan; Moxon, Sierra Taylor; Paddock, Holly; Pich, Christian; Ramachandran, Sridhar; Ruzicka, Leyla; Schaper, Kevin; Shao, Xiang; Singer, Amy; Toro, Sabrina; Van Slyke, Ceri; Westerfield, Monte

    2017-01-04

    The Zebrafish Model Organism Database (ZFIN; http://zfin.org) is the central resource for zebrafish (Danio rerio) genetic, genomic, phenotypic and developmental data. ZFIN curators provide expert manual curation and integration of comprehensive data involving zebrafish genes, mutants, transgenic constructs and lines, phenotypes, genotypes, gene expressions, morpholinos, TALENs, CRISPRs, antibodies, anatomical structures, models of human disease and publications. We integrate curated, directly submitted, and collaboratively generated data, making these available to the zebrafish research community. Among the vertebrate model organisms, zebrafish are superbly suited for rapid generation of sequence-targeted mutant lines, characterization of phenotypes including gene expression patterns, and generation of human disease models. The recent rapid adoption of zebrafish as human disease models is making management of these data particularly important to both the research and clinical communities. Here, we describe recent enhancements to ZFIN including use of the zebrafish experimental conditions ontology, 'Fish' records in the ZFIN database, support for gene expression phenotypes, models of human disease, mutation details at the DNA, RNA and protein levels, and updates to the ZFIN single box search.

  2. Quality Control of Biomedicinal Allergen Products - Highly Complex Isoallergen Composition Challenges Standard MS Database Search and Requires Manual Data Analyses.

    PubMed

    Spiric, Jelena; Engin, Anna M; Karas, Michael; Reuter, Andreas

    2015-01-01

    Allergy against birch pollen is among the most common causes of spring pollinosis in Europe and is diagnosed and treated using extracts from natural sources. Quality control is crucial for safe and effective diagnosis and treatment. However, current methods are very difficult to standardize and do not address individual allergen or isoallergen composition. MS provides information regarding selected proteins or the entire proteome and could overcome the aforementioned limitations. We studied the proteome of birch pollen, focusing on allergens and isoallergens, to clarify which of the 93 published sequence variants of the major allergen, Bet v 1, are expressed as proteins within one source material in parallel. The unexpectedly complex Bet v 1 isoallergen composition required manual data interpretation and a specific design of databases, as current database search engines fail to unambiguously assign spectra to highly homologous, partially identical proteins. We identified 47 non-allergenic proteins and all 5 known birch pollen allergens, and unambiguously proved the existence of 18 Bet v 1 isoallergens and variants by manual data analysis. This highly complex isoallergen composition raises questions whether isoallergens can be ignored or must be included for the quality control of allergen products, and which data analysis strategies are to be applied.

  3. Utility of rapid database searching for quality assurance: 'detective work' in uncovering radiology coding and billing errors

    NASA Astrophysics Data System (ADS)

    Horii, Steven C.; Kim, Woojin; Boonn, William; Iyoob, Christopher; Maston, Keith; Coleman, Beverly G.

    2011-03-01

    When the first quarter of 2010 Department of Radiology statistics were provided to the Section Chiefs, the authors (SH, BC) were alarmed to discover that Ultrasound showed a decrease of 2.5 percent in billed examinations. This seemed to be in direct contradistinction to the experience of the ultrasound faculty members and sonographers. Their experience was that they were far busier than during the same quarter of 2009. The one exception that all acknowledged was the month of February, 2010 when several major winter storms resulted in a much decreased Hospital admission and Emergency Department visit rate. Since these statistics in part help establish priorities for capital budget items, professional and technical staffing levels, and levels of incentive salary, they are taken very seriously. The availability of a desktop, Web-based RIS database search tool developed by two of the authors (WK, WB) and built-in database functions of the ultrasound miniPACS, made it possible for us very rapidly to develop and test hypotheses for why the number of billable examinations was declining in the face of what experience told the authors was an increasing number of examinations being performed. Within a short time, we identified the major cause as errors on the part of the company retained to verify billable Current Procedural Terminology (CPT) codes against ultrasound reports. This information is being used going forward to recover unbilled examinations and take measures to reduce or eliminate the types of coding errors that resulted in the problem.

  4. Similarity landscapes: An improved method for scientific visualization of information from protein and DNA database searches

    SciTech Connect

    Dogget, N.; Myers, G.; Wills, C.J.

    1998-12-01

    This is the final report of a three-year, Laboratory Directed Research and Development (LDRD) project at the Los Alamos National Laboratory (LANL). The authors have used computer simulations and examination of a variety of databases to address a wide range of evolutionary questions. The authors have found that there is a clear distinction in the evolution of HIV-1 and HIV-2, with the former and more virulent virus evolving more rapidly at a functional level. The authors have discovered highly non-random patterns in the evolution of HIV-1 that can be attributed to a variety of selective pressures. In the course of examination of microsatellite DNA (short repeat regions) in microorganisms, the authors have found clear differences between prokaryotes and eukaryotes in their distribution, differences that can be tied to different selective pressures. They have developed a new method (topiary pruning) for enhancing the phylogenetic information contained in DNA sequences. Most recently, the authors have discovered effects in complex rainforest ecosystems that indicate strong frequency-dependent interactions between host species and their parasites, leading to the maintenance of ecosystem variability.

  5. Accelerating Smith-Waterman Alignment for Protein Database Search Using Frequency Distance Filtration Scheme Based on CPU-GPU Collaborative System.

    PubMed

    Liu, Yu; Hong, Yang; Lin, Chun-Yuan; Hung, Che-Lun

    2015-01-01

    The Smith-Waterman (SW) algorithm has been widely utilized for searching biological sequence databases in bioinformatics. Recently, several works have adopted the graphic card with Graphic Processing Units (GPUs) and their associated CUDA model to enhance the performance of SW computations. However, these works mainly focused on the protein database search by using the intertask parallelization technique, and only using the GPU capability to do the SW computations one by one. Hence, in this paper, we will propose an efficient SW alignment method, called CUDA-SWfr, for the protein database search by using the intratask parallelization technique based on a CPU-GPU collaborative system. Before doing the SW computations on GPU, a procedure is applied on CPU by using the frequency distance filtration scheme (FDFS) to eliminate the unnecessary alignments. The experimental results indicate that CUDA-SWfr runs 9.6 times and 96 times faster than the CPU-based SW method without and with FDFS, respectively.
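
    The abstract does not spell out the filtration step. The sketch below assumes one common formulation of a frequency distance: each sequence is reduced to its symbol counts, and the larger of the two one-sided count differences is a lower bound on the edit distance, so candidates whose bound already exceeds a threshold can be skipped before any alignment is computed. The function names, the threshold parameter and the toy sequences are hypothetical and are not taken from CUDA-SWfr.

```python
from collections import Counter


def frequency_distance(u, v):
    """Lower bound on the edit distance between strings u and v.

    Compares symbol counts only. One substitution can repair one
    surplus symbol on each side, hence the max rather than the sum.
    """
    cu, cv = Counter(u), Counter(v)
    surplus_u = sum((cu - cv).values())  # symbols u has in excess
    surplus_v = sum((cv - cu).values())  # symbols v has in excess
    return max(surplus_u, surplus_v)


def filter_candidates(query, database, max_edits):
    """Keep only sequences that could lie within max_edits of query."""
    return [s for s in database if frequency_distance(query, s) <= max_edits]


# Toy example: only the surviving sequences would go on to alignment.
survivors = filter_candidates("ACDE", ["ACDF", "WWWW", "ACDE"], max_edits=1)
print(survivors)
```

    Because the bound never exceeds the true edit distance, a filter of this kind discards no sequence that is actually within the threshold; it only removes work.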

  6. On the hypothesis that quantum mechanism manifests classical mechanics: Numerical approach to the correspondence in search of quantum chaos

    SciTech Connect

    Lee, Sang-Bong

    1993-09-01

    Quantum manifestation of classical chaos has been one of the extensively studied subjects for more than a decade. Yet clear understanding of its nature still remains to be an open question partly due to the lack of a canonical definition of quantum chaos. The classical definition seems to be unsuitable in quantum mechanics partly because of the Heisenberg quantum uncertainty. In this regard, quantum chaos is somewhat misleading and needs to be clarified at the very fundamental level of physics. Since it is well known that quantum mechanics is more fundamental than classical mechanics, the quantum description of classically chaotic nature should be attainable in the limit of large quantum numbers. The focus of my research, therefore, lies on the correspondence principle for classically chaotic systems. The chaotic damped driven pendulum is mainly studied numerically using the split operator method that solves the time-dependent Schroedinger equation. For classically dissipative chaotic systems in which (multi)fractal strange attractors often emerge, several quantum dissipative mechanisms are also considered. For instance, Hoover's and Kubo-Fox-Keizer's approaches are studied with some computational analyses. But the notion of complex energy with non-Hermiticity is extensively applied. Moreover, the Wigner and Husimi distribution functions are examined with an equivalent classical distribution in phase-space, and dynamical properties of the wave packet in configuration and momentum spaces are also explored. The results indicate that quantum dynamics embraces classical dynamics although the classical-quantum correspondence fails to be observed in the classically chaotic regime. Even in the semi-classical limits, classically chaotic phenomena would eventually be suppressed by the quantum uncertainty.

  7. A Cautionary Tale on the Inclusion of Variable Posttranslational Modifications in Database-Dependent Searches of Mass Spectrometry Data.

    PubMed

    Svozil, J; Baerenfaller, K

    2017-01-01

    Mass spectrometry-based proteomics allows in principle the identification of unknown target proteins of posttranslational modifications and the sites of attachment. Including a variety of posttranslational modifications in database-dependent searches of high-throughput mass spectrometry data holds the promise to gain spectrum assignments to modified peptides, thereby increasing the number of assigned spectra, and to identify potentially interesting modification events. However, these potential benefits come at the price of an increased search space, which can lead to reduced scores, increased score thresholds, and erroneous peptide spectrum matches. We have assessed here the advantages and disadvantages of including the variable posttranslational modifications methionine oxidation, protein N-terminal acetylation, cysteine carbamidomethylation, transformation of N-terminal glutamine to pyroglutamic acid (Gln→pyro-Glu), and deamidation of asparagine and glutamine. Based on calculations of local false discovery rates and comparisons to known features of the respective modifications, we recommend for searches of samples that were not enriched for specific posttranslational modifications to only include methionine oxidation, protein N-terminal acetylation, and peptide N-terminal Gln→pyro-Glu as variable modifications. The principle of the validation strategy adopted here can also be applied for assessing the inclusion of posttranslational modifications for differently prepared samples, or for additional modifications. In addition, we have reassessed the special properties of the ubiquitin footprint, which is the remainder of ubiquitin moieties attached to lysines after tryptic digest. We show here that the ubiquitin footprint often breaks off as neutral loss and that it can be distinguished from dicarbamidomethylation events. © 2017 Elsevier Inc. All rights reserved.

  8. Searching for an optimal control in the presence of saddles on the quantum-mechanical observable landscape

    NASA Astrophysics Data System (ADS)

    Riviello, Gregory; Wu, Re-Bing; Sun, Qiuyang; Rabitz, Herschel

    2017-06-01

    The broad success of theoretical and experimental quantum optimal control is intimately connected to the topology of the underlying control landscape. For several common quantum control goals, including the maximization of an observable expectation value, the landscape has been shown to lack local optima if three assumptions are satisfied: (i) the quantum system is controllable, (ii) the Jacobian of the map from the control field to the evolution operator is full rank, and (iii) the control field is not constrained. In the case of the observable objective, this favorable analysis shows that the associated landscape also contains saddles, i.e., critical points that are not local suboptimal extrema. In this paper, we investigate whether the presence of these saddles affects the trajectories of gradient-based searches for an optimal control. We show through simulations that both the detailed topology of the control landscape and the parameters of the system Hamiltonian influence whether the searches are attracted to a saddle. For some circumstances with a special initial state and target observable, optimizations may approach a saddle very closely, reducing the efficiency of the gradient algorithm. Encounters with such attractive saddles are found to be quite rare. Neither the presence of a large number of saddles on the control landscape nor a large number of system states increases the likelihood that a search will closely approach a saddle. Even for applications that encounter a saddle, well-designed gradient searches with carefully chosen algorithmic parameters will readily locate optimal controls.

  9. Uploading, Searching and Visualizing of Paleomagnetic and Rock Magnetic Data in the Online MagIC Database

    NASA Astrophysics Data System (ADS)

    Minnett, R.; Koppers, A.; Tauxe, L.; Constable, C.; Donadini, F.

    2007-12-01

    The Magnetics Information Consortium (MagIC) is commissioned to implement and maintain an online portal to a relational database populated by both rock and paleomagnetic data. The goal of MagIC is to archive all available measurements and derived properties from paleomagnetic studies of directions and intensities, and for rock magnetic experiments (hysteresis, remanence, susceptibility, anisotropy). MagIC is hosted under EarthRef.org at http://earthref.org/MAGIC/ and will soon implement two search nodes, one for paleomagnetism and one for rock magnetism. Currently the PMAG node is operational. Both nodes provide query building based on location, reference, methods applied, material type and geological age, as well as a visual map interface to browse and select locations. Users can also browse the database by data type or by data compilation to view all contributions associated with well known earlier collections like PINT, GMPDB or PSVRL. The query result set is displayed in a digestible tabular format allowing the user to descend from locations to sites, samples, specimens and measurements. At each stage, the result set can be saved and, where appropriate, can be visualized by plotting global location maps, equal area, XY, age, and depth plots, or typical Zijderveld, hysteresis, magnetization and remanence diagrams. User contributions to the MagIC database are critical to achieving a useful research tool. We have developed a standard data and metadata template (version 2.3) that can be used to format and upload all data at the time of publication in Earth Science journals. Software tools are provided to facilitate population of these templates within Microsoft Excel. These tools allow for the import/export of text files and provide advanced functionality to manage and edit the data, and to perform various internal checks to maintain data integrity and prepare for uploading. 
    The MagIC Contribution Wizard at http://earthref.org/MAGIC/upload.htm executes the upload

  10. Asthma research performance in Asia-Pacific: a bibliometric analysis by searching PubMed database.

    PubMed

    Klaewsongkram, Jettanong; Reantragoon, Rangsima

    2009-12-01

    Countries in the Asia-Pacific region have experienced an increase in the prevalence of asthma, and they have been actively involved in asthma research recently. This study aimed to analyze asthma research from Asia-Pacific in the last decade by bibliometric method. Asthma articles from Asia-Pacific countries published between 1998 and 2007 were retrieved from PubMed by searching MeSH for "asthma." Most of the published asthma articles in Asia-Pacific are from affluent countries in northeast Asia and Oceania. Australia and Japan have been the regional powerhouses since they contributed more than half of regional articles on asthma. Asthma publications from emerging economies in Asia such as South Korea, Taiwan, Hong Kong, and Singapore have dramatically increased in the last decade in terms of quantity and quality and were considerable sources of basic and translational research in the region. Mainland China and India have significantly increased their research capacity as well, but quality needs to be improved. Asthma publications from New Zealand and Australia, countries with the highest asthma prevalence rates in the world, yielded the highest citation counts per article and were published in journals with high impact factors. Asthma research parameters per million population correlate well with gross domestic product per capita. Almost half (41%) of total articles were produced from only 25 institutions in the region and almost half of them (47%) were published in 20 journals. Asthma research in Asia-Pacific was mainly conducted in countries in Oceania and Northeast Asia, and research performance strongly correlated with the nation's wealth. Interesting asthma research projects in the region were recommended.

  11. Reprint of "pFind-Alioth: A novel unrestricted database search algorithm to improve the interpretation of high-resolution MS/MS data".

    PubMed

    Chi, Hao; He, Kun; Yang, Bing; Chen, Zhen; Sun, Rui-Xiang; Fan, Sheng-Bo; Zhang, Kun; Liu, Chao; Yuan, Zuo-Fei; Wang, Quan-Hui; Liu, Si-Qi; Dong, Meng-Qiu; He, Si-Min

    2015-11-03

    Database search is the dominant approach in high-throughput proteomic analysis. However, the interpretation rate of MS/MS spectra is very low in such a restricted mode, which is mainly due to unexpected modifications and irregular digestion types. In this study, we developed a new algorithm called Alioth, to be integrated into the search engine of pFind, for fast and accurate unrestricted database search on high-resolution MS/MS data. An ion index is constructed for both peptide precursors and fragment ions, by which arbitrary digestions and a single site of any modifications and mutations can be searched efficiently. A new re-ranking algorithm is used to distinguish the correct peptide-spectrum matches from random ones. The algorithm is tested on several HCD datasets and the interpretation rate of MS/MS spectra using Alioth is as high as 60%-80%. Peptides from semi- and non-specific digestions, as well as those with unexpected modifications or mutations, can be effectively identified using Alioth and confidently validated using other search engines. The average processing speed of Alioth is 5-10 times faster than some other unrestricted search engines and is comparable to or even faster than the restricted search algorithms tested. This article is part of a Special Issue entitled: Computational Proteomics. Copyright © 2015 Elsevier B.V. All rights reserved.

  12. pFind-Alioth: A novel unrestricted database search algorithm to improve the interpretation of high-resolution MS/MS data.

    PubMed

    Chi, Hao; He, Kun; Yang, Bing; Chen, Zhen; Sun, Rui-Xiang; Fan, Sheng-Bo; Zhang, Kun; Liu, Chao; Yuan, Zuo-Fei; Wang, Quan-Hui; Liu, Si-Qi; Dong, Meng-Qiu; He, Si-Min

    2015-07-01

    Database search is the dominant approach in high-throughput proteomic analysis. However, the interpretation rate of MS/MS spectra is very low in such a restricted mode, which is mainly due to unexpected modifications and irregular digestion types. In this study, we developed a new algorithm called Alioth, to be integrated into the search engine of pFind, for fast and accurate unrestricted database search on high-resolution MS/MS data. An ion index is constructed for both peptide precursors and fragment ions, by which arbitrary digestions and a single site of any modifications and mutations can be searched efficiently. A new re-ranking algorithm is used to distinguish the correct peptide-spectrum matches from random ones. The algorithm is tested on several HCD datasets and the interpretation rate of MS/MS spectra using Alioth is as high as 60%-80%. Peptides from semi- and non-specific digestions, as well as those with unexpected modifications or mutations, can be effectively identified using Alioth and confidently validated using other search engines. The average processing speed of Alioth is 5-10 times faster than some other unrestricted search engines and is comparable to or even faster than the restricted search algorithms tested. Copyright © 2015 Elsevier B.V. All rights reserved.

  13. Automatic sorting of toxicological information into the IUCLID (International Uniform Chemical Information Database) endpoint-categories making use of the semantic search engine Go3R.

    PubMed

    Sauer, Ursula G; Wächter, Thomas; Hareng, Lars; Wareing, Britta; Langsch, Angelika; Zschunke, Matthias; Alvers, Michael R; Landsiedel, Robert

    2014-06-01

    The knowledge-based search engine Go3R, www.Go3R.org, has been developed to assist scientists from industry and regulatory authorities in collecting comprehensive toxicological information with a special focus on identifying available alternatives to animal testing. The semantic search paradigm of Go3R makes use of expert knowledge on 3Rs methods and regulatory toxicology, laid down in the ontology, a network of concepts, terms, and synonyms, to recognize the contents of documents. Search results are automatically sorted into a dynamic table of contents presented alongside the list of documents retrieved. This table of contents allows the user to quickly filter the set of documents by topics of interest. Documents containing hazard information are automatically assigned to a user interface following the endpoint-specific IUCLID5 categorization scheme required, e.g. for REACH registration dossiers. For this purpose, complex endpoint-specific search queries were compiled and integrated into the search engine (based upon a gold standard of 310 references that had been assigned manually to the different endpoint categories). Go3R sorts 87% of the references concordantly into the respective IUCLID5 categories. Currently, Go3R searches in the 22 million documents available in the PubMed and TOXNET databases. However, it can be customized to search in other databases including in-house databanks. Copyright © 2013 Elsevier Ltd. All rights reserved.

  14. Exploring Site-Specific N-Glycosylation Microheterogeneity of Haptoglobin using Glycopeptide CID Tandem Mass Spectra and Glycan Database Search

    PubMed Central

    Chandler, Kevin Brown; Pompach, Petr; Goldman, Radoslav

    2013-01-01

    Glycosylation is a common protein modification with a significant role in many vital cellular processes and human diseases, making the characterization of protein-attached glycan structures important for understanding cell biology and disease processes. Direct analysis of protein N-glycosylation by tandem mass spectrometry of glycopeptides promises site-specific elucidation of N-glycan microheterogeneity, something which detached N-glycan and de-glycosylated peptide analyses cannot provide. However, successful implementation of direct N-glycopeptide analysis by tandem mass spectrometry remains a challenge. In this work, we consider algorithmic techniques for the analysis of LC-MS/MS data acquired from glycopeptide-enriched fractions of enzymatic digests of purified proteins. We implement a computational strategy which takes advantage of the properties of CID fragmentation spectra of N-glycopeptides, matching the MS/MS spectra to peptide-glycan pairs from protein sequences and glycan structure databases. Significantly, we also propose a novel false-discovery-rate estimation technique to estimate and manage the number of false identifications. We use a human glycoprotein standard, haptoglobin, digested with trypsin and GluC, enriched for glycopeptides using HILIC chromatography, and analyzed by LC-MS/MS to demonstrate our algorithmic strategy and evaluate its performance. Our software, GlycoPeptideSearch (GPS), assigned glycopeptide identifications to 246 of the spectra at false-discovery-rate 5.58%, identifying 42 distinct haptoglobin peptide-glycan pairs at each of the four haptoglobin N-linked glycosylation sites. We further demonstrate the effectiveness of this approach by analyzing plasma-derived haptoglobin, identifying 136 N-linked glycopeptide spectra at false-discovery-rate 0.4%, representing 15 distinct glycopeptides on at least three of the four N-linked glycosylation sites. The software, GlycoPeptideSearch, is available for download from http

  15. Novel lead generation through hypothetical pharmacophore three-dimensional database searching: discovery of isoflavonoids as nonsteroidal inhibitors of rat 5 alpha-reductase.

    PubMed

    Chen, G S; Chang, C S; Kan, W M; Chang, C L; Wang, K C; Chern, J W

    2001-11-08

    A hypothetical pharmacophore of 5 alpha-reductase inhibitors was generated and served as a template in virtual screening. When the pharmacophore was used, eight isoflavone derivatives were characterized as novel potential nonsteroidal inhibitors of rat 5 alpha-reductase. This investigation has demonstrated a practical approach toward the development of lead compounds through a hypothetical pharmacophore via three-dimensional database searching.

  16. PhenoMeter: A Metabolome Database Search Tool Using Statistical Similarity Matching of Metabolic Phenotypes for High-Confidence Detection of Functional Links.

    PubMed

    Carroll, Adam J; Zhang, Peng; Whitehead, Lynne; Kaines, Sarah; Tcherkez, Guillaume; Badger, Murray R

    2015-01-01

    This article describes PhenoMeter (PM), a new type of metabolomics database search that accepts metabolite response patterns as queries and searches the MetaPhen database of reference patterns for responses that are statistically significantly similar or inverse for the purposes of detecting functional links. To identify a similarity measure that would detect functional links as reliably as possible, we compared the performance of four statistics in correctly top-matching metabolic phenotypes of Arabidopsis thaliana metabolism mutants affected in different steps of the photorespiration metabolic pathway to reference phenotypes of mutants affected in the same enzymes by independent mutations. The best performing statistic, the PM score, was a function of both Pearson correlation and Fisher's Exact Test of directional overlap. This statistic outperformed Pearson correlation, biweight midcorrelation and Fisher's Exact Test used alone. To demonstrate general applicability, we show that the PM reliably retrieved the most closely functionally linked response in the database when queried with responses to a wide variety of environmental and genetic perturbations. Attempts to match metabolic phenotypes between independent studies were met with varying success and possible reasons for this are discussed. Overall, our results suggest that integration of pattern-based search tools into metabolomics databases will aid functional annotation of newly recorded metabolic phenotypes analogously to the way sequence similarity search algorithms have aided the functional annotation of genes and proteins. PM is freely available at MetabolomeExpress (https://www.metabolome-express.org/phenometer.php).

  17. Conceptual changes arising from the use of a search interface developed for an elementary science curriculum database

    NASA Astrophysics Data System (ADS)

    Dwyer, William Michael

    1998-12-01

    The purpose of this study was to look for evidence of change in preservice elementary teachers' notions of science teaching after practice using a search interface for a database of elementary science curriculum materials. The Science Helper K-8 CD-ROM uses search criteria that include science content and process theme to provide appropriate science lessons for elementary educators. Training that took place when Science Helper was first disseminated revealed the possibility that notions about teaching science change with use of the resource. This study looked for evidence of conceptual change compatible with notions in recent reform materials, such as the National Science Education Standards. The study design consisted of a pretest-treatment-posttest model. The treatment included a brief training session in the use of Science Helper, followed by practical application, which consisted of finding appropriate lessons to form a science mini-unit. An analysis of covariance (ANCOVA), however, did not find significant differences between pretest and posttest scores for the treatment group. Study participants also wrote brief narratives about their experiences using Science Helper. A pattern analysis of the narratives found that most of the preservice teachers had positive experiences, saying the resource was easy to use and contained many interesting science activities. A closer examination of the comments revealed a subset of participants who expressed an understanding of the importance of criteria searches and the relatedness of the lessons produced. An ANCOVA of the treatment group controlling for pretest did not find significant differences between pretest and posttest scores for the group who expressed such understanding. Science Helper, with its affordances as a teacher resource, can be regarded as a "knowledge system" in a distributed environment. The interactions among people and material resources in a distributed environment results in a distributed

  18. Evidential significance of automotive paint trace evidence using a pattern recognition based infrared library search engine for the Paint Data Query Forensic Database.

    PubMed

    Lavine, Barry K; White, Collin G; Allen, Matthew D; Fasasi, Ayuba; Weakley, Andrew

    2016-10-01

    A prototype library search engine has been further developed to search the infrared spectral libraries of the paint data query database to identify the line and model of a vehicle from the clear coat, surfacer-primer, and e-coat layers of an intact paint chip. For this study, search prefilters were developed from 1181 automotive paint systems spanning 3 manufacturers: General Motors, Chrysler, and Ford. The best match between each unknown and the spectra in the hit list generated by the search prefilters was identified using a cross-correlation library search algorithm that performed both a forward and backward search. In the forward search, spectra were divided into intervals and further subdivided into windows (which corresponds to the time lag for the comparison) within those intervals. The top five hits identified in each search window were compiled; a histogram was computed that summarized the frequency of occurrence for each library sample, with the IR spectra most similar to the unknown flagged. The backward search computed the frequency and occurrence of each line and model without regard to the identity of the individual spectra. Only those lines and models with a frequency of occurrence greater than or equal to 20% were included in the final hit list. If there was agreement between the forward and backward search results, the specific line and model common to both hit lists was always the correct assignment. Samples assigned to the same line and model by both searches are always well represented in the library and correlate well on an individual basis to specific library samples. For these samples, one can have confidence in the accuracy of the match. This was not the case for the results obtained using commercial library search algorithms, as the hit quality index scores for the top twenty hits were always greater than 99%. Copyright © 2016 Elsevier B.V. All rights reserved.
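
    The windowed voting step of the forward search can be sketched in Python as follows. This is a simplified illustration, not the authors' algorithm: it uses non-overlapping windows and a zero-lag Pearson correlation in place of the full cross-correlation, and the function names, the `window` parameter, and the spectra layout (equal-length lists of absorbances keyed by sample name) are assumptions.

```python
from collections import Counter
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def forward_search(unknown, library, window, top_n=5):
    """Count how often each library spectrum ranks in the top_n across windows."""
    votes = Counter()
    for start in range(0, len(unknown) - window + 1, window):
        segment = slice(start, start + window)
        # Rank every library spectrum by similarity to the unknown in this window.
        ranked = sorted(
            library,
            key=lambda name: pearson(unknown[segment], library[name][segment]),
            reverse=True,
        )
        votes.update(ranked[:top_n])
    return votes
```

    The returned counter plays the role of the histogram mentioned in the abstract: `votes.most_common()` lists the library samples that most often matched the unknown.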

  19. RFRCDB-siRNA: improved design of siRNAs by random forest regression model coupled with database searching.

    PubMed

    Jiang, Peng; Wu, Haonan; Da, Yao; Sang, Fei; Wei, Jiawei; Sun, Xiao; Lu, Zuhong

    2007-09-01

    Although the observations concerning the factors which influence the siRNA efficacy give clues to the mechanism of RNAi, the quantitative prediction of the siRNA efficacy is still a challenging task. In this paper, we introduced a novel non-linear regression method: random forest regression (RFR), to quantitatively estimate siRNA efficacy values. Compared with an alternative machine learning regression algorithm, support vector machine regression (SVR) and four other score-based algorithms [A. Reynolds, D. Leake, Q. Boese, S. Scaringe, W.S. Marshall, A. Khvorova, Rational siRNA design for RNA interference, Nat. Biotechnol. 22 (2004) 326-330; K. Ui-Tei, Y. Naito, F. Takahashi, T. Haraguchi, H. Ohki-Hamazaki, A. Juni, R. Ueda, K. Saigo, Guidelines for the selection of highly effective siRNA sequences for mammalian and chick RNA interference, Nucleic Acids Res. 32 (2004) 936-948; A.C. Hsieh, R. Bo, J. Manola, F. Vazquez, O. Bare, A. Khvorova, S. Scaringe, W.R. Sellers, A library of siRNA duplexes targeting the phosphoinositide 3-kinase pathway: determinants of gene silencing for use in cell-based screens, Nucleic Acids Res. 32 (2004) 893-901; M. Amarzguioui, H. Prydz, An algorithm for selection of functional siRNA sequences, Biochem. Biophys. Res. Commun. 316 (2004) 1050-1058], our RFR model achieved the best performance of all. A web-server, RFRCDB-siRNA (http://www.bioinf.seu.edu.cn/siRNA/index.htm), has been developed. RFRCDB-siRNA consists of two modules: a siRNA-centric database and a RFR prediction system. RFRCDB-siRNA works as follows: (1) Instead of directly predicting the gene silencing activity of siRNAs, the service takes these siRNAs as queries to search against the siRNA-centric database. The matched sequences exceeding the user-defined functionality value threshold are kept. (2) The mismatched sequences are then passed to the RFR prediction system for further analysis.

  20. Pattern Recognition-Assisted Infrared Library Searching of the Paint Data Query Database to Enhance Lead Information from Automotive Paint Trace Evidence.

    PubMed

    Lavine, Barry K; White, Collin G; Allen, Matthew D; Weakley, Andrew

    2017-03-01

    Multilayered automotive paint fragments, which are one of the most complex materials encountered in the forensic science laboratory, provide crucial links in criminal investigations and prosecutions. To determine the origin of these paint fragments, forensic automotive paint examiners have turned to the paint data query (PDQ) database, which allows the forensic examiner to compare the layer sequence and color, texture, and composition of the sample to paint systems of the original equipment manufacturer (OEM). However, modern automotive paints have a thin color coat and this layer on a microscopic fragment is often too thin to obtain accurate chemical and topcoat color information. A search engine has been developed for the infrared (IR) spectral libraries of the PDQ database in an effort to improve discrimination capability and permit quantification of discrimination power for OEM automotive paint comparisons. The similarity of IR spectra of the corresponding layers of various records for original finishes in the PDQ database often results in poor discrimination using commercial library search algorithms. A pattern recognition approach employing pre-filters and a cross-correlation library search algorithm that performs both a forward and backward search has been used to significantly improve the discrimination of IR spectra in the PDQ database and thus improve the accuracy of the search. This improvement permits inter-comparison of OEM automotive paint layer systems using the IR spectra alone. Such information can serve to quantify the discrimination power of the original automotive paint encountered in casework and further efforts to succinctly communicate trace evidence to the courts.

  1. Do all men with pathological Gleason score 8-10 prostate cancer have poor outcomes? Results from the SEARCH database.

    PubMed

    Fischer, Sean; Lin, Daniel; Simon, Ross M; Howard, Lauren E; Aronson, William J; Terris, Martha K; Kane, Christopher J; Amling, Christopher L; Cooperberg, Matt R; Freedland, Stephen J; Vidal, Adriana C

    2016-08-01

    To determine whether there are subsets of men with pathological high grade prostate cancer (Gleason score 8-10) with particularly high or low 2-year biochemical recurrence (BCR) risk after radical prostatectomy (RP) when stratified into groups based on combinations of pathological features, such as surgical margin status, extracapsular extension (ECE) and seminal vesicle invasion (SVI). We identified 459 men treated with RP with pathological Gleason score 8-10 prostate cancer in the SEARCH database. The men were stratified into five groups based on pathological characteristics: group 1, men with negative surgical margins (NSMs) and no ECE; group 2, men with positive surgical margins (PSMs) and no ECE; group 3, men with NSMs and ECE; group 4, men with PSMs and ECE; and group 5, men with SVI. Cox proportional hazards models and the log-rank test were used to compare BCR among the groups. At 2 years after RP, pathological group was significantly correlated with BCR (log-rank, P < 0.001) with patients in group 5 (+SVI) having the highest BCR risk (66%) and those in group 1 (NSMs and no ECE) having the lowest risk (14%). When we compared groups 2, 3, and 4, with each other, there was no significant difference in BCR among the groups (~50% 2-year BCR risk; log-rank P = 0.28). Results were similar when adjusting for prostate-specific antigen, age, pathological Gleason sum and clinical stage, or after excluding men who received adjuvant therapy. In patients with high grade (Gleason score 8-10) prostate cancer after RP, the presence of either PSMs, ECE or SVI was associated with an increased risk of early BCR, with a 2-year BCR risk of ≥50%. Conversely, men with organ-confined margin-negative disease had a very low risk of early BCR despite Gleason score 8-10 disease. © 2015 The Authors BJU International © 2015 BJU International Published by John Wiley & Sons Ltd.
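
    The five-group stratification described above can be written as a small decision rule. The Python sketch below is only an illustration of that rule: the function name and boolean arguments are hypothetical, and it assumes that SVI takes precedence over margin status and ECE, which the abstract implies but does not state explicitly.

```python
def pathological_group(psm: bool, ece: bool, svi: bool) -> int:
    """Map pathological features to groups 1-5 as listed in the abstract."""
    if svi:
        return 5  # seminal vesicle invasion (assumed to override other features)
    if not ece:
        return 2 if psm else 1  # no extracapsular extension
    return 4 if psm else 3  # extracapsular extension present
```

    For example, `pathological_group(psm=False, ece=False, svi=False)` gives group 1, the organ-confined, margin-negative group reported to have the lowest 2-year BCR risk.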

  2. Use of DNA profiles for investigation using a simulated national DNA database: Part II. Statistical and ethical considerations on familial searching.

    PubMed

    Hicks, T; Taroni, F; Curran, J; Buckleton, J; Castella, V; Ribaux, O

    2010-10-01

    Familial searching consists of searching for a full profile left at a crime scene in a National DNA Database (NDNAD). In this paper we are interested in the circumstance where no full match is returned, but a partial match is found between a database member's profile and the crime stain. Because close relatives share more of their DNA than unrelated persons, this partial match may indicate that the crime stain was left by a close relative of the person with whom the partial match was found. This approach has successfully solved important crimes in the UK and the USA. In a previous paper, a model, which takes into account substructure and siblings, was used to simulate a NDNAD. In this paper, we have used this model to test the usefulness of familial searching and offer guidelines for pre-assessment of the cases based on the likelihood ratio. Siblings of "persons" present in the simulated Swiss NDNAD were created. These profiles (N=10,000) were used as traces and were then compared to the whole database (N=100,000). The statistical results obtained show that the technique has great potential confirming the findings of previous studies. However, effectiveness of the technique is only one part of the story. Familial searching has juridical and ethical aspects that should not be ignored. In Switzerland for example, there are no specific guidelines to the legality or otherwise of familial searching. This article both presents statistical results, and addresses criminological and civil liberties aspects to take into account risks and benefits of familial searching.

  3. Image Databases.

    ERIC Educational Resources Information Center

    Pettersson, Rune

    Different kinds of pictorial databases are described with respect to aims, user groups, search possibilities, storage, and distribution. Some specific examples are given for databases used for the following purposes: (1) labor markets for artists; (2) document management; (3) telling a story; (4) preservation (archives and museums); (5) research;…

  4. How to prepare a systematic review of economic evaluations for clinical practice guidelines: database selection and search strategy development (part 2/3).

    PubMed

    Thielen, F W; Van Mastrigt, Gapg; Burgers, L T; Bramer, W M; Majoie, Hjm; Evers, Smaa; Kleijnen, J

    2016-12-01

    This article is part of the series "How to prepare a systematic review of economic evaluations (EEs) for informing evidence-based healthcare decisions", in which a five-step approach is proposed. Areas covered: This paper focuses on the selection of relevant databases and developing a search strategy for detecting EEs, as well as on how to perform the search and how to extract relevant data from retrieved records. Expert commentary: Thus far, little has been published on how to conduct systematic reviews of EEs (SR-EEs). Moreover, reliable sources of information, such as the Health Economic Evaluation Database, have ceased to publish updates. Researchers are thus left without authoritative guidance on how to conduct SR-EEs. Together with van Mastrigt et al. we seek to fill this gap.

  5. Matching unknown empirical formulas to chemical structure using LC/MS TOF accurate mass and database searching: example of unknown pesticides on tomato skins.

    PubMed

    Thurman, E Michael; Ferrer, Imma; Fernández-Alba, Amadeo Rodriguez

    2005-03-04

    Traditionally, the screening of unknown pesticides in food has been accomplished by GC/MS methods using conventional library searching routines. However, many of the new polar and thermally labile pesticides and their degradates are more readily and easily analyzed by LC/MS methods and no searchable libraries currently exist (with the exception of some user libraries, which are limited). Therefore, there is a need for LC/MS approaches to detect unknown non-target pesticides in food. This report develops an identification scheme using a combination of LC/MS time-of-flight (accurate mass) and LC/MS ion trap MS (MS/MS) with searching of empirical formulas generated through accurate mass and a ChemIndex database or Merck Index database. The approach is different than conventional library searching of fragment ions. The concept here consists of four parts. First is the initial detection of a possible unknown pesticide in actual market-place vegetable extracts (tomato skins) using accurate mass and generating empirical formulas. Second is searching either the Merck Index database on CD (10,000 compounds) or the ChemIndex (77,000 compounds) for possible structures. Third is MS/MS of the unknown pesticide in the tomato-skin extract followed by fragment ion identification using chemical drawing software and comparison with accurate-mass ion fragments. Fourth is the verification with authentic standards, if available. Three examples of unknown, non-target pesticides are shown using a tomato-skin extract from an actual market place sample. Limitations of the approach are discussed including the use of A + 2 isotope signatures, extended databases, lack of authentic standards, and natural product unknowns in food extracts.
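
    The second step described above (searching a compound database by an accurate-mass measurement) can be illustrated with a minimal Python sketch. It is not the authors' software: the function names, the 5 ppm tolerance, and the three-entry toy database are assumptions made for illustration, and the compounds listed are not taken from the abstract. For simplicity it matches on neutral monoisotopic mass rather than generating candidate empirical formulas first.

```python
import re

# Monoisotopic masses (in u) of the most abundant isotope of each element.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.0030740,
    "O": 15.9949146,
    "S": 31.9720707,
    "Cl": 34.9688527,
}

def monoisotopic_mass(formula):
    """Sum isotope masses for a simple formula such as 'C9H10ClN5O2'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC[element] * (int(count) if count else 1)
    return mass

def ppm_error(measured, theoretical):
    """Relative mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6

def search_by_mass(measured_neutral_mass, database, tol_ppm=5.0):
    """Return (name, formula, theoretical mass) for entries within tol_ppm."""
    hits = []
    for name, formula in database.items():
        theoretical = monoisotopic_mass(formula)
        if abs(ppm_error(measured_neutral_mass, theoretical)) <= tol_ppm:
            hits.append((name, formula, theoretical))
    return hits

# Hypothetical toy database (illustrative only, not from the source).
TOY_DB = {
    "imidacloprid": "C9H10ClN5O2",
    "carbendazim": "C9H9N3O2",
    "thiabendazole": "C10H7N3S",
}

hits = search_by_mass(255.0525, TOY_DB)
```

    A mass match alone only narrows the candidates; as the abstract notes, MS/MS fragment comparison and authentic standards are still needed to confirm a structure.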

  6. Cooperative Quantum-Behaved Particle Swarm Optimization with Dynamic Varying Search Areas and Lévy Flight Disturbance

    PubMed Central

    Li, Desheng

    2014-01-01

    This paper proposes a novel variant of cooperative quantum-behaved particle swarm optimization (CQPSO) algorithm with two mechanisms to reduce the search space and avoid the stagnation, called CQPSO-DVSA-LFD. One mechanism is called Dynamic Varying Search Area (DVSA), which takes charge of limiting the ranges of particles' activity into a reduced area. On the other hand, in order to escape the local optima, Lévy flights are used to generate the stochastic disturbance in the movement of particles. To test the performance of CQPSO-DVSA-LFD, numerical experiments are conducted to compare the proposed algorithm with different variants of PSO. According to the experimental results, the proposed method performs better than other variants of PSO on both benchmark test functions and the combinatorial optimization issue, that is, the job-shop scheduling problem. PMID:24851085

  7. RNA FRABASE 2.0: an advanced web-accessible database with the capacity to search the three-dimensional fragments within RNA structures

    PubMed Central

    2010-01-01

    Background Recent discoveries concerning novel functions of RNA, such as RNA interference, have contributed towards the growing importance of the field. In this respect, a deeper knowledge of complex three-dimensional RNA structures is essential to understand their new biological functions. A number of bioinformatic tools have been proposed to explore two major structural databases (PDB, NDB) in order to analyze various aspects of RNA tertiary structures. One of these tools is RNA FRABASE 1.0, the first web-accessible database with an engine for automatic search of 3D fragments within PDB-derived RNA structures. This search is based upon the user-defined RNA secondary structure pattern. In this paper, we present and discuss RNA FRABASE 2.0. This second version of the system represents a major extension of this tool in terms of providing new data and a wide spectrum of novel functionalities. An intuitionally operated web server platform enables very fast user-tailored search of three-dimensional RNA fragments, their multi-parameter conformational analysis and visualization. Description RNA FRABASE 2.0 has stored information on 1565 PDB-deposited RNA structures, including all NMR models. The RNA FRABASE 2.0 search engine algorithms operate on the database of the RNA sequences and the new library of RNA secondary structures, coded in the dot-bracket format extended to hold multi-stranded structures and to cover residues whose coordinates are missing in the PDB files. The library of RNA secondary structures (and their graphics) is made available. A high level of efficiency of the 3D search has been achieved by introducing novel tools to formulate advanced searching patterns and to screen highly populated tertiary structure elements. RNA FRABASE 2.0 also stores data and conformational parameters in order to provide "on the spot" structural filters to explore the three-dimensional RNA structures. An instant visualization of the 3D RNA structures is provided. 
RNA FRABASE
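
    The dot-bracket notation mentioned in the abstract can be read with a simple stack. The sketch below handles only the basic single-strand form (`(`, `)` and `.`); the extended format that RNA FRABASE 2.0 uses for multi-stranded structures and missing residues is not specified in the abstract and is not modelled here. The function name is an assumption.

```python
def parse_dot_bracket(structure):
    """Return sorted (i, j) index pairs for a basic dot-bracket string.

    '(' opens a base pair, ')' closes the most recently opened one,
    and '.' marks an unpaired residue. Indices are 0-based.
    """
    stack = []
    pairs = []
    for index, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(index)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {index}")
            pairs.append((stack.pop(), index))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {index}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)

# A small hairpin: two stacked base pairs closing a two-residue loop.
hairpin_pairs = parse_dot_bracket("((..))")
```

    A secondary-structure search pattern of the kind the abstract describes could then be compared against such pair lists, although the actual matching engine is not described in this record.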

  8. Accelerating Smith-Waterman Alignment for Protein Database Search Using Frequency Distance Filtration Scheme Based on CPU-GPU Collaborative System

    PubMed Central

    Liu, Yu; Hong, Yang; Lin, Chun-Yuan; Hung, Che-Lun

    2015-01-01

    The Smith-Waterman (SW) algorithm has been widely utilized for searching biological sequence databases in bioinformatics. Recently, several works have adopted the graphic card with Graphic Processing Units (GPUs) and their associated CUDA model to enhance the performance of SW computations. However, these works mainly focused on the protein database search by using the intertask parallelization technique, and only using the GPU capability to do the SW computations one by one. Hence, in this paper, we will propose an efficient SW alignment method, called CUDA-SWfr, for the protein database search by using the intratask parallelization technique based on a CPU-GPU collaborative system. Before doing the SW computations on GPU, a procedure is applied on CPU by using the frequency distance filtration scheme (FDFS) to eliminate the unnecessary alignments. The experimental results indicate that CUDA-SWfr runs 9.6 times and 96 times faster than the CPU-based SW method without and with FDFS, respectively. PMID:26568953
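
    The filter-then-align idea can be sketched on the CPU in plain Python. This is not CUDA-SWfr: the abstract does not say how its filter threshold is chosen, so the `max_fd` cut-off, the function names, and the simple match/mismatch/linear-gap scoring are all assumptions. The frequency distance used here is the usual residue-count lower bound on edit distance.

```python
from collections import Counter

def frequency_distance(a, b):
    """Lower bound on edit distance computed from residue counts alone."""
    count_a, count_b = Counter(a), Counter(b)
    symbols = count_a.keys() | count_b.keys()
    surplus_a = sum(max(count_a[s] - count_b[s], 0) for s in symbols)
    surplus_b = sum(max(count_b[s] - count_a[s], 0) for s in symbols)
    return max(surplus_a, surplus_b)

def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score, keeping one row of the DP matrix at a time."""
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            current[j] = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            best = max(best, current[j])
        previous = current
    return best

def filtered_search(query, database, max_fd):
    """Skip sequences whose frequency distance exceeds max_fd, align the rest."""
    results = []
    for name, sequence in database.items():
        if frequency_distance(query, sequence) > max_fd:
            continue  # cheap filter: the quadratic alignment is avoided
        results.append((name, smith_waterman_score(query, sequence)))
    return sorted(results, key=lambda item: item[1], reverse=True)

# Hypothetical toy database of short sequences.
ranked = filtered_search("ACGT", {"same": "ACGT", "far": "TTTTTTTT"}, max_fd=2)
```

    Production protein searches normally use a substitution matrix and affine gap penalties; the fixed scores above only keep the sketch short.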

  9. Comparing the Precision of Information Retrieval of MeSH-Controlled Vocabulary Search Method and a Visual Method in the Medline Medical Database.

    PubMed

    Hariri, Nadjla; Ravandi, Somayyeh Nadi

    2014-01-01

    Medline is one of the most important databases in the biomedical field. One of the most important hosts for Medline is Elton B. Stephens CO. (EBSCO), which has presented different search methods that can be used based on the needs of the users. Visual search and MeSH-controlled search methods are among the most common methods. The goal of this research was to compare the precision of the retrieved sources in the EBSCO Medline base using MeSH-controlled and visual search methods. This research was a semi-empirical study. By holding training workshops, 70 students of higher education in different educational departments of Kashan University of Medical Sciences were taught MeSH-Controlled and visual search methods in 2012. Then, the precision of 300 searches made by these students was calculated based on Best Precision, Useful Precision, and Objective Precision formulas and analyzed in SPSS software using the independent sample T Test, and three precisions obtained with the three precision formulas were studied for the two search methods. The mean precision of the visual method was greater than that of the MeSH-Controlled search for all three types of precision, i.e. Best Precision, Useful Precision, and Objective Precision, and their mean precisions were significantly different (P <0.001). Sixty-five percent of the researchers indicated that, although the visual method was better than the controlled method, the control of keywords in the controlled method resulted in finding more proper keywords for the searches. Fifty-three percent of the participants in the research also mentioned that the use of the combination of the two methods produced better results. For users, it is more appropriate to use a natural, language-based method, such as the visual method, in the EBSCO Medline host than to use the controlled method, which requires users to use special keywords. 
    The potential reason for their preference was that the visual method allowed them more freedom of action.

  10. Exploiting the potential of large databases of electronic health records for research using rapid search algorithms and an intuitive query interface

    PubMed Central

    Tate, A Rosemary; Beloff, Natalia; Al-Radwan, Balques; Wickson, Joss; Puri, Shivani; Williams, Timothy; Van Staa, Tjeerd; Bleach, Adrian

    2014-01-01

    Objective UK primary care databases, which contain diagnostic, demographic and prescribing information for millions of patients geographically representative of the UK, represent a significant resource for health services and clinical research. They can be used to identify patients with a specified disease or condition (phenotyping) and to investigate patterns of diagnosis and symptoms. Currently, extracting such information manually is time-consuming and requires considerable expertise. In order to exploit more fully the potential of these large and complex databases, our interdisciplinary team developed generic methods allowing access to different types of user. Materials and methods Using the Clinical Practice Research Datalink database, we have developed an online user-focused system (TrialViz), which enables users interactively to select suitable medical general practices based on two criteria: suitability of the patient base for the intended study (phenotyping) and measures of data quality. Results An end-to-end system, underpinned by an innovative search algorithm, allows the user to extract information in near real-time via an intuitive query interface and to explore this information using interactive visualization tools. A usability evaluation of this system produced positive results. Discussion We present the challenges and results in the development of TrialViz and our plans for its extension for wider applications of clinical research. Conclusions Our fast search algorithms and simple query algorithms represent a significant advance for users of clinical research databases. PMID:24272162

  11. Exploiting the potential of large databases of electronic health records for research using rapid search algorithms and an intuitive query interface.

    PubMed

    Tate, A Rosemary; Beloff, Natalia; Al-Radwan, Balques; Wickson, Joss; Puri, Shivani; Williams, Timothy; Van Staa, Tjeerd; Bleach, Adrian

    2014-01-01

    UK primary care databases, which contain diagnostic, demographic and prescribing information for millions of patients geographically representative of the UK, represent a significant resource for health services and clinical research. They can be used to identify patients with a specified disease or condition (phenotyping) and to investigate patterns of diagnosis and symptoms. Currently, extracting such information manually is time-consuming and requires considerable expertise. In order to exploit more fully the potential of these large and complex databases, our interdisciplinary team developed generic methods allowing access to different types of user. Using the Clinical Practice Research Datalink database, we have developed an online user-focused system (TrialViz), which enables users interactively to select suitable medical general practices based on two criteria: suitability of the patient base for the intended study (phenotyping) and measures of data quality. An end-to-end system, underpinned by an innovative search algorithm, allows the user to extract information in near real-time via an intuitive query interface and to explore this information using interactive visualization tools. A usability evaluation of this system produced positive results. We present the challenges and results in the development of TrialViz and our plans for its extension for wider applications of clinical research. Our fast search algorithms and simple query algorithms represent a significant advance for users of clinical research databases.

  12. Savvy Searching.

    ERIC Educational Resources Information Center

    Jacso, Peter

    2002-01-01

    Explains desktop metasearch engines, which search the databases of several search engines simultaneously. Reviews two particular versions, the Copernic 2001 Pro and the BullsEye Pro 3, comparing costs, subject categories, display capabilities, and layout for presenting results. (LRW)

  13. Racial differences in the association between preoperative serum cholesterol and prostate cancer recurrence: results from the SEARCH database

    PubMed Central

    Allott, Emma H.; Howard, Lauren E.; Aronson, William J.; Terris, Martha K.; Kane, Christopher J.; Amling, Christopher L.; Cooperberg, Matthew R.; Freedland, Stephen J.

    2016-01-01

    Background Black men are disproportionately affected by both cardiovascular disease and prostate cancer. Epidemiologic evidence linking dyslipidemia, an established cardiovascular risk factor, and prostate cancer progression is mixed. As existing studies were conducted in predominantly non-black populations, research in black men is lacking. Methods We identified 628 black and 1,020 non-black men who underwent radical prostatectomy and never used statins before surgery in the Shared Equal Access Regional Cancer Hospital (SEARCH) database. Median follow up was 2.9 years. The impact of preoperative hypercholesterolemia on risk of biochemical recurrence was examined using multivariable, race-stratified proportional hazards. In secondary analysis, we examined associations with low-density lipoprotein (LDL), high-density lipoprotein (HDL) and triglycerides, overall and among men with dyslipidemia. Results High cholesterol was associated with increased risk of recurrence in black (HRper10mg/dl 1.06; 95%CI 1.02–1.11) but not non-black men (HRper10mg/dl 0.99; 95%CI 0.95–1.03; p-interaction=0.011). Elevated triglycerides were associated with increased risk in both black and non-black men (HRper10mg/dl 1.02; 95%CI 1.00–1.03 and 1.02; 95%CI 1.00–1.02, respectively; p-interaction=0.458). There were no significant associations between LDL or HDL and recurrence risk in either race. Associations with cholesterol, LDL and triglycerides were similar among men with dyslipidemia, but low HDL was associated with increased risk of recurrence in black, but not non-black men with dyslipidemia (p-interaction=0.047). Conclusion Elevated cholesterol was a risk factor for recurrence in black but not non-black men, whereas high triglycerides were associated with increased risk regardless of race. Impact Significantly contrasting associations by race may provide insight into prostate cancer racial disparities. PMID:26809276

  14. Similarity searching in databases of flexible 3D structures using autocorrelation vectors derived from smoothed bounded distance matrices.

    PubMed

    Rhodes, Nicholas; Clark, David E; Willett, Peter

    2006-01-01

    This paper presents an exploratory study of a novel method for flexible 3-D similarity searching based on autocorrelation vectors and smoothed bounded distance matrices. Although the new approach is unable to outperform an existing 2-D similarity searching in terms of enrichment factors, it is able to retrieve different compounds at a given percentage of the hit-list and so may be a useful adjunct to other similarity searching methods.

  15. CAZymes Analysis Toolkit (CAT): web service for searching and analyzing carbohydrate-active enzymes in a newly sequenced organism using CAZy database.

    PubMed

    Park, Byung H; Karpinets, Tatiana V; Syed, Mustafa H; Leuze, Michael R; Uberbacher, Edward C

    2010-12-01

    The Carbohydrate-Active Enzyme (CAZy) database provides a rich set of manually annotated enzymes that degrade, modify, or create glycosidic bonds. Despite rich and invaluable information stored in the database, software tools utilizing this information for annotation of newly sequenced genomes by CAZy families are limited. We have employed two annotation approaches to fill the gap between manually curated high-quality protein sequences collected in the CAZy database and the growing number of other protein sequences produced by genome or metagenome sequencing projects. The first approach is based on a similarity search against the entire nonredundant sequences of the CAZy database. The second approach performs annotation using links or correspondences between the CAZy families and protein family domains. The links were discovered using the association rule learning algorithm applied to sequences from the CAZy database. The approaches complement each other and in combination achieved high specificity and sensitivity when cross-evaluated with the manually curated genomes of Clostridium thermocellum ATCC 27405 and Saccharophagus degradans 2-40. The capability of the proposed framework to predict the function of unknown protein domains and of hypothetical proteins in the genome of Neurospora crassa is demonstrated. The framework is implemented as a Web service, the CAZymes Analysis Toolkit, and is available at http://cricket.ornl.gov/cgi-bin/cat.cgi.

  16. CUDASW++2.0: enhanced Smith-Waterman protein database search on CUDA-enabled GPUs based on SIMT and virtualized SIMD abstractions.

    PubMed

    Liu, Yongchao; Schmidt, Bertil; Maskell, Douglas L

    2010-04-06

    Due to its high sensitivity, the Smith-Waterman algorithm is widely used for biological database searches. Unfortunately, the quadratic time complexity of this algorithm makes it highly time-consuming. The exponential growth of biological databases further deteriorates the situation. To accelerate this algorithm, many efforts have been made to develop techniques in high performance architectures, especially the recently emerging many-core architectures and their associated programming models. This paper describes the latest release of the CUDASW++ software, CUDASW++ 2.0, which makes new contributions to Smith-Waterman protein database searches using compute unified device architecture (CUDA). A parallel Smith-Waterman algorithm is proposed to further optimize the performance of CUDASW++ 1.0 based on the single instruction, multiple thread (SIMT) abstraction. For the first time, we have investigated a partitioned vectorized Smith-Waterman algorithm using CUDA based on the virtualized single instruction, multiple data (SIMD) abstraction. The optimized SIMT and the partitioned vectorized algorithms were benchmarked, and remarkably, have similar performance characteristics. CUDASW++ 2.0 achieves performance improvement over CUDASW++ 1.0 as much as 1.74 (1.72) times using the optimized SIMT algorithm and up to 1.77 (1.66) times using the partitioned vectorized algorithm, with a performance of up to 17 (30) billion cells update per second (GCUPS) on a single-GPU GeForce GTX 280 (dual-GPU GeForce GTX 295) graphics card. CUDASW++ 2.0 is publicly available open-source software, written in CUDA and C++ programming languages. It obtains significant performance improvement over CUDASW++ 1.0 using either the optimized SIMT algorithm or the partitioned vectorized algorithm for Smith-Waterman protein database searches by fully exploiting the compute capability of commonly used CUDA-enabled low-cost GPUs.
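
    The GCUPS figure quoted above is conventionally the number of dynamic-programming cells filled (query length times total database residues) divided by the run time, in units of 10^9 cells per second. A small sketch, with hypothetical input numbers chosen only so the result lands on a round value:

```python
def gcups(query_length, database_residues, seconds):
    """Billion cell updates per second for one database search."""
    cells = query_length * database_residues
    return cells / (seconds * 1e9)

def speedup(new_gcups, old_gcups):
    """Ratio of two throughput figures measured on the same workload."""
    return new_gcups / old_gcups

# Hypothetical workload: a 500-residue query against 170 million residues in 5 s.
example = gcups(500, 170_000_000, 5.0)
```

    Because the cell count grows with the product of query and database size, GCUPS lets runs with different queries be compared on one scale.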

  17. Quantum

    NASA Astrophysics Data System (ADS)

    Elbaz, Edgard

    This book gives a new insight into the interpretation of quantum mechanics (stochastic, integral paths, decoherence), a completely new treatment of angular momentum (graphical spin algebra) and an introduction to Fermion fields (Dirac equation) and Boson fields (e.m. and Higgs) as well as an introduction to QED (quantum electrodynamics), supersymmetry and quantum cosmology.

  18. Familial searching: a specialist forensic DNA profiling service utilising the National DNA Database to identify unknown offenders via their relatives--the UK experience.

    PubMed

    Maguire, C N; McCallum, L A; Storey, C; Whitaker, J P

    2014-01-01

    The National DNA Database (NDNAD) of England and Wales was established on April 10th 1995. The NDNAD is governed by a variety of legislative instruments that mean that DNA samples can be taken if an individual is arrested and detained in a police station. The biological samples and the DNA profiles derived from them can be used for purposes related to the prevention and detection of crime, the investigation of an offence and for the conduct of a prosecution. Following the South East Asian Tsunami of December 2004, the legislation was amended to allow the use of the NDNAD to assist in the identification of a deceased person or of a body part where death has occurred from natural causes or from a natural disaster. The UK NDNAD now contains the DNA profiles of approximately 6 million individuals representing 9.6% of the UK population. As the science of DNA profiling advanced, the National DNA Database provided a potential resource for increased intelligence beyond the direct matching for which it was originally created. The familial searching service offered to the police by several UK forensic science providers exploits the size and geographic coverage of the NDNAD and the fact that close relatives of an offender may share a significant proportion of that offender's DNA profile and will often reside in close geographic proximity to him or her. Between 2002 and 2011 Forensic Science Service Ltd. (FSS) provided familial search services to support 188 police investigations, 70 of which are still active cases. This technique, which may be used in serious crime cases or in 'cold case' reviews when there are few or no investigative leads, has led to the identification of 41 perpetrators or suspects. In this paper we discuss the processes, utility, and governance of the familial search service in which the NDNAD is searched for close genetic relatives of an offender who has left DNA evidence at a crime scene, but whose DNA profile is not represented within the NDNAD. We

  19. Searching the Literatura Latino Americana e do Caribe em Ciências da Saúde (LILACS) database improves systematic reviews.

    PubMed

    Clark, Otavio Augusto Camara; Castro, Aldemar Araujo

    2002-02-01

    An unbiased systematic review (SR) should analyse as many articles as possible in order to provide the best evidence available. However, many SR use only databases with high English-language content as sources for articles. Literatura Latino Americana e do Caribe em Ciências da Saúde (LILACS) indexes 670 journals from the Latin American and Caribbean health literature but is seldom used in these SR. Our objective is to evaluate if LILACS should be used as a routine source of articles for SR. First we identified SR published in 1997 in five medical journals with a high impact factor. Then we searched LILACS for articles that could match the inclusion criteria of these SR. We also checked if the authors had already identified these articles located in LILACS. In all, 64 SR were identified. Two had already searched LILACS and were excluded. In 39 of 62 (63%) SR a LILACS search identified articles that matched the inclusion criteria. In 5 (8%) our search was inconclusive and in 18 (29%) no articles were found in LILACS. Therefore, in 71% (44/62) of cases, a LILACS search could have been useful to the authors. This proportion remains the same if we consider only the 37 SR that performed a meta-analysis. In only one case had the article identified in LILACS already been located elsewhere by the authors' strategy. LILACS is an under-explored and unique source of articles whose use can improve the quality of systematic reviews. This database should be used as a routine source to identify studies for systematic reviews.

  20. Reduction in database search space by utilization of amino acid composition information from electron transfer dissociation and higher-energy collisional dissociation mass spectra.

    PubMed

    Hansen, Thomas A; Kryuchkov, Fedor; Kjeldsen, Frank

    2012-08-07

    With high-mass accuracy and consecutively obtained electron transfer dissociation (ETD) and higher-energy collisional dissociation (HCD) tandem mass spectrometry (MS/MS), reliable (≥97%) and sensitive fragment ions have been extracted for identification of specific amino acid residues in peptide sequences. The analytical benefit of these specific amino acid composition (AAC) ions is to restrict the database search space and provide identification of peptides with higher confidence and reduced false negative rates. The 6706 uniquely identified peptide sequences determined with a conservative Mascot score of >30 were used to characterize the AAC ions. The loss of amino acid side chains (small neutral losses, SNLs) from the charge reduced peptide radical cations was studied using ETD. Complementary AAC information from HCD spectra was provided by immonium ions. From the ETD/HCD mass spectra, 5162 and 6720 reliable SNLs and immonium ions were successfully extracted, respectively. Automated application of the AAC information during database searching resulted in an average 3.5-fold higher confidence level of peptide identification. In addition, 4% and 28% more peptides were identified above the significance level in a standard and extended search space, respectively.

  1. Validity and Reliability of a Systematic Database Search Strategy to Identify Publications Resulting From Pharmacy Residency Research Projects.

    PubMed

    Kwak, Namhee; Swan, Joshua T; Thompson-Moore, Nathaniel; Liebl, Michael G

    2016-08-01

    This study aims to develop a systematic search strategy and test its validity and reliability in terms of identifying projects published in peer-reviewed journals as reported by residency graduates through an online survey. This study was a prospective blind comparison to a reference standard. Pharmacy residency projects conducted at the study institution between 2001 and 2012 were included. A step-wise, systematic procedure containing up to 8 search strategies in PubMed and EMBASE for each project was created using the names of authors and abstract keywords. In order to further maximize sensitivity, complex phrases with multiple variations were truncated to the root word. Validity was assessed by obtaining information on publications from an online survey deployed to residency graduates. The search strategy identified 13 publications (93% sensitivity, 100% specificity, and 99% accuracy). Both methods identified a similar proportion achieving publication (19.7% search strategy vs 21.2% survey, P = 1.00). Reliability of the search strategy was affirmed by the perfect agreement between 2 investigators (k = 1.00). This systematic search strategy demonstrated a high sensitivity, specificity, and accuracy for identifying publications resulting from pharmacy residency projects using information available in residency conference abstracts. © The Author(s) 2015.
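
    The reported sensitivity, specificity, and accuracy follow from a two-by-two comparison of the search strategy against the survey reference standard. The abstract does not give the underlying counts, so the numbers below are an assumption inferred from the reported percentages (66 projects, 14 published according to the survey, 13 of them found by the search, no false positives); they are illustrative, not the authors' data.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity and accuracy from a 2x2 confusion table."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return sensitivity, specificity, accuracy

# Inferred (assumed) counts: 13 true positives, 0 false positives,
# 1 missed publication, 52 projects correctly classified as unpublished.
sensitivity, specificity, accuracy = diagnostic_metrics(tp=13, fp=0, fn=1, tn=52)
```

    Under these assumed counts sensitivity is 13/14 (about 93%), specificity is 52/52 (100%), and accuracy is 65/66 (about 98.5%, which is in line with the 99% reported above).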

  2. Quantum corrections in modern gauge theories of fundamental interactions and the search for new physics

    SciTech Connect

    Zucchini, R.

    1988-01-01

    We show that the analysis of the quantum effects in gauge theories yields several constraints which may be used to test their internal consistency and physical viability. We have studied, in particular, the Higgs sector of the minimal standard model and tested the universality of the weak interactions and the conserved-vector-current hypothesis. Finally, we have analyzed modular invariance in the closed bosonic string.

  3. Comparing the Precision of Information Retrieval of MeSH-Controlled Vocabulary Search Method and a Visual Method in the Medline Medical Database

    PubMed Central

    Hariri, Nadjla; Ravandi, Somayyeh Nadi

    2014-01-01

    Background: Medline is one of the most important databases in the biomedical field. One of the most important hosts for Medline is Elton B. Stephens CO. (EBSCO), which has presented different search methods that can be used based on the needs of the users. Visual search and MeSH-controlled search methods are among the most common methods. The goal of this research was to compare the precision of the retrieved sources in the EBSCO Medline base using MeSH-controlled and visual search methods. Methods: This research was a semi-empirical study. By holding training workshops, 70 students of higher education in different educational departments of Kashan University of Medical Sciences were taught MeSH-Controlled and visual search methods in 2012. Then, the precision of 300 searches made by these students was calculated based on Best Precision, Useful Precision, and Objective Precision formulas and analyzed in SPSS software using the independent sample T Test, and three precisions obtained with the three precision formulas were studied for the two search methods. Results: The mean precision of the visual method was greater than that of the MeSH-Controlled search for all three types of precision, i.e. Best Precision, Useful Precision, and Objective Precision, and their mean precisions were significantly different (P <0.001). Sixty-five percent of the researchers indicated that, although the visual method was better than the controlled method, the control of keywords in the controlled method resulted in finding more proper keywords for the searches. Fifty-three percent of the participants in the research also mentioned that the use of the combination of the two methods produced better results. Conclusion: For users, it is more appropriate to use a natural, language-based method, such as the visual method, in the EBSCO Medline host than to use the controlled method, which requires users to use special keywords. The potential reason for their preference was that the visual

  4. Visualization Tools and Techniques for Search and Validation of Large Earth Science Spatial-Temporal Metadata Databases

    NASA Astrophysics Data System (ADS)

    Baskin, W. E.; Herbert, A.; Kusterer, J.

    2014-12-01

    Spatial-temporal metadata databases are critical components of interactive data discovery services for ordering Earth Science datasets. The development staff at the Atmospheric Science Data Center (ASDC) works closely with satellite Earth Science mission teams such as CERES, CALIPSO, TES, MOPITT, and CATS to create and maintain metadata databases that are tailored to the data discovery needs of the Earth Science community. This presentation focuses on the visualization tools and techniques used by the ASDC software development team for data discovery and validation/optimization of spatial-temporal objects in large multi-mission spatial-temporal metadata databases. The following topics will be addressed: (1) optimizing the level of detail of spatial-temporal metadata to provide interactive spatial query performance over a multi-year Earth Science mission; (2) generating appropriately scaled sensor footprint gridded (raster) metadata from Level 1 and Level 2 satellite and aircraft time-series data granules; and (3) a performance comparison of raster vs. vector spatial granule footprint mask queries in large metadata databases, with a description of the visualization tools used to assist with this analysis.

  5. Identifying Gel-Separated Proteins Using In-Gel Digestion, Mass Spectrometry, and Database Searching: Consider the Chemistry

    ERIC Educational Resources Information Center

    Albright, Jessica C.; Dassenko, David J.; Mohamed, Essa A.; Beussman, Douglas J.

    2009-01-01

    Matrix-assisted laser desorption/ionization (MALDI) mass spectrometry is an important bioanalytical technique in drug discovery, proteomics, and research at the biology-chemistry interface. This is an especially powerful tool when combined with gel separation of proteins and database mining using the mass spectral data. Currently, few hands-on…

  7. Effect of cleavage enzyme, search algorithm and decoy database on mass spectrometric identification of wheat gluten proteins

    USDA-ARS's Scientific Manuscript database

    Tandem mass spectrometry (MS/MS) is routinely used to identify proteins by comparing peptide spectra to those generated in silico from protein sequence databases. Wheat storage proteins (gliadins and glutenins) are difficult to distinguish by MS/MS as they have few cleavable tryptic sites, often res...

  8. The Opera del Vocabolario Italiano Database: Full-Text Searching Early Italian Vernacular Sources on the Web.

    ERIC Educational Resources Information Center

    DuPont, Christian

    2001-01-01

    Introduces and describes the functions of the Opera del Vocabolario Italiano (OVI) database, a powerful Web-based, full-text, searchable electronic archive that contains early Italian vernacular texts whose composition may be dated prior to 1375. Examples are drawn from scholars in various disciplines who have employed the OVI in support of their…

  10. Ten Most Searched Databases by a Business Generalist--Part 1 or A Day in the Life of....

    ERIC Educational Resources Information Center

    Meredith, Meri

    1986-01-01

    Describes databases frequently used in Business Information Center, Cummins Engine Company (Columbus, Indiana): Dun and Bradstreet Business Information Report System, Newsearch, Dun and Bradstreet Market Identifiers, Trade and Industry Index, PTS PROMT, Bureau of Labor Statistics files, ABI/INFORM, Magazine Index, NEXIS, Dow Jones News/Retrieval.…

  11. The use of quantum molecular calculations to guide a genetic algorithm: a way to search for new chemistry.

    PubMed

    Durrant, Marcus C

    2007-01-01

    The process of gene-based molecular evolution has been simulated in silico by using massively parallel density functional theory quantum calculations, coupled with a genetic algorithm, to test for fitness with respect to a target chemical reaction in populations of genetically encoded molecules. The goal of this study was the identification of transition-metal complexes capable of mediating a known reaction, namely the cleavage of N2 to give the metal nitride. Each complex within the search space was uniquely specified by a nanogene consisting of an eight-digit number. Propagation of an individual nanogene into successive generations was determined by the fitness of its phenotypic molecule to perform the target reaction, and new generations were created by recombination and mutation of surviving nanogenes. In its simplest implementation, the quantum-directed genetic algorithm (QDGA) quickly located a local minimum on the evolutionary fitness hypersurface, but proved incapable of progressing towards the global minimum. A strategy for progressing beyond local minima consistent with the Darwinian paradigm by the use of environmental variations coupled with mass extinctions was therefore developed. This allowed for the identification of nitriding complexes that are very closely related to known examples from the chemical literature. Examples of mutations that appear to be beneficial at the genetic level but prove to be harmful at the phenotypic level are described. As well as revealing fundamental aspects of molecular evolution, QDGA appears to be a powerful tool for the identification of lead compounds capable of carrying out a target chemical reaction.
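
    The recombination-and-mutation loop described in this abstract can be illustrated with a minimal sketch. It is not the authors' QDGA: the density functional theory fitness evaluation is replaced by a hypothetical `toy_fitness` function (the digit sum), the sketch maximizes fitness rather than descending a hypersurface, and the population size, mutation rate, and survivor fraction are assumptions chosen only for illustration.

```python
import random

GENOME_LENGTH = 8  # each "nanogene" is an eight-digit number, as in the abstract
DIGITS = "0123456789"


def crossover(parent_a, parent_b, rng):
    """Recombine two genomes at a single random cut point."""
    point = rng.randrange(1, GENOME_LENGTH)
    return parent_a[:point] + parent_b[point:]


def mutate(genome, rng, rate=0.1):
    """Replace each digit with a random digit with probability `rate`."""
    return "".join(
        rng.choice(DIGITS) if rng.random() < rate else digit for digit in genome
    )


def evolve(fitness, generations=30, population_size=20, seed=0):
    """Keep the fitter half of each generation and refill it with mutated offspring."""
    rng = random.Random(seed)
    population = [
        "".join(rng.choice(DIGITS) for _ in range(GENOME_LENGTH))
        for _ in range(population_size)
    ]
    for _ in range(generations):
        survivors = sorted(population, key=fitness, reverse=True)[: population_size // 2]
        children = [
            mutate(crossover(rng.choice(survivors), rng.choice(survivors), rng), rng)
            for _ in range(population_size - len(survivors))
        ]
        population = survivors + children
    return max(population, key=fitness)


def toy_fitness(genome):
    """Hypothetical stand-in for the quantum-chemical fitness: the digit sum."""
    return sum(int(digit) for digit in genome)


if __name__ == "__main__":
    print(evolve(toy_fitness))
```

    Because survivors are carried over unchanged, the best fitness in this sketch never decreases between generations; it does nothing to escape local optima, which is the problem the abstract addresses with environmental variation and mass extinctions.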

  12. The Search for Universal Constants and the Birth of Quantum Mechanics

    NASA Astrophysics Data System (ADS)

    Robotti, Nadia; Badino, Massimiliano

    The origin of quantum theory and Max Planck's theoretical work are without doubt two of the most frequently quoted episodes in the history of quantum physics, for the obvious reason that they represented the first steps in its formulation. Paradoxically, however, there are relatively few specific studies of Planck, and those differ on a range of questions. In our opinion this is due to the extremely synthetic nature of some of Planck's papers, and especially the "fundamentals" of October and December 1900. Faced with such brevity, a number of historians of science and philosophers have preferred to give a comprehensive analysis of the landmarks in Planck's work, often resorting to a more or less retrospective reconstruction process rather than attempting to build an all-embracing vision of Planck's work as a whole. In this paper we have therefore attempted to rebuild Planck's steps from 1899 to 1900. An analysis of this type shows that Planck's work has a profound internal unity throughout the entire period leading up to the discovery of the "quantum of energy". In our opinion a key to interpreting the mutual relationships between the various parts and stages of the theory in an intelligible manner is provided by Planck's interest in universal constants. This interest was grounded in two factors: 1) universal constants gave the entire theory a precise physical meaning, and 2) they could be used to build a universal system of units of measurement. In particular we show that various pairs of constants are a clear feature of Planck's treatment of the blackbody problem throughout the period in question and that for Planck the appearance of these constants in the distribution law represented a fundamental criterion. So much so that it inevitably played a key role in what has been defined as the crucial moment of the entire process: the decision to use a probabilistic definition of entropy.

  13. Statistical validation of peptide identifications in large-scale proteomics using the target-decoy database search strategy and flexible mixture modeling.

    PubMed

    Choi, Hyungwon; Ghosh, Debashis; Nesvizhskii, Alexey I

    2008-01-01

    Reliable statistical validation of peptide and protein identifications is a top priority in large-scale mass spectrometry-based proteomics. PeptideProphet is one of the computational tools commonly used for assessing the statistical confidence in peptide assignments to tandem mass spectra obtained using database search programs such as SEQUEST, MASCOT, or X! TANDEM. We present two flexible methods, the variable component mixture model and the semiparametric mixture model, that remove the restrictive parametric assumptions in the mixture modeling approach of PeptideProphet. Using a control protein mixture data set generated on a linear ion trap Fourier transform (LTQ-FT) mass spectrometer, we demonstrate that both methods improve on parametric models in terms of the accuracy of probability estimates and the power to detect correct identifications while controlling the false discovery rate to the same degree. The statistical approaches presented here require that the data set contain a sufficient number of decoy (known to be incorrect) peptide identifications, which can be obtained using the target-decoy database search strategy.
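
    As background to the target-decoy strategy mentioned in this abstract, the following minimal sketch shows the simple counting estimate of the false discovery rate that decoy identifications make possible. It is not the mixture modeling proposed by the authors; the function name `decoy_fdr`, the `(score, is_decoy)` tuple layout, and the example scores are all assumptions made for illustration.

```python
def decoy_fdr(psms, threshold):
    """Estimate the FDR among target matches scoring at or above `threshold`.

    `psms` is an iterable of (score, is_decoy) pairs. The estimate is the
    number of accepted decoys divided by the number of accepted targets,
    capped at 1.0; it is 0.0 when no target is accepted.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0
    return min(1.0, decoys / targets)


if __name__ == "__main__":
    # Hypothetical peptide-spectrum match scores; True marks a decoy hit.
    example = [
        (9.1, False), (8.7, False), (8.2, True), (7.9, False),
        (7.5, False), (6.0, True), (5.5, False),
    ]
    print(decoy_fdr(example, 7.0))
```

    The counting estimate assumes that incorrect target matches and decoy matches are equally likely to pass the threshold, which is the usual working assumption behind a target-decoy search.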

  14. Direct identification of human cellular microRNAs by nanoflow liquid chromatography-high-resolution tandem mass spectrometry and database searching.

    PubMed

    Nakayama, Hiroshi; Yamauchi, Yoshio; Taoka, Masato; Isobe, Toshiaki

    2015-03-03

    MicroRNAs (miRNAs) are small noncoding RNAs that regulate gene networks and participate in many physiological and pathological pathways. To date, miRNAs have been characterized mostly by genetic technologies, which have the advantages of being very sensitive and using high-throughput instrumentation; however, these techniques cannot identify most post-transcriptional modifications of miRNAs that would affect their functions. Herein, we report an analytical system for the direct identification of miRNAs that incorporates nanoflow liquid chromatography-high-resolution tandem mass spectrometry and RNA-sequence database searching. By introducing a spray-assisting device that stabilizes negative nanoelectrospray ionization of RNAs and by searching an miRNA sequence database using the obtained tandem mass spectrometry data for the RNA mixture, we successfully identified femtomole quantities of human cellular miRNAs and their 3'-terminal variants. This is the first report of a fully automated, and thus objective, tandem mass spectrometry-based analytical system that can be used to identify miRNAs.

  15. Proteome analysis of Sorangium cellulosum employing 2D-HPLC-MS/MS and improved database searching strategies for CID and ETD fragment spectra.

    PubMed

    Leinenbach, Andreas; Hartmer, Ralf; Lubeck, Markus; Kneissl, Benny; Elnakady, Yasser A; Baessmann, Carsten; Müller, Rolf; Huber, Christian G

    2009-09-01

    Shotgun proteome analysis of the myxobacterial model strain for secondary metabolite biosynthesis Sorangium cellulosum was performed employing off-line two-dimensional high-pH reversed-phase HPLC x low-pH ion-pair reversed-phase HPLC and dual tandem mass spectrometry with collision-induced dissociation (CID) and electron transfer dissociation (ETD) as complementary fragmentation techniques. Peptide identification using database searching was optimized for ETD fragment spectra to obtain the maximum number of identifications at equivalent false discovery rates (1.0%) in the evaluation of both fragmentation techniques. In the database search of the CID MS/MS data, the mass tolerance was set to the well-established 0.3 Da window, whereas for ETD data, it was widened to 1.1 Da to account for hydrogen-rearrangement in the radical-intermediate of the peptide precursor ion. To achieve a false discovery rate comparable to the CID results, we increased the significance threshold for peptide identification to 0.001 for the ETD data. The ETD based analysis yielded about 74% of all peptides and about 78% of all proteins compared to the CID-method. In the combined data set, 952 proteins of S. cellulosum were confidently identified by at least two peptides per protein, facilitating the study of the function of regulatory proteins in the social myxobacteria and their role in secondary metabolism.

  16. Projective loop quantum gravity. II. Searching for semi-classical states

    NASA Astrophysics Data System (ADS)

    Lanéry, Suzanne; Thiemann, Thomas

    2017-05-01

    In the first paper of this series, an extension of the Ashtekar-Lewandowski state space of loop quantum gravity was set up with the help of a projective formalism introduced by Kijowski. The motivation for this work was to achieve a more balanced treatment of the position and momentum variables (also known as holonomies and fluxes). While this is the first step toward the construction of states semi-classical with respect to a full set of observables, one uncovers a deeper issue, which we analyse in the present article in the case of real-valued holonomies. Specifically, we show that, in this case, there does not exist any state on the holonomy-flux algebra in which the variances of the holonomy and flux observables would all be finite, let alone small. It is important to note that this obstruction cannot be bypassed by further enlarging the quantum state space, for it arises from the structure of the algebra itself. A way out would be to suitably restrict the algebra of observables: we take the first step in this direction in a companion paper.

  17. International patent applications for non-injectable naloxone for opioid overdose reversal: Exploratory search and retrieve analysis of the PatentScope database.

    PubMed

    McDonald, Rebecca; Danielsson Glende, Øyvind; Dale, Ola; Strang, John

    2017-06-08

    Non-injectable naloxone formulations are being developed for opioid overdose reversal, but only limited data have been published in the peer-reviewed domain. Through examination of a hitherto-unsearched database, we expand public knowledge of non-injectable formulations, tracing their development and novelty, with the aim of describing and comparing their pharmacokinetic properties. (i) The PatentScope database of the World Intellectual Property Organization was searched for relevant English-language patent applications; (ii) pharmacokinetic data were extracted, collated and analysed; (iii) PubMed was searched using the Boolean search query '(nasal OR intranasal OR nose OR buccal OR sublingual) AND naloxone AND pharmacokinetics'. Five hundred and twenty-two PatentScope and 56 PubMed records were identified: three published international patent applications and five peer-reviewed papers were eligible. Pharmacokinetic data were available for intranasal, sublingual, and reference routes. Highly concentrated formulations (10-40 mg/mL) had been developed and tested. Sublingual bioavailability was very low (1%; relative to intravenous). Non-concentrated intranasal spray (1 mg/mL; 1 mL per nostril) had low bioavailability (11%). Concentrated intranasal formulations (≥10 mg/mL) had bioavailability of 21-42% (relative to intravenous) and 26-57% (relative to intramuscular), with peak concentrations (dose-adjusted Cmax = 0.8-1.7 ng/mL) reached in 19-30 min (tmax). Exploratory analysis identified intranasal bioavailability as associated positively with dose and negatively with volume. We find a consistent direction of development of intranasal sprays towards high-concentration, low-volume formulations with bioavailability in the 20-60% range. These have the potential to deliver a therapeutic dose in a 0.1 mL volume.

  18. More Publications about Databases.

    ERIC Educational Resources Information Center

    Tenopir, Carol

    1983-01-01

    Reviews recent publications in online database literature including three newsletters ("Database Update," "Database Alert," and "Information Hotline"), a directory ("Guide to Online Databases"), and a textbook ("Online Reference and Information Retrieval" by Roger C. Palmer). The new "Guide to Searching ONTAP ABI/INFORM" is noted. (EJS)

  19. Construction of an Indonesian herbal constituents database and its use in Random Forest modelling in a search for inhibitors of aldose reductase.

    PubMed

    Naeem, Sadaf; Hylands, Peter; Barlow, David

    2012-02-01

    Data on phytochemical constituents of plants commonly used in traditional Indonesian medicine have been compiled as a computer database. This database (the Indonesian Herbal constituents database, IHD) currently contains details on ∼1,000 compounds found in 33 different plants. For each entry, the IHD gives details of chemical structure, trivial and systematic name, CAS registry number, pharmacology (where known), toxicology (LD50), botanical species, the part(s) of the plant(s) where the compounds are found, typical dosage(s) and reference(s). A second database has also been compiled for plant-derived compounds with known activity against the enzyme aldose reductase (AR). This database (the aldose reductase inhibitors database, ARID) contains the same details as the IHD, and currently comprises information on 120 different AR inhibitors. Virtual screening of all compounds in the IHD has been performed using Random Forest (RF) modelling, in a search for novel leads active against AR, to provide for new forms of symptomatic relief in diabetic patients. For the RF modelling, a set of simple 2D chemical descriptors was employed to classify all compounds in the combined ARID and IHD databases as either active or inactive as AR inhibitors. The resulting RF models (which gave misclassification rates of 21%) were used to identify putative new AR inhibitors in the IHD, with such compounds being identified as those giving RF scores >0.5 (in each of the three different RF models developed). In vitro assays were subsequently performed for four of the compounds obtained as hits in this in silico screening, to determine their inhibitory activity against human recombinant AR. The two compounds having the highest RF scores (prunetin and ononin) were shown to have the highest activities experimentally (giving ∼58% and ∼52% inhibition at a concentration of 15 μM, respectively), while the compounds with lowest RF scores (vanillic acid and cinnamic acid) showed the

  20. Local image descriptor-based searching framework of usable similar cases in a radiation treatment planning database for stereotactic body radiotherapy

    NASA Astrophysics Data System (ADS)

    Nonaka, Ayumi; Arimura, Hidetaka; Nakamura, Katsumasa; Shioyama, Yoshiyuki; Soufi, Mazen; Magome, Taiki; Honda, Hiroshi; Hirata, Hideki

    2014-03-01

    Radiation treatment planning (RTP) for stereotactic body radiotherapy (SBRT) is more complex than for conventional radiotherapy because it uses a number of beam directions. We reported that similar planning cases could help treatment planners who have less experience of SBRT to determine beam directions. The aim of this study was to develop a framework for searching an RTP database for usable cases similar to an unplanned case, based on a local image descriptor. The proposed framework consists of two stages, searching and rearrangement, carried out in three steps. In the first step, the RTP database was searched for the 10 cases most similar to the object cases based on the shape similarity of the two-dimensional lung region at the isocenter plane. In the second step, the 5 most similar cases were selected by using geometric features related to the location, size and shape of the planning target volume, lung and spinal cord. In the third step, the selected 5 cases were rearranged by use of the Euclidean distance of a local image descriptor, which is a similarity index based on the magnitudes and orientations of image gradients within a region of interest around an isocenter. It was assumed that the local image descriptor represents the information around lung tumors related to treatment planning. The cases selected by the proposed method as most similar to the test cases resembled them more closely in tumor location than those selected by a conventional method. To evaluate the proposed method, we applied a similar-cases-based beam arrangement method, developed in a previous study and based on a linear registration, to the similar cases selected by the proposed method. The proposed method has the potential to suggest superior beam arrangements from the treatment point of view.
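
    The three-step selection described in this abstract (shortlist 10 cases, narrow to 5, then reorder by descriptor distance) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not specify how shape or geometric similarity is computed, so Euclidean distance between hypothetical feature vectors is used for all three steps, and the dictionary keys `shape`, `geometry`, and `descriptor` are invented names.

```python
import math


def rank_similar_cases(query, cases, n_shape=10, n_geom=5):
    """Return up to `n_geom` cases ordered by local-descriptor distance to `query`.

    Each case (and the query) is a dict holding three numeric feature
    vectors under the hypothetical keys "shape", "geometry", "descriptor".
    """
    # Step 1: shortlist the cases whose lung-shape features are closest.
    shortlist = sorted(
        cases, key=lambda case: math.dist(query["shape"], case["shape"])
    )[:n_shape]
    # Step 2: keep the cases with the closest geometric features
    # (location, size and shape of target volume, lung, spinal cord).
    shortlist = sorted(
        shortlist, key=lambda case: math.dist(query["geometry"], case["geometry"])
    )[:n_geom]
    # Step 3: rearrange the remaining cases by Euclidean distance
    # between local image descriptors.
    return sorted(
        shortlist, key=lambda case: math.dist(query["descriptor"], case["descriptor"])
    )
```

    Filtering with cheaper, coarser features before the final descriptor comparison keeps the last ranking step restricted to a handful of candidates, which mirrors the order of the steps in the abstract.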